vladmandic / automatic

SD.Next: Advanced Implementation of Generative Image Models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: SVD scripts are not working in directml #3342

[Open] rodrigoandrigo opened this issue 4 months ago

rodrigoandrigo commented 4 months ago

Issue Description

I tried using the three video-generation scripts with DirectML, but none of them worked:

Text-to-Video (Text) with models: Potat v1, ZeroScope v2 Dark, ModelScope 1.7b

Image-to-Video (Image) with model: VGen

Stable Video Diffusion (Image) with model: SVD XT 1.1

Version Platform Description

2024-07-16 12:38:07,164 | sd | INFO | launch | Starting SD.Next
2024-07-16 12:38:07,169 | sd | INFO | installer | Logger: file="C:\StabilityMatrix\Data\Packages\SD.Next\sdnext.log" level=INFO size=899852 mode=append
2024-07-16 12:38:07,171 | sd | INFO | installer | Python version=3.10.11 platform=Windows bin="C:\StabilityMatrix\Data\Packages\SD.Next\venv\Scripts\python.exe" venv="C:\StabilityMatrix\Data\Packages\SD.Next\venv"
2024-07-16 12:38:07,474 | sd | INFO | installer | Version: app=sd.next updated=2024-07-10 hash=2ec6e9ee branch=master url=https://github.com/vladmandic/automatic/tree/master ui=main
2024-07-16 12:38:08,050 | sd | INFO | launch | Platform: arch=AMD64 cpu=AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD system=Windows release=Windows-10-10.0.22631-SP0 python=3.10.11
2024-07-16 12:38:08,053 | sd | DEBUG | installer | Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
2024-07-16 12:38:08,054 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=False ipex=False diml=True openvino=False
2024-07-16 12:38:08,054 | sd | DEBUG | installer | Torch allowed: cuda=False rocm=False ipex=False diml=True openvino=False
2024-07-16 12:38:08,054 | sd | INFO | installer | Using DirectML Backend
2024-07-16 09:35:37,397 | sd | DEBUG | launch | Starting module: <module 'webui' from 'C:\StabilityMatrix\Data\Packages\SD.Next\webui.py'>
2024-07-16 09:35:37,397 | sd | INFO | launch | Command line args: ['--medvram', '--autolaunch', '--use-directml'] medvram=True autolaunch=True use_directml=True
2024-07-16 09:35:37,399 | sd | DEBUG | launch | Env flags: []
2024-07-16 09:37:38,790 | sd | INFO | loader | Load packages: {'torch': '2.3.1+cpu', 'diffusers': '0.29.1', 'gradio': '3.43.2'}
2024-07-16 09:37:42,767 | sd | DEBUG | shared | Read: file="config.json" json=35 bytes=1548 time=0.000
2024-07-16 09:37:42,821 | sd | INFO | shared | Engine: backend=Backend.DIFFUSERS compute=directml device=privateuseone:0 attention="Dynamic Attention BMM" mode=no_grad
2024-07-16 09:37:42,979 | sd | INFO | shared | Device: device=AMD Radeon RX 6600M n=1 directml=0.2.2.dev240614
2024-07-16 09:37:42,987 | sd | DEBUG | shared | Read: file="html\reference.json" json=45 bytes=25986 time=0.006
2024-07-16 09:38:04,704 | sd | DEBUG | init | ONNX: version=1.18.1 provider=DmlExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']

Relevant log output

Text-to-Video
Model: Potat v1
12:47:35-275745 ERROR    Arguments: args=('task(c5jnmnvhq3xjo9w)', 'woman,    
                         sitting on couch, female curvy, detailed face,        
                         perfect face, correct eyes, hairstyles, detailed     
                         muzzle, detailed mouth, five fingers, proper hands,   
                         proper shading, proper lighting, detailed character,  
                         high quality,', 'worst quality, bad quality, (text),  
                         ((signature, watermark)), extra limb, deformed hands, 
                         deformed feet, multiple tails, deformed, disfigured,  
                         poorly drawn face, mutated, extra limb, ugly, face out
                         of frame, oversaturated, sketch, comic, no pupils,    
                         simple background, ((blurry)), mutation, intersex, bad
                         anatomy, disfigured,', [], 20, 0, 26, True, False,    
                         False, False, 1, 1, 6, 6, 0.7, 0, 0.5, 1, 1, -1.0,    
                         -1.0, 0, 0, 0, 512, 512, False, 0.3, 2, 'None', False,
                         20, 0, 0, 10, 0, '', '', 0, 0, 0, 0, False, 4, 0.95,  
                         False, 0.6, 1, '#000000', 0, [], 11, 1, 'None',       
                         'None', 'None', 'None', 0.5, 0.5, 0.5, 0.5, None,     
                         None, None, None, 0, 0, 0, 0, 1, 1, 1, 1, None, None, 
                         None, None, False, '', 'None', 16, 'None', 1, True,   
                         'None', 2, True, 1, 0, True, 'none', 3, 4, 0.25, 0.25,
                         3, 1, 1, 0.8, 8, 64, True, True, 0.5, 600.0, 1.0, 1,  
                         1, 0.5, 0.5, 'OpenGVLab/InternVL-14B-224px', False,   
                         False, 'positive', 'comma', 0, False, False, '',      
                         'None', '', 1, '', 'None', 1, True, 10, 'Potat v1',   
                         True, 24, 'GIF', 2, True, 1, 0, 0, '', [], 0, '', [], 
                         0, '', [], False, True, False, False, False, False, 0,
                         'None', [], 'FaceID Base', True, True, 1, 1, 1, 0.5,  
                         False, 'person', 1, 0.5, True) kwargs={}              
12:47:35-284260 ERROR    gradio call: AttributeError                           
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\call_queue.py:31 in f      │
│                                                                             │
│   30 │   │   │   try:                                                       │
│ > 31 │   │   │   │   res = func(*args, **kwargs)                            │
│   32 │   │   │   │   progress.record_results(id_task, res)                  │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\txt2img.py:89 in txt2img   │
│                                                                             │
│   88 │   p.script_args = args                                               │
│ > 89 │   processed = scripts.scripts_txt2img.run(p, *args)                  │
│   90 │   if processed is None:                                              │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\scripts.py:483 in run      │
│                                                                             │
│   482 │   │   parsed = p.per_script_args.get(script.title(), args[script.ar │
│ > 483 │   │   processed = script.run(p, *parsed)                            │
│   484 │   │   s.record(script.title())                                      │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\scripts\text2video.py:88 in run    │
│                                                                             │
│    87 │   │   │   shared.opts.sd_model_checkpoint = checkpoint              │
│ >  88 │   │   │   sd_models.reload_model_weights(op='model')                │
│    89                                                                       │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\sd_models.py:1572 in reloa │
│                                                                             │
│   1571 │   from modules import lowvram, sd_hijack                           │
│ > 1572 │   checkpoint_info = info or select_checkpoint(op=op) # are we sele │
│   1573 │   next_checkpoint_info = info or select_checkpoint(op='dict' if lo │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\sd_models.py:248 in select │
│                                                                             │
│    247 │   │   return None                                                  │
│ >  248 │   checkpoint_info = get_closet_checkpoint_match(model_checkpoint)  │
│    249 │   if checkpoint_info is not None:                                  │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\sd_models.py:197 in get_cl │
│                                                                             │
│    196 def get_closet_checkpoint_match(search_string):                      │
│ >  197 │   if search_string.startswith('huggingface/'):                     │
│    198 │   │   model_name = search_string.replace('huggingface/', '')       │
└─────────────────────────────────────────────────────────────────────────────┘
AttributeError: 'CheckpointInfo' object has no attribute 'startswith'
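
For context, this first failure is a plain type mismatch: text2video.py assigns a CheckpointInfo object to shared.opts.sd_model_checkpoint, but get_closet_checkpoint_match expects a plain string, so the str method startswith() does not exist on it. A minimal sketch of the problem with one possible defensive guard (the class below is a stand-in, not the real SD.Next code, and the actual fix in the repo may differ):

class CheckpointInfo:  # stand-in for modules.sd_models.CheckpointInfo
    def __init__(self, title: str):
        self.title = title

def get_closet_checkpoint_match(search_string):
    # raises AttributeError when search_string is a CheckpointInfo
    if search_string.startswith('huggingface/'):
        return search_string.replace('huggingface/', '')
    return search_string

def get_closet_checkpoint_match_safe(search_string):
    # hypothetical guard: normalize to str before calling str methods
    if not isinstance(search_string, str):
        search_string = getattr(search_string, 'title', str(search_string))
    if search_string.startswith('huggingface/'):
        return search_string.replace('huggingface/', '')
    return search_string

print(get_closet_checkpoint_match_safe(CheckpointInfo('huggingface/camenduru/potat1')))
# -> camenduru/potat1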

Text-to-Video
Model: ZeroScope v2 Dark
12:50:00-451738 ERROR    Arguments: args=('task(yfgrwdtd3i1wg4r)', 'woman,        
                         sitting on couch, female curvy, detailed face,        
                         perfect face, correct eyes, hairstyles, detailed     
                         muzzle, detailed mouth, five fingers, proper hands,   
                         proper shading, proper lighting, detailed character,  
                         high quality,', 'worst quality, bad quality, (text),  
                         ((signature, watermark)), extra limb, deformed hands, 
                         deformed feet, multiple tails, deformed, disfigured,  
                         poorly drawn face, mutated, extra limb, ugly, face out
                         of frame, oversaturated, sketch, comic, no pupils,    
                         simple background, ((blurry)), mutation, intersex, bad
                         anatomy, disfigured,', [], 20, 7, 26, True, False,    
                         False, False, 1, 1, 6, 6, 0.7, 0, 0.5, 1, 1, -1.0,    
                         -1.0, 0, 0, 0, 512, 512, False, 0.3, 2, 'None', False,
                         20, 0, 0, 10, 0, '', '', 0, 0, 0, 0, False, 4, 0.95,  
                         False, 0.6, 1, '#000000', 0, [], 11, 1, 'None',       
                         'None', 'None', 'None', 0.5, 0.5, 0.5, 0.5, None,     
                         None, None, None, 0, 0, 0, 0, 1, 1, 1, 1, None, None, 
                         None, None, False, '', 'None', 16, 'None', 1, True,   
                         'None', 2, True, 1, 0, True, 'none', 3, 4, 0.25, 0.25,
                         3, 1, 1, 0.8, 8, 64, True, True, 0.5, 600.0, 1.0, 1,  
                         1, 0.5, 0.5, 'OpenGVLab/InternVL-14B-224px', False,   
                         False, 'positive', 'comma', 0, False, False, '',      
                         'None', '', 1, '', 'None', 1, True, 10, 'ZeroScope v2 
                         Dark', True, 24, 'GIF', 2, True, 1, 0, 0, '', [], 0,  
                         '', [], 0, '', [], False, True, False, False, False,  
                         False, 0, 'None', [], 'FaceID Base', True, True, 1, 1,
                         1, 0.5, False, 'person', 1, 0.5, True) kwargs={}      
12:50:00-459258 ERROR    gradio call: TypeError                                
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\call_queue.py:31 in f      │
│                                                                             │
│   30 │   │   │   try:                                                       │
│ > 31 │   │   │   │   res = func(*args, **kwargs)                            │
│   32 │   │   │   │   progress.record_results(id_task, res)                  │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\txt2img.py:89 in txt2img   │
│                                                                             │
│   88 │   p.script_args = args                                               │
│ > 89 │   processed = scripts.scripts_txt2img.run(p, *args)                  │
│   90 │   if processed is None:                                              │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\scripts.py:483 in run      │
│                                                                             │
│   482 │   │   parsed = p.per_script_args.get(script.title(), args[script.ar │
│ > 483 │   │   processed = script.run(p, *parsed)                            │
│   484 │   │   s.record(script.title())                                      │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\scripts\text2video.py:75 in run    │
│                                                                             │
│    74 │   │                                                                 │
│ >  75 │   │   if model['path'] in shared.opts.sd_model_checkpoint:          │
│    76 │   │   │   shared.log.debug(f'Text2Video cached: model={shared.opts. │
└─────────────────────────────────────────────────────────────────────────────┘
TypeError: argument of type 'CheckpointInfo' is not iterable
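
This second failure has the same root cause in a different spot: model['path'] in shared.opts.sd_model_checkpoint is a membership test, and Python's "in" operator needs the right-hand side to be a string or a container. A CheckpointInfo object is neither, hence "not iterable". A small illustration using the same stand-in class and the Potat v1 path from the logs below (the string-comparison workaround is hypothetical):

class CheckpointInfo:  # stand-in, as in the previous sketch
    def __init__(self, title: str):
        self.title = title

model = {'path': 'camenduru/potat1'}
sd_model_checkpoint = CheckpointInfo('camenduru/potat1')

try:
    cached = model['path'] in sd_model_checkpoint   # TypeError: not iterable
except TypeError:
    cached = model['path'] in str(sd_model_checkpoint.title)  # compare strings instead
print(cached)  # True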

Text-to-Video
Model: ModelScope 1.7b
13:02:06-745445 ERROR    Processing: args={'prompt': ['woman, sitting on      
                         couch, female curvy, detailed eyes, perfect eyes,     
                         detailed face, perfect face, perfectly rendered face, 
                         correct eyes, hairstyles, detailed muzzle, detailed   
                         mouth, five fingers, proper hands, proper shading,    
                         proper lighting, detailed character, high quality,'], 
                         'negative_prompt': ['worst quality, bad quality,      
                         (text), ((signature, watermark)), extra limb, deformed
                         hands, deformed feet, multiple tails, deformed,       
                         disfigured, poorly drawn face, mutated, extra limb,   
                         ugly, face out of frame, oversaturated, sketch, comic,
                         no pupils, simple background, ((blurry)), mutation,   
                         intersex, bad anatomy, disfigured,'],                 
                         'guidance_scale': 6, 'generator': [<torch._C.Generator
                         object at 0x0000017C89FBA530>], 'callback_steps': 1,  
                         'callback': <function diffusers_callback_legacy at    
                         0x0000017C8BF3ECB0>, 'num_inference_steps': 20, 'eta':
                         1.0, 'output_type': 'latent', 'width': 320, 'height': 
                         320, 'num_frames': 16} input must be 4-dimensional    
13:02:06-750699 ERROR    Processing: RuntimeError                              
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\processing_diffusers.py:12 │
│                                                                             │
│   121 │   │   else:                                                         │
│ > 122 │   │   │   output = shared.sd_model(**base_args)                     │
│   123 │   │   if isinstance(output, dict):                                  │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\utils │
│                                                                             │
│   114 │   │   with ctx_factory():                                           │
│ > 115 │   │   │   return func(*args, **kwargs)                              │
│   116                                                                       │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\diffusers\p │
│                                                                             │
│   596 │   │   │   │   # predict the noise residual                          │
│ > 597 │   │   │   │   noise_pred = self.unet(                               │
│   598 │   │   │   │   │   latent_model_input,                               │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\nn\mo │
│                                                                             │
│   1531 │   │   else:                                                        │
│ > 1532 │   │   │   return self._call_impl(*args, **kwargs)                  │
│   1533                                                                      │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\nn\mo │
│                                                                             │
│   1540 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hook │
│ > 1541 │   │   │   return forward_call(*args, **kwargs)                     │
│   1542                                                                      │
│                                                                             │
│                          ... 12 frames hidden ...                           │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\nn\mo │
│                                                                             │
│   1540 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hook │
│ > 1541 │   │   │   return forward_call(*args, **kwargs)                     │
│   1542                                                                      │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\nn\mo │
│                                                                             │
│    609 │   def forward(self, input: Tensor) -> Tensor:                      │
│ >  610 │   │   return self._conv_forward(input, self.weight, self.bias)     │
│    611                                                                      │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\nn\mo │
│                                                                             │
│    604 │   │   │   )                                                        │
│ >  605 │   │   return F.conv3d(                                             │
│    606 │   │   │   input, weight, bias, self.stride, self.padding, self.dil │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\dml\amp\autocast_mode.py:4 │
│                                                                             │
│   42 │   │   op = getattr(resolved_obj, func_path[-1])                      │
│ > 43 │   │   setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: f │
│   44                                                                        │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\dml\amp\autocast_mode.py:1 │
│                                                                             │
│   14 │   if not torch.dml.is_autocast_enabled:                              │
│ > 15 │   │   return op(*args, **kwargs)                                     │
│   16 │   args = list(map(cast, args))                                       │
└─────────────────────────────────────────────────────────────────────────────┘
RuntimeError: input must be 4-dimensional
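
Unlike the two failures above, this one comes from the DirectML backend rather than the script: the text-to-video UNet uses Conv3d, whose input must be 5-dimensional (batch, channels, frames, height, width), and the torch-directml build in this log apparently cannot handle tensors above 4 dimensions. The same call is valid on CPU or CUDA, as this small sketch shows:

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 16, 64, 64)   # N, C, frames, H, W -> 5-D video latents
w = torch.randn(8, 4, 3, 3, 3)      # out_ch, in_ch, kD, kH, kW
out = F.conv3d(x, w, padding=1)     # fine on CPU/CUDA
print(out.shape)                    # torch.Size([1, 8, 16, 64, 64])

# The analogous call on a DirectML device is what raises above; assuming
# torch-directml is installed, something like:
#   import torch_directml
#   dml = torch_directml.device()
#   F.conv3d(x.to(dml), w.to(dml), padding=1)  # RuntimeError on 5-D input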

Image-to-Video
Model: VGen
13:08:33-673173 WARNING  Pipeline class change failed:                         
                         type=DiffusersTaskType.IMAGE_2_IMAGE                  
                         pipeline=I2VGenXLPipeline AutoPipeline can't find a   
                         pipeline linked to I2VGenXLPipeline for None          
13:08:34-378645 INFO     Base: class=I2VGenXLPipeline                          
13:08:47-883849 ERROR    Processing: args={'prompt': ['woman, sitting on      
                         couch, female curvy, detailed eyes, perfect eyes,     
                         detailed face, perfect face, perfectly rendered face, 
                         correct eyes, hairstyles, detailed muzzle, detailed   
                         mouth, five fingers, proper hands, proper shading,    
                         proper lighting, detailed character, high quality,'], 
                         'negative_prompt': ['worst quality, bad quality,      
                         (text), ((signature, watermark)), extra limb, deformed
                         hands, deformed feet, multiple tails, deformed,       
                         disfigured, poorly drawn face, mutated, extra limb,   
                         ugly, face out of frame, oversaturated, sketch, comic,
                         no pupils, simple background, ((blurry)), mutation,   
                         intersex, bad anatomy, disfigured,'],                 
                         'guidance_scale': 6, 'generator': [<torch._C.Generator
                         object at 0x0000026E161C7150>], 'num_inference_steps':
                         20, 'eta': 1.0, 'output_type': 'pil', 'width': 512,   
                         'height': 512, 'image': <PIL.Image.Image image        
                         mode=RGB size=512x512 at 0x26E118AE500>, 'num_frames':
                         16, 'target_fps': 8, 'decode_chunk_size': 8} the      
                         dimesion of at::Tensor must be 4 or lower, but got 5  
13:08:47-888378 ERROR    Processing: RuntimeError                              
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\processing_diffusers.py:12 │
│                                                                             │
│   121 │   │   else:                                                         │
│ > 122 │   │   │   output = shared.sd_model(**base_args)                     │
│   123 │   │   if isinstance(output, dict):                                  │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\utils │
│                                                                             │
│   114 │   │   with ctx_factory():                                           │
│ > 115 │   │   │   return func(*args, **kwargs)                              │
│   116                                                                       │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\diffusers\p │
│                                                                             │
│   639 │   │   image = self.video_processor.preprocess(resized_image).to(dev │
│ > 640 │   │   image_latents = self.prepare_image_latents(                   │
│   641 │   │   │   image,                                                    │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\diffusers\p │
│                                                                             │
│   465 │   │   # duplicate image_latents for each generation per prompt, usi │
│ > 466 │   │   image_latents = image_latents.repeat(num_videos_per_prompt, 1 │
│   467                                                                       │
└─────────────────────────────────────────────────────────────────────────────┘
RuntimeError: the dimesion of at::Tensor must be 4 or lower, but got 5
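
The VGen failure hits the same DirectML limitation one step earlier, before any convolution: I2VGenXLPipeline tiles its image latents once per requested video, and that repeat already operates on a 5-D tensor. A minimal sketch of the failing operation (shapes are illustrative):

import torch

image_latents = torch.randn(1, 4, 1, 64, 64)   # 5-D: N, C, frames, H, W
num_videos_per_prompt = 1
tiled = image_latents.repeat(num_videos_per_prompt, 1, 1, 1, 1)
print(tiled.shape)   # torch.Size([1, 4, 1, 64, 64]) on CPU/CUDA; DirectML rejects the 5-D tensor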

Stable Video Diffusion
Model: SVD XT 1.1
13:12:03-607975 ERROR    Processing: args={'generator': <torch._C.Generator    
                         object at 0x000001F873C34810>, 'callback_on_step_end':
                         <function diffusers_callback at 0x000001F84F665D80>,  
                         'callback_on_step_end_tensor_inputs': ['latents'],    
                         'num_inference_steps': 20, 'output_type': 'pil',      
                         'image': <PIL.Image.Image image mode=RGB size=1024x576
                         at 0x1F8531FC610>, 'width': 1024, 'height': 576,      
                         'num_frames': 14, 'decode_chunk_size': 6,             
                         'motion_bucket_id': 128, 'noise_aug_strength': 0.1,   
                         'min_guidance_scale': 1, 'max_guidance_scale': 3} the 
                         dimesion of at::Tensor must be 4 or lower, but got 5  
13:12:03-611978 ERROR    Processing: RuntimeError                              
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ C:\StabilityMatrix\Data\Packages\SD.Next\modules\processing_diffusers.py:12 │
│                                                                             │
│   121 │   │   else:                                                         │
│ > 122 │   │   │   output = shared.sd_model(**base_args)                     │
│   123 │   │   if isinstance(output, dict):                                  │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\torch\utils │
│                                                                             │
│   114 │   │   with ctx_factory():                                           │
│ > 115 │   │   │   return func(*args, **kwargs)                              │
│   116                                                                       │
│                                                                             │
│ C:\StabilityMatrix\Data\Packages\SD.Next\venv\lib\site-packages\diffusers\p │
│                                                                             │
│   523 │   │   # image_latents [batch, channels, height, width] ->[batch, nu │
│ > 524 │   │   image_latents = image_latents.unsqueeze(1).repeat(1, num_fram │
│   525                                                                       │
└─────────────────────────────────────────────────────────────────────────────┘
RuntimeError: the dimesion of at::Tensor must be 4 or lower, but got 5
13:12:03-690490 WARNING  Pipeline class change failed:                         
                         type=DiffusersTaskType.TEXT_2_IMAGE                   
                         pipeline=StableVideoDiffusionPipeline AutoPipeline    
                         can't find a pipeline linked to                       
                         StableVideoDiffusionPipeline for None
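
The SVD trace shows the pattern most clearly: the pipeline takes 4-D image latents (batch, channels, height, width), inserts a frame axis, and repeats it num_frames times, so the result is necessarily 5-D. A sketch of that step, with latent sizes matching the 1024x576 input from the log (576/8=72, 1024/8=128):

import torch

image_latents = torch.randn(1, 4, 72, 128)    # batch, channels, H, W (4-D)
num_frames = 14
video_latents = image_latents.unsqueeze(1).repeat(1, num_frames, 1, 1, 1)
print(video_latents.shape)   # torch.Size([1, 14, 4, 72, 128]) -> 5-D, beyond the 4-D cap

In short, the first two failures (startswith, not iterable) appear to be CheckpointInfo type bugs in the script itself, while the remaining three (ModelScope, VGen, SVD) all appear to reduce to missing support for tensors above 4 dimensions in torch-directml.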

Backend: Diffusers
UI: Standard
Branch: Master
Model: StableDiffusion 1.5


genewitch commented 3 weeks ago

I am unsure what DirectML is; it appears to be an alternative to --use-cuda. But these scripts don't work with CUDA either, failing with the same errors: "not iterable" and "startswith".

Hope that helps with tracking down the issue!

16:01:05-437841 DEBUG    Text2Video: model={'name': 'Potat v1', 'path': 'camenduru/potat1', 'params': [24, 1024, 576]}
                         defaults=True frames=21, video=MP4 duration=2 loop=True pad=1 interpolate=0
16:01:05-439840 ERROR    Exception: argument of type 'CheckpointInfo' is not iterable
16:01:05-440840 ERROR    Arguments: args=('task(4tglye2wwa8ff2n)', '', ' there are trees and foliage visible in the
                         background on the right side, sunlight casts shadows across the scene creating a contrast
                         between light and dark areas, photograph taken during daytime. <lora:chas-notext-42e:1.4>',
                         'child, kid, baby, infant, low quality, ugly', [], 70, 20, 31, True, False, False, False, 1, 1,
                         7, 6, 0.7, 0, 0.5, 1, 1, -1.0, -1.0, 0, 0, 0, 253, 450, False, 0.3, 1, 1, 'Add with forward',
                         'None', False, 20, 0, 0, 20, 0, '', '', 0, 0, 0, 0, False, 4, 0.95, False, 0.6, 1, '#000000',
                         0, [], 20, 1, 'None', 'None', 'None', 'None', 0.5, 0.5, 0.5, 0.5, None, None, None, None,
                         False, False, False, False, 0, 0, 0, 0, 1, 1, 1, 1, None, None, None, None, False, '', False,
                         0, '', [], 0, '', [], 0, '', [], False, True, False, True, False, False, False, 0, 'None', [],
                         'FaceID Base', True, True, 1, 1, 1, 0.5, True, 'person', 1, 0.5, True, 'None', 16, 'None', 1,
                         True, 'None', 2, True, 1, 0, True, 'none', 3, 4, 0.25, 0.25, 1, -0.5, 0, 'THUDM/CogVideoX-2b',
                         'DDIM', 49, 6, 'balanced', True, 'None', 8, True, 1, 0, None, None, '', 0.5, 5, None, '', 0.5,
                         5, None, 3, 1, 1, 0.8, 8, 64, True, 0.65, True, False, 1, 1, 1, '', True, 0.5, 600.0, 1.0,
                         True, None, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0.5, 0.5, 'OpenGVLab/InternVL-14B-224px', False,
                         0.7, 1.2, 128, False, False, 'positive', 'comma', 0, False, False, '', 'None', '', 1, '',
                         'None', 1, True, 10, 'Potat v1', True, 21, 'MP4', 2, True, 1, 0, 0, '', [], 0, '', [], 0, '',
                         [], False, True, False, True, False, False, False, 0) kwargs={}
16:01:05-445841 ERROR    gradio call: TypeError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ D:\opt\automatic\modules\call_queue.py:31 in f                                                                       │
│                                                                                                                      │
│   30 │   │   │   try:                                                                                                │
│ ❱ 31 │   │   │   │   res = func(*args, **kwargs)                                                                     │
│   32 │   │   │   │   progress.record_results(id_task, res)                                                           │
│                                                                                                                      │
│ D:\opt\automatic\modules\txt2img.py:92 in txt2img                                                                    │
│                                                                                                                      │
│    91 │   p.state = state                                                                                            │
│ ❱  92 │   processed = scripts.scripts_txt2img.run(p, *args)                                                          │
│    93 │   if processed is None:                                                                                      │
│                                                                                                                      │
│ D:\opt\automatic\modules\scripts.py:502 in run                                                                       │
│                                                                                                                      │
│   501 │   │   if hasattr(script, 'run'):                                                                             │
│ ❱ 502 │   │   │   processed = script.run(p, *parsed)                                                                 │
│   503 │   │   else:                                                                                                  │
│                                                                                                                      │
│ D:\opt\automatic\scripts\text2video.py:75 in run                                                                     │
│                                                                                                                      │
│    74 │   │                                                                                                          │
│ ❱  75 │   │   if model['path'] in shared.opts.sd_model_checkpoint:                                                   │
│    76 │   │   │   shared.log.debug(f'Text2Video cached: model={shared.opts.sd_model_checkpoint}')                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: argument of type 'CheckpointInfo' is not iterable
vladmandic commented 3 weeks ago

@genewitch I've fixed your issue, but it is a totally different one - please check before posting to issues: a) the error is completely different, b) the model used is completely different. This issue is about SVD; you're posting about the T2V script.

genewitch commented 3 weeks ago

I don't really want to argue. I tried all three of the text-to-video models that the OP said had that error, and I saw the same "CheckpointInfo is not iterable" error as well as the one ending with "object has no attribute 'startswith'".

These diverge, I assume, because of the different backends, DirectML vs. CUDA, but the errors have the same wording, and lines 31 and 75 appear in both tracebacks.

Sorry. I was trying to give context.

vladmandic commented 3 weeks ago

The original error is not in the same module, even if the error may look the same to you. This issue was created for SVD; you're trying T2V, which is very different. Anyhow, it's fixed in the dev branch and will be included in the next release.