Scruntee closed this issue 1 year ago
I'm also seeing the same issues. It adds audio even when "Add soundtrack" is set to "None", and there is only the video file, no individual frames saved.
I only get 16 frames, no matter how many I specify.
Same issues. I have a post in 'Discussions' where I'm having problems with Docker, so I installed everything the regular way. Tried VideoCrafter and got the soundtrack added when unwanted, and it generates just one second of frames.
I am having the same results as jonseed and rookiemann. I am using Automatic1111 on Windows 10 with a 3090 external GPU.
To fix the audio being added, just comment out this line in `text2vid.py`, in the `process_vidcrafter` function:

```python
add_soundtrack(ffmpeg_location, fps, os.path.join(outdir_current, f"vid.mp4"), 0, -1, None, add_soundtrack, soundtrack_path, ffmpeg_crf, ffmpeg_preset)
```
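An alternative to deleting the call is guarding it on the UI value, so "None" skips the mux entirely. This is only a sketch under assumptions: `add_soundtrack_option` stands in for the dropdown string and `soundtrack_fn` for the real ffmpeg-based helper; neither name is the extension's actual API.

```python
# Sketch only: names are assumptions, not the extension's real API.
# Skip muxing audio when the "Add soundtrack" dropdown is "None".
def maybe_add_soundtrack(add_soundtrack_option, soundtrack_fn, *args, **kwargs):
    if add_soundtrack_option in (None, "None"):
        return False  # leave the generated video silent
    soundtrack_fn(*args, **kwargs)  # forward to the real helper
    return True
```

With a guard like this, the commented-out line above could stay in but only fire when a soundtrack was actually requested.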
Thank you BillarySquintin, that took care of the secondary issue. Do you have any thoughts on what we might be able to change in order to extend the frame limit beyond 12 frames, and hopefully get access to the image sequence as well? Thank you again.
I've updated to the new version and still have the problem with the soundtrack being added. I had to go into the text2vid.py file and comment out the whole if statement at line 399. I also still get only about a second of frames, and the output directory only has the video, not the images. The one second of frames might be something in the VideoCrafter scripts, I think, no?
After a quick glance before bed, I'm struggling to find where, if at all, the number of frames is explicitly being requested from VideoCrafter... but it definitely doesn't look like the torch size or whatever is being set up outside its default configuration... will look into it more later...
edit: looks like no args from the ui are being passed to videocrafter besides prompt and cfg.
*maybe steps works as well, and possibly eta, not sure (the output is too garbage to really tell if anything changes).
I found that the resolution is also not the one you input: all the videos come out at the 32 output, even though the minimum in the UI is 64. The file everything is read from is in stable-diffusion-webui\extensions\sd-webui-text2video\scripts\videocrafter\base_t2v
From lines 11 to 13 you can set the video resolution:

```yaml
image_size:
- 64
- 64
```

I changed it to 64 × 64 (the two list entries above) and it worked.
Line 9 goes:

```yaml
video_length: 16
```

I changed it to 24, 32, and 64, and it doesn't work; it keeps generating 16.
And line 45:

```yaml
temporal_length: 16
```

I also changed that one, but it throws errors:

```
size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2_tmp.relative_position_v.embeddings_table: copying a param with shape torch.Size([xx, xx]) from checkpoint, the shape in current model is torch.Size([xx, xx]).
```

You can put 17 or any other number there and it just dies. I tried modifying only line 9, only line 45, and both at the same time, and none of it works.
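A guess at why line 45 blows up while line 9 silently does nothing: tables like the `relative_position_v.embeddings_table` in the error are sized from `temporal_length`, so a checkpoint trained at 16 frames can no longer be loaded once that value changes. A minimal sketch — the `2 * T - 1` sizing is an assumption about how relative-position tables typically work, not verified against the LVDM code:

```python
# Hypothetical stand-in for a temporal relative-position embedding table:
# one row per possible frame offset, i.e. 2 * temporal_length - 1 rows.
def embeddings_table_shape(temporal_length, head_dim=32):
    return (2 * temporal_length - 1, head_dim)

checkpoint_shape = embeddings_table_shape(16)  # what was saved with the model
edited_shape = embeddings_table_shape(17)      # after editing temporal_length
# checkpoint_shape != edited_shape -> torch reports "size mismatch" on load
```

If that guess is right, any value other than the one the checkpoint was trained with would fail the same way.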
We're also behind in our script, as apparently ours doesn't support variable video length? I don't know what they meant by that in the original commit, but their sample_text2video.py has this at line 41:

```python
parser.add_argument("--num_frames", type=int, default=16, help="number of input frames")
```
Tried working through it but got stuck on a `noise_shape = make_model_input_shape(model, batch_size, T=num_frames)` undefined error, though it is defined elsewhere and imported. Dunno...
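For reference, a latent video model's input noise shape is typically `(batch, latent_channels, num_frames, H/8, W/8)`, which is presumably what `make_model_input_shape` builds. A hypothetical sketch — the real helper takes the model and reads channels and size from it, so this signature and the 8× spatial downsampling factor are assumptions, not LVDM's actual code:

```python
# Hypothetical sketch of the shape the sampler needs; not LVDM's actual helper.
def make_model_input_shape(latent_channels, height, width, batch_size, T):
    # latents are downsampled 8x spatially by the VAE in typical SD-style models
    return (batch_size, latent_channels, T, height // 8, width // 8)
```

For example, a 256×256 request at 16 frames would give `(1, 4, 16, 32, 32)` under these assumptions.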
I managed to get the frames setting to work by updating to the latest scripts from the VideoCrafter repo and then editing process_videocrafter.py as follows, from line 62.
```python
samples = sample_text2video(model, args.prompt, 1, 1,
                            sample_type="ddim", sampler=ddim_sampler,
                            ddim_steps=args.steps, eta=args.eta,
                            cfg_scale=args.cfg_scale,
                            decode_frame_bs=1,
                            ddp=False, show_denoising_progress=False,
                            num_frames=args.frames,
                            )
```
Sorry, I have no github skills so I don't know how to mark this up properly.
After adding num_frames=args.frames, the slider works for setting the number of frames.
I also had to remove the args.n_prompt argument. The new sample_text2video.py script from VideoCrafter doesn't seem to have that argument (maybe I'm missing something!)
BTW, I haven't tried adding num_frames=args.frames to the original process_videocrafter.py script from this repo - maybe that works too!
This is about as far as I got this morning:
```
File "D:\NasD\stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts\t2v_helpers\render.py", line 26, in run
    vids_pack = process_videocrafter(args_dict)
File "D:\NasD\stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts\videocrafter\process_videocrafter.py", line 61, in process_videocrafter
    samples = sample_text2video(model, args.prompt, 1, 1,  # todo: add batch size support
File "D:\NasD\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: sample_text2video() got multiple values for argument 'sample_type'
Exception occurred: sample_text2video() got multiple values for argument 'sample_type'
```
Changing line 62 in the included script throws an unexpected keyword argument 'num_frames' error.
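The "got multiple values for argument" error above is what Python raises when a positional argument lands on a parameter that is also passed by keyword — which is what happens when the old call's extra positional `1` hits the updated signature. A minimal reproduction with a made-up signature (not VideoCrafter's real one):

```python
# Toy signature: the fourth positional arg falls on sample_type,
# which the caller also passes by keyword -> TypeError.
def sample_text2video(model, prompt, n_samples, sample_type="ddim", **kwargs):
    return sample_type

try:
    # mimics the old call site passing one positional arg too many
    sample_text2video("model", "a prompt", 1, 1, sample_type="ddim")
except TypeError as err:
    message = str(err)  # "...got multiple values for argument 'sample_type'"
```

So the fix is to match the old call's positional arguments to the new function's parameter order, not just bolt on keywords.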
Never mind, I'm an idiot: replace LVDM with the new LVDM, replace the scripts with the new scripts, and fix the imports in sample_text2video.py at line 12 (just copy the old ones over). Frame generation is fixed. Major props to pmonck.
However... the temporal consistency seems to be all over the place? Sort of like a slideshow: no matter the number of frames or fps I generate, something feels very 'off' about the output; it completely lacks the smooth motion of the basic ModelScope model.
So you updated the base files from their repo, right?
Yes, I grabbed the updated files from the VideoCrafter repo and made edits. It looks like that is going to be a necessary step in getting control over the number of frames (as well as all the other new features).
Something still feels really off about it. I've been trying to mess with the configs before work, not making much headway.
Compared to original modelscope:
I'm seeing the same issue with the lack of smooth motion. Also, note that the new version of sample_text2video.py doesn't seem to support negative prompts in the sample_text2video() function. I just removed the argument altogether in order to get it working.
I'm actually not sure if weights are working as intended across the board at this point either, but it could just be the poor quality of the models we have to work with.
Updated videocrafter: now it allows variable length and you can control whether to add the soundtrack
Is there an existing issue for this?
Are you using the latest version of the extension?
What happened?
Generating a video, regardless of the number of frames and video frame rate, makes a 1-second video with audio, even with audio ticked off. Also, I only see the video file saved and not the frames for the video, as the previous ModelScope mode saved.
Steps to reproduce the problem
What should have happened?
No response
WebUI and Deforum extension Commit IDs
webui commit id - commit: 226d840e
txt2vid commit id -
What GPU were you using for launching?
2080 super
On which platform are you launching the webui backend with the extension?
No response
Settings
Console logs
Additional information
No response