mymusise opened this issue 1 month ago
For anime style, I haven't really gotten anything much better. Best one so far is just something like this:
https://github.com/user-attachments/assets/2bf9b38b-df99-4805-9f4e-7de39c81a02e
@kijai how does one output only the video portion (the block on the right side)? My I2V gens using the workflow also create this side-by-side video, but I'd like to have only the video portion (without the static image on the left).
@kijai what settings do you use for img2vid? When I use the defaults, it usually ends up with a bunch of artifacts, or is that normal? Also, does any resolution work, or how did you get it to work if it isn't 1280x768?
> @kijai how does one output only the video portion (the block on the right side)? My I2V gens using the workflow also create this side-by-side video, but I'd like to have only the video portion (without the static image on the left).
OK, I solved this by bypassing the Image Concat node entirely. A suggestion would be to add a switch to turn the side-by-side format on/off (i.e., concatenate image with video, yes/no).
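For anyone curious what that concat step does conceptually, here is a rough sketch assuming ComfyUI-style IMAGE tensors of shape [frames, height, width, channels]; the function name and layout are for illustration only, not the node's actual code:

```python
import torch

def side_by_side(ref_image: torch.Tensor, frames: torch.Tensor) -> torch.Tensor:
    """Tile one still image next to every video frame.

    ref_image: [1, H, W, C], frames: [T, H, W, C] (ComfyUI IMAGE layout, assumed).
    Returns [T, H, 2*W, C]. Skipping this step is what leaves only the video.
    """
    tiled = ref_image.expand(frames.shape[0], -1, -1, -1)  # repeat the still image T times
    return torch.cat([tiled, frames], dim=2)               # join along the width axis
```

Bypassing the node (or putting a yes/no switch around it) just passes the frames through unchanged.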
@lijackcoder the workflow worked out of the box for me; make sure to download the correct models/ckpt as well as controlnet, and place them in the appropriate folders. I didn't get any artifacts, but the gens so far are less stable/usable than CogVideo I2V.
@mptorr Apparently there was an update 14 hours ago for both the 384p and 768p models. I downloaded the versions before that update. No idea if it will make a difference or not, but I will test it out.

> as well as controlnet

But my question is: what do you mean by controlnet?
It's a component that's necessary to run this workflow... you must follow the instructions @kijai indicates on the main page, and have this directory structure within your custom-nodes folder. Note that doing only a git clone will usually not download the larger files; you either download them manually or need to install git LFS.
```
\ComfyUI\models\pyramidflow\pyramid-flow-sd3
├───causal_video_vae
│       config.json
│       diffusion_pytorch_model.safetensors
│
├───diffusion_transformer_384p
│       config.json
│       diffusion_pytorch_model.safetensors
│
├───diffusion_transformer_768p
│       config.json
│       diffusion_pytorch_model.safetensors
│
├───text_encoder
│       config.json
│       model.safetensors
│
├───text_encoder_2
│       config.json
│       model.safetensors
│
├───text_encoder_3
│       config.json
│       model-00001-of-00002.safetensors
│       model-00002-of-00002.safetensors
│       model.safetensors.index.json
│
├───tokenizer
│       merges.txt
│       special_tokens_map.json
│       tokenizer_config.json
│       vocab.json
│
├───tokenizer_2
│       merges.txt
│       special_tokens_map.json
│       tokenizer_config.json
│       vocab.json
│
└───tokenizer_3
        special_tokens_map.json
        spiece.model
        tokenizer.json
        tokenizer_config.json
```
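If git LFS gives you trouble, here is a minimal sketch of fetching the same files with `huggingface_hub` instead; the repo id `rain1011/pyramid-flow-sd3` and the target path are assumptions based on the tree above, so adjust them for your install:

```python
# Minimal sketch: pull the Pyramid Flow weights with huggingface_hub instead
# of git LFS. The repo id and target folder below are assumptions based on
# the directory tree above -- verify them against the model page / README.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="rain1011/pyramid-flow-sd3",                      # assumed repo id
    local_dir="ComfyUI/models/pyramidflow/pyramid-flow-sd3",  # matches the tree above
    local_dir_use_symlinks=False,  # write real files so ComfyUI can load them
)
```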
okay thanks
I just let the model downloader from the workflow do its work, though it seems to be working fine for me that way for I2V. It's running on only 8.7 GB VRAM, so I guess I messed something up? I have no idea... just leaving this here because tomorrow I'll have time to actually test whether it does anything better than CogVideo text-to-video/I2V, etc. It's running at 20 s/it though, kinda slow.
That's expected VRAM use at the beginning; it ramps up a bit as the model works in stages.
FWIW I get 13.8 s/it on a local RTX 4090 when doing I2V (the first gen is always slower and gives 17 s/it).
Yeah, I noticed that just now: on my 4070 Ti Super the first gen starts at 20 s/it and goes up to 28 s/it, and the next one starts at 16 s/it. The results are high quality in terms of image sharpness, but I haven't had much luck guiding any motion with the prompt for I2V so far, so I'll leave it at that for today.

kijai said the model works in stages, and I can literally hear that as the wind turbines on my card go wild (it has 4 fans, by the way, lol); thankfully I have an undervolt preset.
I can hear what stage it's on from the coil whine of the GPU alone... It's also possible to set the steps separately for the different stages (3 of them by default), which can cut down the sampling time a lot. I'm not sure which way is better, though; it felt like it's better to use fewer steps at the earlier stages and more at the later ones.
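To make that concrete, here is a small illustrative sketch of splitting a total step budget across the three stages with more weight on the later ones; the stage weights and function name are hypothetical, not the wrapper's actual node parameters:

```python
# Illustrative only: split a total step budget across the 3 sampling stages,
# weighting the later stages more heavily (per the comment above). The stage
# count, weights, and function name are assumptions, not the wrapper's API.
def split_steps(total_steps: int, weights=(1, 2, 3)) -> list[int]:
    scale = total_steps / sum(weights)
    steps = [max(1, round(w * scale)) for w in weights]
    # Absorb any rounding drift into the last (heaviest) stage.
    steps[-1] += total_steps - sum(steps)
    return steps

print(split_steps(30))  # e.g. [5, 10, 15]: fewer steps early, more later
```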
How long does a generation for you usually take?
It's not video; it's like Flash animation from back in the 2000s.
Is this effect normal for Image2Video?
prompt: a girl with blue eyes standing in front of a backdrop of trees, water, and a clear blue sky. She is wearing a t-shirt and shorts, and her face is illuminated by the sun. hyper quality, Ultra HD, 8K
https://github.com/user-attachments/assets/bd9706b9-633e-4d6c-b618-0fc09d30d951