showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

How many training steps are required to achieve the effect in the sample? #15

Closed arceus-jia closed 1 year ago

arceus-jia commented 1 year ago

I tried training for 100,000 steps, but the results still look strange. Is this normal?

sample-100000

Can you tell me how many steps are needed to achieve the right result? Thank you!

zhangjiewu commented 1 year ago

The results look weird, as if the model was not trained at all. It usually takes 300~500 steps to train on an 8-frame video. Can you provide more info (e.g., environment, code snippets) so I can look into this issue?

arceus-jia commented 1 year ago

Well, I'm not sure if it was an xformers version conflict, but after I reinstalled the environment, upgraded torch to 1.13.1 and torchvision to 0.14.1, and installed the latest xformers version, the retraining results are fine. Anyway, thank you!
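For anyone hitting the same issue, here is a quick sanity check of the installed versions. This is a hedged sketch: the `installed_version` helper is illustrative, and the "known-good" versions are simply the ones that worked in this thread.

```python
# Illustrative helper: compare installed versions against the combination
# reported to work in this issue (torch 1.13.1 + torchvision 0.14.1).
import importlib.util


def installed_version(package: str):
    """Return the package's __version__ if importable, else None."""
    if importlib.util.find_spec(package) is None:
        return None
    module = __import__(package)
    return getattr(module, "__version__", None)


# Versions taken from this thread, not an official requirement.
known_good = {"torch": "1.13.1", "torchvision": "0.14.1"}

for name, wanted in known_good.items():
    found = installed_version(name)
    if found is None:
        print(f"{name}: not installed")
    elif found.startswith(wanted):
        print(f"{name}: {found} (matches)")
    else:
        print(f"{name}: {found} (thread used {wanted})")
```

If either package reports a mismatch, reinstalling the environment as described above is probably the fastest fix.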

zhangjiewu commented 1 year ago

Glad to hear that. Let me know if you have any other questions. :)

liangbingzhao commented 1 year ago

Can you share your results after running `python -m xformers.info`? I constructed a new virtual environment with torch 1.13 (cu117) and torchvision 0.14, but after installing xformers with `pip install -U xformers`, the triton module was not installed. I ran `pip install triton` to install it, but the results from this repo still look like yours. Any idea how to fix this?

arceus-jia commented 1 year ago

> Can you share your results after running `python -m xformers.info`? I constructed a new virtual environment with torch 1.13 (cu117) and torchvision 0.14, but after installing xformers with `pip install -U xformers`, the triton module was not installed. I ran `pip install triton` to install it, but the results from this repo still look like yours. Any idea how to fix this?

Here is my environment; you can refer to it and compare it with yours:

```
absl-py==1.4.0
accelerate==0.16.0
antlr4-python3-runtime==4.9.3
bitsandbytes==0.35.4
cachetools==5.3.0
certifi @ file:///croot/certifi_1671487769961/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer==3.0.1
decord==0.6.0
diffusers==0.11.1
einops==0.6.0
filelock==3.9.0
flit_core @ file:///opt/conda/conda-bld/flit-core_1644941570762/work/source/flit_core
ftfy==6.1.1
future @ file:///home/builder/ci_310/future_1640790123501/work
google-auth==2.16.0
google-auth-oauthlib==0.4.6
grpcio==1.51.1
huggingface-hub==0.12.0
idna==3.4
imageio==2.25.0
importlib-metadata==6.0.0
Jinja2==3.1.2
Markdown==3.4.1
MarkupSafe==2.1.2
mkl-fft==1.3.1
mkl-random @ file:///home/builder/ci_310/mkl_random_1641843545607/work
mkl-service==2.4.0
modelcards==0.1.6
mypy-extensions==1.0.0
numpy @ file:///croot/numpy_and_numpy_base_1672336185480/work
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
omegaconf==2.3.0
packaging==23.0
Pillow==9.4.0
protobuf==3.20.3
psutil==5.9.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pyre-extensions==0.0.23
PyYAML @ file:///croot/pyyaml_1670514731622/work
regex==2022.10.31
requests==2.28.2
requests-oauthlib==1.3.1
rsa==4.9
six @ file:///tmp/build/80754af9/six_1644875935023/work
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tokenizers==0.13.2
torch==1.13.1
torchvision==0.14.1
tqdm==4.64.1
transformers==4.26.0
typing-inspect==0.8.0
typing_extensions @ file:///croot/typing_extensions_1669924550328/work
urllib3==1.26.14
wcwidth==0.2.6
Werkzeug==2.2.2
xformers==0.0.17.dev444
zipp==3.12.1
```
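If you just want to check whether xformers' memory-efficient attention operator is importable, here is a small programmatic variant of the `python -m xformers.info` check. This is a hedged sketch: `xformers_attention_available` is an illustrative helper, and a broken xformers install is only one plausible cause of the garbled samples discussed above.

```python
# Illustrative check: Tune-A-Video's attention path can use xformers'
# memory_efficient_attention operator; if this import fails, the xformers
# install (or its triton dependency) is likely broken.
def xformers_attention_available() -> bool:
    try:
        from xformers.ops import memory_efficient_attention  # noqa: F401
        return True
    except ImportError:
        return False


print("memory-efficient attention available:", xformers_attention_available())
```

If this prints `False` despite xformers being installed, reinstalling xformers (and triton) against your exact torch build, as discussed above, is worth trying.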

liangbingzhao commented 1 year ago

Thank you for your response. I upgraded xformers from 0.0.16 to 0.0.17, and the model now generates the following:

sample-300

This seems better, but there are still many inconsistencies.

arceus-jia commented 1 year ago

> This seems better, but there are still many inconsistencies.

Yep, that means the training was successful. In fact, the samples given by the authors look similar to this one. The authors mainly provide an idea for AI-generated animation with diffusion models; if you want to productize it, it still needs a lot of improvement.