hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

Images and videos produced by inference all come out broken #369

Closed HioZx closed 4 months ago

HioZx commented 4 months ago

Every sample_0 comes out looking like this; I don't know what is going on.

JamesTensor commented 4 months ago

me too

zhengzangw commented 4 months ago

This is not expected. Please pull the latest repo and run it again. If the problem persists, please provide your running command and environment so we can investigate.

HioZx commented 4 months ago

command:

```bash
python scripts/inference.py configs/opensora-v1-1/inference/sample.py --ckpt-path /home/hio/code/STDiT2/model.safetensors --prompt "A beautiful sunset over the city" --num-frames 1 --image-size 512 512
```

environment: absl-py 2.1.0 accelerate 0.29.1 addict 2.4.0 aiosignal 1.3.1 annotated-types 0.6.0 anykeystore 0.2 appdirs 1.4.4 attrs 23.2.0 bcrypt 4.1.2 beartype 0.18.5 beautifulsoup4 4.12.3 certifi 2022.12.7 cffi 1.16.0 cfgv 3.4.0 charset-normalizer 2.1.1 click 8.1.7 cmake 3.25.0 colossalai 0.3.6 contexttimer 0.3.3 contourpy 1.2.1 cryptacular 1.6.2 cryptography 42.0.5 cycler 0.12.1 decorator 5.1.1 defusedxml 0.7.1 Deprecated 1.2.14 diffusers 0.27.2 dill 0.3.8 distlib 0.3.8 docker-pycreds 0.4.0 einops 0.7.0 fabric 3.2.2 filelock 3.13.3 flash-attn 2.5.8 fonttools 4.51.0 frozenlist 1.4.1 fsspec 2024.3.1 ftfy 6.2.0 gdown 5.1.0 gitdb 4.0.11 GitPython 3.1.43 google 3.0.0 greenlet 3.0.3 grpcio 1.62.1 huggingface-hub 0.22.2 hupper 1.12.1 identify 2.5.35 idna 3.4 importlib_metadata 7.1.0 invoke 2.2.0 Jinja2 3.1.2 jsonschema 4.21.1 jsonschema-specifications 2023.12.1 kiwisolver 1.4.5 lit 15.0.7 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.3 matplotlib 3.8.4 mdurl 0.1.2 mmengine 0.10.3 mpmath 1.3.0 msgpack 1.0.8 networkx 3.2.1 ninja 1.11.1.1 nodeenv 1.8.0 numpy 1.26.3 nvidia-cublas-cu11 11.11.3.6 nvidia-cuda-cupti-cu11 11.8.87 nvidia-cuda-nvrtc-cu11 11.8.89 nvidia-cuda-runtime-cu11 11.8.89 nvidia-cudnn-cu11 8.7.0.84 nvidia-cufft-cu11 10.9.0.58 nvidia-curand-cu11 10.3.0.86 nvidia-cusolver-cu11 11.4.1.48 nvidia-cusparse-cu11 11.7.5.86 nvidia-nccl-cu11 2.19.3 nvidia-nvtx-cu11 11.8.86 oauthlib 3.2.2 opencv-python 4.9.0.80 opensora 1.1.0 packaging 24.0 pandarallel 1.6.5 pandas 2.2.2 paramiko 3.4.0 PasteDeploy 3.1.0 pbkdf2 1.3 pillow 10.2.0 pip 22.3.1 plaster 1.1.2 plaster-pastedeploy 1.0.1 platformdirs 4.2.0 pre-commit 3.7.0 protobuf 4.25.3 psutil 5.9.8 pyarrow 16.0.0 pyav 12.0.5 pycparser 2.22 pydantic 2.6.4 pydantic_core 2.16.3 Pygments 2.17.2 PyNaCl 1.5.0 pyparsing 3.1.2 pyramid 2.0.2 pyramid-mailer 0.15.1 PySocks 1.7.1 python-dateutil 2.9.0.post0 python3-openid 3.2.0 pytz 2024.1 PyYAML 6.0.1 ray 2.10.0 referencing 0.34.0 regex 2023.12.25 repoze.sendmail 4.4.1 requests 2.28.1 requests-oauthlib 2.0.0 rich 13.7.1 rotary-embedding-torch 0.5.3 rpds-py 0.18.0 safetensors 0.4.2 sentencepiece 0.2.0 sentry-sdk 1.44.1 setproctitle 1.3.3 setuptools 65.5.1 six 1.16.0 smmap 5.0.1 soupsieve 2.5 SQLAlchemy 2.0.29 sympy 1.12 tensorboard 2.16.2 tensorboard-data-server 0.7.2 termcolor 2.4.0 timm 0.9.16 tokenizers 0.15.2 tomli 2.0.1 torch 2.2.2+cu118 torchaudio 2.2.2+cu118 torchvision 0.17.2+cu118 tqdm 4.66.2 transaction 4.0 transformers 4.39.3 translationstring 1.4 triton 2.2.0 typing_extensions 4.8.0 tzdata 2024.1 urllib3 1.26.13 velruse 1.1.1 venusian 3.1.0 virtualenv 20.25.1 wandb 0.16.6 wcwidth 0.2.13 WebOb 1.8.7 Werkzeug 3.0.2 wheel 0.38.4 wrapt 1.16.0 WTForms 3.1.2 wtforms-recaptcha 0.3.2 xformers 0.0.25.post1+cu118 yapf 0.40.2 zipp 3.18.1 zope.deprecation 5.0 zope.interface 6.2 zope.sqlalchemy 3.1

I changed the parameters enable_flashattn and enable_layernorm_kernel in configs/opensora-v1-1/inference/sample.py:

```python
num_frames = 16
frame_interval = 3
fps = 24
image_size = (240, 426)
multi_resolution = "STDiT2"

model = dict(
    type="STDiT2-XL/2",
    from_pretrained=None,
    input_sq_size=512,
    qk_norm=True,
    enable_flashattn=False,
    enable_layernorm_kernel=False,
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="/home/hio/code/sd-vae-ft-ema",
    cache_dir=None,  # "/mnt/hdd/cached_models",
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="/home/hio/code/t5-v1_1-xxl",
    cache_dir=None,  # "/mnt/hdd/cached_models",
    model_max_length=200,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3,  # or None
)
dtype = "fp16"

prompt_path = "./assets/texts/t2v_samples.txt"
prompt = None  # prompt has higher priority than prompt_path

batch_size = 1
seed = 42
save_dir = "./samples/samples/"
```

zhengzangw commented 4 months ago

I don't think you pulled the latest version, since the latest version uses enable_flash_attn instead of enable_flashattn.

Besides, we do not use --ckpt-path /home/hio/code/STDiT2/model.safetensors. You can try not passing --ckpt-path; our code now enables automatic downloading.
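
For reference, a minimal sketch of what the rename looks like in the model dict of the config pasted above (the other values are simply kept as in that config, only the flag name changes):

```python
# Sketch: the model section from the user's config, using the flag names
# expected by the latest repo (all other values left unchanged).
model = dict(
    type="STDiT2-XL/2",
    from_pretrained=None,
    input_sq_size=512,
    qk_norm=True,
    enable_flash_attn=False,        # renamed from enable_flashattn
    enable_layernorm_kernel=False,
)
```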

buxianggaimingzi commented 4 months ago

By the way, enable_flashattn in the training config has not been renamed to enable_flash_attn in the latest version; that may cause OOM during training.

zhengzangw commented 4 months ago

https://github.com/hpcaitech/Open-Sora/blob/c6cc021d612456455addc1b3164c1d43eeb33b97/configs/opensora-v1-1/inference/sample.py#L13

But here we do use enable_flash_attn.

buxianggaimingzi commented 4 months ago

There may be some misunderstanding. I meant that the training configuration has not been updated yet: https://github.com/hpcaitech/Open-Sora/blob/c6cc021d612456455addc1b3164c1d43eeb33b97/configs/opensora-v1-1/train/stage3.py#L46 There is no problem with the inference configuration. It's just that I saw the field enable_flash_attn mentioned here, so I brought it up in passing.
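
For clarity, the fix being suggested for the training config is the same rename; a rough sketch is below (the surrounding values are assumptions for illustration, not copied from stage3.py):

```python
# configs/opensora-v1-1/train/stage3.py (sketch of the rename only;
# the other field values here are assumed, not taken from the repo)
model = dict(
    type="STDiT2-XL/2",
    qk_norm=True,
    # was enable_flashattn; with the stale key the flash-attention kernel is
    # not enabled, which is the likely cause of the OOM mentioned above
    enable_flash_attn=True,
    enable_layernorm_kernel=True,
)
```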

HioZx commented 4 months ago

Pulling the latest repo gives the same result. I didn't install apex, so enable_flash_attn and enable_layernorm_kernel were disabled, but that shouldn't have caused the failure.

xunshengliuyin commented 4 months ago

me too

xunshengliuyin commented 3 months ago

@HioZx Hello, have you solved this problem?

HioZx commented 3 months ago

> @HioZx Hello, have you solved this problem?

No, I don't know why.