sign-language-processing / pose-to-video

Render pose sequences as photorealistic videos.

Custom dataset loading problem when trying to train Control Net #9

Closed sparkkid1234 closed 3 months ago

sparkkid1234 commented 3 months ago

Hi @AmitMY, I'm getting errors when trying to run the train_controlnet.py script, following the steps from the train.sh file.

I'm at the point where I have the frames512.zip and poses512.zip files for my custom dataset (which, for now, is just a single video and its corresponding .pose file). I then ran the dataset.py script in the controlnet folder to prepare these into a Hugging Face dataset, and the script wrote all the necessary files to HF_DATASET_DIR. However, when I try running accelerate launch diffusers/examples/controlnet/train_controlnet.py ... with --training_data_dir="$HF_DATASET_DIR", I get the following errors:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/pose_to_video/pose-to-video/pose_to_video/conditional/controlnet/diffusers/examples/controlnet/train_controlnet.py", line 1187, in <module>
[rank0]:     main(args)
[rank0]:   File "/root/pose_to_video/pose-to-video/pose_to_video/conditional/controlnet/diffusers/examples/controlnet/train_controlnet.py", line 923, in main
[rank0]:     train_dataset = make_train_dataset(args, tokenizer, accelerator)
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/pose_to_video/pose-to-video/pose_to_video/conditional/controlnet/diffusers/examples/controlnet/train_controlnet.py", line 639, in make_train_dataset
[rank0]:     raise ValueError(
[rank0]: ValueError: `--image_column` value 'image' not found in dataset columns. Dataset columns are: _data_files, _fingerprint, _format_columns, _format_kwargs, _format_type, _output_all_columns, _split

Would you be able to advise? Thanks!

AmitMY commented 3 months ago

I would try loading the dataset with a Hugging Face dataset loader and inspecting it to see what is going on, since the dataset should have these columns, not the ones it reports for you: https://github.com/sign-language-processing/pose-to-video/blob/main/pose_to_video/conditional/controlnet/dataset.py#L50-L54

sparkkid1234 commented 3 months ago

@AmitMY would you be able to share your datasets module version, and which output files should appear in HF_DATASET_DIR after I run the dataset.py script? Thanks!

AmitMY commented 3 months ago

The directory I processed was in scratch storage, and so it was removed a while back.

The environment I used:

name: diffusers
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2023.12.12=h06a4308_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.12=h7f8727e_0
  - pip=23.3.1=py311h06a4308_0
  - python=3.11.5=h955ad1f_0
  - readline=8.2=h5eee18b_0
  - setuptools=68.2.2=py311h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.41.2=py311h06a4308_0
  - xz=5.4.5=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - absl-py==2.0.0
      - accelerate==0.25.0
      - aiohttp==3.9.1
      - aiosignal==1.3.1
      - appdirs==1.4.4
      - argparse==1.4.0
      - astunparse==1.6.3
      - attrs==23.1.0
      - cachetools==5.3.3
      - certifi==2023.11.17
      - cffi==1.16.0
      - charset-normalizer==3.3.2
      - click==8.1.7
      - contourpy==1.2.0
      - cycler==0.12.1
      - datasets==2.15.0
      - diffusers==0.26.3
      - dill==0.3.7
      - docker-pycreds==0.4.0
      - filelock==3.13.1
      - flatbuffers==23.5.26
      - fonttools==4.47.0
      - frozenlist==1.4.1
      - fsspec==2023.10.0
      - gast==0.5.4
      - gitdb==4.0.11
      - gitpython==3.1.40
      - google-auth==2.28.1
      - google-auth-oauthlib==1.2.0
      - google-pasta==0.2.0
      - grpcio==1.62.0
      - h5py==3.10.0
      - huggingface-hub==0.20.3
      - idna==3.6
      - imageio==2.34.0
      - imageio-ffmpeg==0.4.9
      - importlib-metadata==7.0.0
      - jinja2==3.1.2
      - keras==2.15.0
      - kiwisolver==1.4.5
      - libclang==16.0.6
      - markdown==3.5.2
      - markupsafe==2.1.3
      - matplotlib==3.8.2
      - mediapipe==0.10.9
      - ml-dtypes==0.2.0
      - mpmath==1.3.0
      - multidict==6.0.4
      - multiprocess==0.70.15
      - networkx==3.2.1
      - numpy==1.26.2
      - nvidia-cublas-cu12==12.1.3.1
      - nvidia-cuda-cupti-cu12==12.1.105
      - nvidia-cuda-nvrtc-cu12==12.1.105
      - nvidia-cuda-runtime-cu12==12.1.105
      - nvidia-cudnn-cu12==8.9.2.26
      - nvidia-cufft-cu12==11.0.2.54
      - nvidia-curand-cu12==10.3.2.106
      - nvidia-cusolver-cu12==11.4.5.107
      - nvidia-cusparse-cu12==12.1.0.106
      - nvidia-nccl-cu12==2.18.1
      - nvidia-nvjitlink-cu12==12.3.101
      - nvidia-nvtx-cu12==12.1.105
      - oauthlib==3.2.2
      - opencv-contrib-python==4.8.1.78
      - opencv-python==4.8.1.78
      - opt-einsum==3.3.0
      - packaging==23.2
      - pandas==2.1.4
      - pillow==10.1.0
      - pose-format==0.2.3
      - pose-to-video==0.0.1
      - protobuf==3.20.3
      - psutil==5.9.7
      - pyarrow==14.0.2
      - pyarrow-hotfix==0.6
      - pyasn1==0.5.1
      - pyasn1-modules==0.3.0
      - pycparser==2.21
      - pyparsing==3.1.1
      - python-dateutil==2.8.2
      - pytz==2023.3.post1
      - pyyaml==6.0.1
      - regex==2023.10.3
      - requests==2.31.0
      - requests-oauthlib==1.3.1
      - rsa==4.9
      - safetensors==0.4.1
      - scipy==1.11.4
      - sentry-sdk==1.39.1
      - setproctitle==1.3.3
      - six==1.16.0
      - smmap==5.0.1
      - sounddevice==0.4.6
      - sympy==1.12
      - tensorboard==2.15.2
      - tensorboard-data-server==0.7.2
      - tensorflow==2.15.0.post1
      - tensorflow-estimator==2.15.0
      - tensorflow-io-gcs-filesystem==0.36.0
      - termcolor==2.4.0
      - tokenizers==0.15.0
      - torch==2.1.2
      - torchvision==0.16.2
      - tqdm==4.66.1
      - transformers==4.36.2
      - triton==2.1.0
      - typing-extensions==4.9.0
      - tzdata==2023.3
      - urllib3==2.1.0
      - wandb==0.16.1
      - werkzeug==3.0.1
      - wrapt==1.14.1
      - xformers==0.0.23.post1
      - xxhash==3.4.1
      - yarl==1.9.4
      - zipp==3.17.0
sparkkid1234 commented 3 months ago

Thanks @AmitMY, let me check. Another quick question: is there a recommended way to run pose-to-video/data/BIU-MG/video_to_images.py for multiple videos, so my custom dataset is larger than a single video? I'm assuming I can change the zipfile write mode in the code from "w" to "a" and run the script separately for each video, pointing to the same output zips?
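For context, the append-mode idea can be sketched with Python's zipfile module; the archive name matches the thread, but the member names and bytes below are placeholders:

```python
import os
import tempfile
import zipfile

# Demo: mode="w" creates/truncates the archive, mode="a" appends to it.
path = os.path.join(tempfile.mkdtemp(), "frames512.zip")

with zipfile.ZipFile(path, mode="w") as archive:   # first video's run
    archive.writestr("video1_frame_000001.png", b"...")

with zipfile.ZipFile(path, mode="a") as archive:   # second video's run
    archive.writestr("video2_frame_000001.png", b"...")

with zipfile.ZipFile(path) as archive:
    print(archive.namelist())  # both entries survive the second run
```

One caveat with this approach: member names must stay unique across runs, or the archive ends up with duplicate entries.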

AmitMY commented 3 months ago

I guess that could work. My recommended solution, though, would be to modify the code itself to take a directory of videos and a directory of poses with matching names, and then iterate over them.
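That suggestion could be sketched roughly like this; the directory layout, file extensions, and matching-by-stem logic are assumptions for illustration, not the project's actual code:

```python
import tempfile
from pathlib import Path

def matched_pairs(video_dir: Path, pose_dir: Path, ext: str = ".mp4"):
    """Yield (video, pose) path pairs whose file stems match."""
    for video_path in sorted(video_dir.glob(f"*{ext}")):
        pose_path = pose_dir / (video_path.stem + ".pose")
        if pose_path.exists():
            yield video_path, pose_path

# Tiny self-contained demo with dummy files.
with tempfile.TemporaryDirectory() as tmp:
    video_dir, pose_dir = Path(tmp) / "videos", Path(tmp) / "poses"
    video_dir.mkdir()
    pose_dir.mkdir()
    (video_dir / "a.mp4").touch()
    (pose_dir / "a.pose").touch()
    (video_dir / "b.mp4").touch()  # no matching pose: skipped
    for video_path, pose_path in matched_pairs(video_dir, pose_dir):
        print(video_path.name, pose_path.name)  # a.mp4 a.pose
```

Each yielded pair would then go through the existing single-video frame/pose extraction step.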

sparkkid1234 commented 3 months ago

Hey @AmitMY, I've got the code working by downgrading datasets to the same version you used. It seems there was a breaking change.

One last question on this issue. The download link for the BIU-MG dataset is no longer valid, so may I ask how long the video used to train the ControlNet model was? Or, even better, how many frames it had in total? Thank you, I'll close this issue afterwards.

AmitMY commented 3 months ago

That's great! Want to add a change to the controlnet README or setup?

The original video was 30 minutes at 30fps, if I am not mistaken. It was recorded against a green screen, which was then keyed out and replaced with a single green color (there were lighting differences on the green screen, so it was easier to key it out than to have the model learn it).
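If that recollection is right, the total frame count works out as:

```python
# 30 minutes of video at 30 frames per second.
minutes, fps = 30, 30
total_frames = minutes * 60 * fps
print(total_frames)  # 54000
```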

sparkkid1234 commented 3 months ago

Will do!