Closed: sparkkid1234 closed this issue 3 months ago.
I would try to load the dataset with a huggingface dataset loader and observe it, to see what is going on, since the dataset should have these columns, and not the ones it reports for you: https://github.com/sign-language-processing/pose-to-video/blob/main/pose_to_video/conditional/controlnet/dataset.py#L50-L54
@AmitMY would you be able to share your `datasets` module version and what the expected output files in `HF_DATASET_DIR` are after I run the `dataset.py` script? Thanks!
The directory I processed was in scratch storage, and so it was removed a while back.
The environment I used:
name: diffusers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.12.12=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.12=h7f8727e_0
- pip=23.3.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.2.2=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.5=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- absl-py==2.0.0
- accelerate==0.25.0
- aiohttp==3.9.1
- aiosignal==1.3.1
- appdirs==1.4.4
- argparse==1.4.0
- astunparse==1.6.3
- attrs==23.1.0
- cachetools==5.3.3
- certifi==2023.11.17
- cffi==1.16.0
- charset-normalizer==3.3.2
- click==8.1.7
- contourpy==1.2.0
- cycler==0.12.1
- datasets==2.15.0
- diffusers==0.26.3
- dill==0.3.7
- docker-pycreds==0.4.0
- filelock==3.13.1
- flatbuffers==23.5.26
- fonttools==4.47.0
- frozenlist==1.4.1
- fsspec==2023.10.0
- gast==0.5.4
- gitdb==4.0.11
- gitpython==3.1.40
- google-auth==2.28.1
- google-auth-oauthlib==1.2.0
- google-pasta==0.2.0
- grpcio==1.62.0
- h5py==3.10.0
- huggingface-hub==0.20.3
- idna==3.6
- imageio==2.34.0
- imageio-ffmpeg==0.4.9
- importlib-metadata==7.0.0
- jinja2==3.1.2
- keras==2.15.0
- kiwisolver==1.4.5
- libclang==16.0.6
- markdown==3.5.2
- markupsafe==2.1.3
- matplotlib==3.8.2
- mediapipe==0.10.9
- ml-dtypes==0.2.0
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx==3.2.1
- numpy==1.26.2
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.3.101
- nvidia-nvtx-cu12==12.1.105
- oauthlib==3.2.2
- opencv-contrib-python==4.8.1.78
- opencv-python==4.8.1.78
- opt-einsum==3.3.0
- packaging==23.2
- pandas==2.1.4
- pillow==10.1.0
- pose-format==0.2.3
- pose-to-video==0.0.1
- protobuf==3.20.3
- psutil==5.9.7
- pyarrow==14.0.2
- pyarrow-hotfix==0.6
- pyasn1==0.5.1
- pyasn1-modules==0.3.0
- pycparser==2.21
- pyparsing==3.1.1
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- requests-oauthlib==1.3.1
- rsa==4.9
- safetensors==0.4.1
- scipy==1.11.4
- sentry-sdk==1.39.1
- setproctitle==1.3.3
- six==1.16.0
- smmap==5.0.1
- sounddevice==0.4.6
- sympy==1.12
- tensorboard==2.15.2
- tensorboard-data-server==0.7.2
- tensorflow==2.15.0.post1
- tensorflow-estimator==2.15.0
- tensorflow-io-gcs-filesystem==0.36.0
- termcolor==2.4.0
- tokenizers==0.15.0
- torch==2.1.2
- torchvision==0.16.2
- tqdm==4.66.1
- transformers==4.36.2
- triton==2.1.0
- typing-extensions==4.9.0
- tzdata==2023.3
- urllib3==2.1.0
- wandb==0.16.1
- werkzeug==3.0.1
- wrapt==1.14.1
- xformers==0.0.23.post1
- xxhash==3.4.1
- yarl==1.9.4
- zipp==3.17.0
Thanks @AmitMY, let me check. Another quick question: is there a recommended way to run `pose-to-video/data/BIU-MG/video_to_images.py` for multiple videos, so my custom dataset is larger than a single video? I'm assuming I can change the zipfile write mode from `w` (as in the code) to `a` and run the script separately for each video, pointing to the same output zips? Thanks!
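A minimal sketch of that append approach, under the assumption that the original script opens the archive with mode `"w"` (which truncates). Mode `"a"` appends to an existing archive, or creates it if missing, so per-video runs can share one output zip (the helper name and file paths here are hypothetical):

```python
import os
import tempfile
import zipfile

def add_frames(zip_path, frames):
    # mode "a" creates the archive if absent and appends otherwise
    with zipfile.ZipFile(zip_path, "a") as zf:
        for name, data in frames:
            zf.writestr(name, data)  # prefix names per video to avoid clashes

# Two separate runs (e.g. one per video) targeting the same archive:
out = os.path.join(tempfile.mkdtemp(), "frames512.zip")
add_frames(out, [("video1_0001.png", b"frame")])
add_frames(out, [("video2_0001.png", b"frame")])
with zipfile.ZipFile(out) as zf:
    names = zf.namelist()
print(names)
```

One caveat: entry names must be unique across videos, hence the per-video prefix in the example.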
I guess that could work. My recommended solution, though, would be to modify the code itself to take a directory of videos and a directory of poses with matching names, then iterate over them.
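The directory-based approach above could be sketched like this, pairing each video with the pose file that shares its stem (the helper name and file extensions are assumptions, not part of the repo):

```python
from pathlib import Path

def paired_inputs(video_dir, pose_dir):
    """Return (video, pose) path pairs matched by filename stem."""
    pairs = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        pose = Path(pose_dir) / (video.stem + ".pose")
        if pose.exists():  # skip videos without a matching pose file
            pairs.append((video, pose))
    return pairs

# for video, pose in paired_inputs("videos/", "poses/"):
#     process(video, pose)  # e.g. run the video_to_images logic per pair
```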
Hey @AmitMY, I've got the code working by downgrading `datasets` to the same version as yours. It seems there was a breaking change.
One last question on this issue: the download link for the BIU-MG dataset is no longer valid, so can I ask how long the video used to train the controlnet model was? Or, even better, how many frames it contained in total? Thank you, I'll close this issue after that.
That's great! Wanna add a change to the controlnet README or setup?
The original video was 30 minutes at 30 fps, if I am not mistaken. It was recorded against a green screen, which was then keyed out and replaced with a single flat green color (there were lighting differences across the green screen, so it was easier to key out than to learn).
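From those figures, the total frame count works out as follows (assuming a constant frame rate):

```python
# 30 minutes of video at 30 frames per second
minutes, fps = 30, 30
total_frames = minutes * 60 * fps
print(total_frames)  # 54000 frames
```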
will do!
Hi @AmitMY, I'm getting errors when trying to run the `train_controlnet.py` script following the steps from the `train.sh` file. I'm at a point where I have the `frames512.zip` and `poses512.zip` files for my custom dataset (which is just a single video and its corresponding `.pose` file for now). I then run the `dataset.py` script in the `controlnet` folder to prepare these into a huggingface dataset, and the script wrote all necessary files to `HF_DATASET_DIR`. However, when I try running `accelerate launch diffusers/examples/controlnet/train_controlnet.py ...` with `--training_data_dir="$HF_DATASET_DIR"`, I get the following errors. Would you be able to advise? Thanks!