justinpinkney / stable-diffusion

MIT License
1.45k stars 266 forks

Something goes wrong when running finetuning on the pokemon dataset. #90

Closed Learner209 closed 10 months ago

Learner209 commented 10 months ago

Hi! This is a great project! Problem: When running the finetuning process following your description and the examples in the pokemon_finetune.ipynb notebook, I ran into an error from "src/ldm/modules/attention.py". The full backtrace is:

Expected behaviour: The finetuning process starts normally.

System configuration: Ubuntu 22.04

My conda environment:

name: ldm
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=5.1
  - blas=1.0
  - ca-certificates=2023.12.12
  - cudatoolkit=11.0.221
  - freetype=2.12.1
  - giflib=5.2.1
  - intel-openmp=2021.4.0
  - jpeg=9e
  - lcms2=2.12
  - ld_impl_linux-64=2.38
  - lerc=3.0
  - libdeflate=1.17
  - libffi=3.3
  - libgcc-ng=11.2.0
  - libgfortran-ng=11.2.0
  - libgfortran5=11.2.0
  - libgomp=11.2.0
  - libpng=1.6.39
  - libstdcxx-ng=11.2.0
  - libtiff=4.5.1
  - libuv=1.44.2
  - libwebp=1.3.2
  - libwebp-base=1.3.2
  - lz4-c=1.9.4
  - mkl=2021.4.0
  - mkl-service=2.4.0
  - mkl_fft=1.3.1
  - mkl_random=1.2.2
  - ncurses=6.4
  - ninja=1.10.2
  - ninja-base=1.10.2
  - numpy=1.19.2
  - numpy-base=1.19.2
  - openjpeg=2.4.0
  - openssl=1.1.1w
  - pillow=10.0.1
  - pip=20.3.3
  - python=3.8.5
  - readline=8.2
  - setuptools=68.2.2
  - six=1.16.0
  - sqlite=3.41.2
  - tk=8.6.12
  - wheel=0.41.2
  - xz=5.4.5
  - zlib=1.2.13
  - zstd=1.5.5
  - pip:
    - absl-py==2.0.0
    - aiohttp==3.9.1
    - aiosignal==1.3.1
    - albumentations==0.4.3
    - altair==5.2.0
    - analytics-python==1.4.post1
    - annotated-types==0.6.0
    - antlr4-python3-runtime==4.8
    - async-timeout==4.0.3
    - backoff==1.10.0
    - backports-zoneinfo==0.2.1
    - bcrypt==4.1.2
    - blinker==1.7.0
    - braceexpand==0.1.7
    - cachetools==5.3.2
    - certifi==2023.11.17
    - click==8.1.7
    - cryptography==41.0.7
    - datasets==2.4.0
    - diffusers==0.3.0
    - dill==0.3.5.1
    - einops==0.3.0
    - fastapi==0.105.0
    - ffmpy==0.3.1
    - filelock==3.13.1
    - fire==0.4.0
    - frozenlist==1.4.1
    - fsspec==2023.12.2
    - ftfy==6.1.3
    - future==0.18.3
    - gitdb==4.0.11
    - gitpython==3.1.40
    - google-auth==2.25.2
    - google-auth-oauthlib==1.0.0
    - gradio==3.1.4
    - grpcio==1.60.0
    - h11==0.12.0
    - httpcore==0.15.0
    - httpx==0.25.1
    - huggingface-hub==0.19.4
    - idna==3.6
    - imageio==2.9.0
    - imageio-ffmpeg==0.4.2
    - imgaug==0.2.6
    - importlib-metadata==6.11.0
    - importlib-resources==6.1.1
    - jsonschema==4.20.0
    - jsonschema-specifications==2023.11.2
    - kornia==0.6.0
    - lightning-utilities==0.10.0
    - linkify-it-py==2.0.2
    - markdown==3.5.1
    - markdown-it-py==3.0.0
    - matplotlib==3.7.4
    - mdit-py-plugins==0.4.0
    - mdurl==0.1.2
    - monotonic==1.6
    - multidict==6.0.4
    - multiprocess==0.70.13
    - nvidia-cublas-cu12==12.1.3.1
    - nvidia-cuda-cupti-cu12==12.1.105
    - nvidia-cuda-nvrtc-cu12==12.1.105
    - nvidia-cuda-runtime-cu12==12.1.105
    - nvidia-cudnn-cu12==8.9.2.26
    - nvidia-cufft-cu12==11.0.2.54
    - nvidia-curand-cu12==10.3.2.106
    - nvidia-cusolver-cu12==11.4.5.107
    - nvidia-cusparse-cu12==12.1.0.106
    - nvidia-nccl-cu12==2.18.1
    - nvidia-nvjitlink-cu12==12.3.101
    - nvidia-nvtx-cu12==12.1.105
    - oauthlib==3.2.2
    - omegaconf==2.1.1
    - open-clip-torch==1.2.1
    - opencv-python==4.5.5.64
    - orjson==3.9.10
    - packaging==23.2
    - paramiko==3.4.0
    - protobuf==4.25.1
    - pudb==2019.2
    - pyarrow==14.0.2
    - pyasn1==0.5.1
    - pycryptodome==3.19.0
    - pydantic==2.5.2
    - pydantic-core==2.14.5
    - pydeck==0.8.1b0
    - pydeprecate==0.3.1
    - pydub==0.25.1
    - pygments==2.17.2
    - pynacl==1.5.0
    - python-dateutil==2.8.2
    - python-multipart==0.0.6
    - pytorch-lightning==1.4.2
    - pytz==2023.3.post1
    - referencing==0.32.0
    - regex==2023.10.3
    - responses==0.18.0
    - rich==13.7.0
    - rpds-py==0.15.2
    - sacremoses==0.1.1
    - scikit-image==0.20.0
    - scipy==1.9.1
    - smmap==5.0.1
    - starlette==0.27.0
    - streamlit==1.29.0
    - tenacity==8.2.3
    - tensorboard==2.14.0
    - tensorboard-data-server==0.7.2
    - test-tube==0.7.5
    - tokenizers==0.12.1
    - toml==0.10.2
    - toolz==0.12.0
    - torch==1.12.1+cu113
    - torch-fidelity==0.3.0
    - torchmetrics==0.6.0
    - torchvision==0.13.1+cu113
    - tornado==6.4
    - tqdm==4.66.1
    - transformers==4.22.2
    - triton==2.1.0
    - typing-extensions==4.9.0
    - tzlocal==5.2
    - uc-micro-py==1.0.2
    - urllib3==2.1.0
    - urwid==2.3.4
    - uvicorn==0.24.0.post1
    - validators==0.22.0
    - watchdog==3.0.0
    - wcwidth==0.2.12
    - webdataset==0.2.5
    - werkzeug==3.0.1
    - xxhash==3.4.1
    - yarl==1.9.4
    - zipp==3.17.0

Could anyone help? I'd really appreciate it! 😊🥰

Learner209 commented 10 months ago

It turns out the problem was that the number of textual prompts isn't a multiple of the batch size. This is a duplicate of another issue. Closing this issue.
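
For anyone hitting the same error, here is a minimal sketch of one possible workaround, not the actual pokemon_finetune.ipynb code: if the number of prompt/image pairs isn't a multiple of the batch size, you can either let a PyTorch DataLoader drop the trailing incomplete batch or trim the dataset to a multiple of the batch size. The train_dataset, caption text, and batch_size below are placeholders, not values from the repo.

    # Minimal sketch; `train_dataset` is a hypothetical list standing in for the
    # real prompt/image dataset used during finetuning.
    from torch.utils.data import DataLoader, Subset

    train_dataset = [("a drawing of a pikachu", i) for i in range(833)]  # hypothetical 833 samples
    batch_size = 4  # placeholder; use the value from your finetuning config

    # Option 1: let the DataLoader discard the last incomplete batch.
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

    # Option 2: trim the dataset itself so its length is a multiple of the batch size.
    usable = (len(train_dataset) // batch_size) * batch_size
    loader = DataLoader(Subset(train_dataset, range(usable)), batch_size=batch_size, shuffle=True)

Either way the model never sees a final batch smaller than batch_size, which is what triggered the shape mismatch in attention.py in my case.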