AI-Guru closed this issue 2 years ago
Hi Tristan!
Unfortunately I'm unable to replicate this problem. Do you remember if you installed diffusers using pip, or if you downloaded a specific version from GitHub? If you don't, and wouldn't mind sharing the output from pip freeze, that would be helpful too. Also, can you try to launch training without accelerate, just to see if it's a factor? Use the same command you pasted above, only with python instead of accelerate launch:
python train_unconditional.py \
--dataset_name="huggan/flowers-102-categories" \
--resolution=64 \
--output_dir="ddpm-ema-flowers-64" \
--train_batch_size=16 \
--num_epochs=100 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-4 \
--lr_warmup_steps=500 \
--mixed_precision=no
Thanks!
Thanks for your time!
Looking at the error message, I get the impression that it could be a cuda vs cpu issue.
I ran the script with python. The error is the same.
I uninstalled diffusers and reinstalled it like this:
pip install diffusers[training]
This did not help.
Here is the pip freeze:
absl-py==1.2.0
accelerate==0.12.0
aiohttp==3.8.1
aiosignal==1.2.0
appdirs==1.4.4
apt-clone==0.2.1
apturl==0.5.2
asttokens==2.0.8
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.1.0
audioread==3.0.0
Automat==20.2.0
Babel==2.10.3
backcall==0.2.0
blinker==1.4
bokeh==2.4.3
Brlapi==0.7.0
cachetools==5.2.0
certifi==2019.11.28
cffi==1.15.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.3
command-not-found==0.3
constantly==15.1.0
cryptography==2.8
cssselect==1.1.0
cupshelpers==1.0
cycler==0.11.0
datasets==2.4.0
dbus-python==1.2.16
decorator==5.1.1
deepspeed==0.7.0
defer==1.0.6
diffusers==0.2.2
dill==0.3.5.1
distro==1.4.0
distro-info===0.23ubuntu1
dnspython==2.2.1
email-validator==1.2.1
entrypoints==0.3
etils==0.7.1
executing==0.10.0
filelock==3.8.0
Flask==2.2.2
Flask-BabelEx==0.9.4
Flask-Login==0.6.2
Flask-Mail==0.9.1
Flask-Principal==0.4.0
Flask-Security==3.0.0
Flask-WTF==1.0.1
flatbuffers==1.12
fonttools==4.36.0
frozenlist==1.3.1
fsspec==2022.7.1
gast==0.4.0
google-auth==2.10.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
googleapis-common-protos==1.56.4
grpcio==1.47.0
h5py==3.7.0
hjson==3.1.0
httplib2==0.14.0
huggingface-hub==0.8.1
hyperlink==21.0.0
idna==2.8
importlib-metadata==4.12.0
importlib-resources==5.9.0
incremental==21.3.0
intervaltree==3.1.0
ipython==8.4.0
itemadapter==0.7.0
itemloaders==1.0.4
itsdangerous==2.1.2
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.1.0
keras==2.9.0
Keras-Preprocessing==1.1.2
keyring==18.0.1
kiwisolver==1.4.4
language-selector==0.1
launchpadlib==1.10.13
lazr.restfulclient==0.14.2
lazr.uri==1.0.3
libclang==14.0.6
librosa==0.9.2
llvmlite==0.39.0
louis==3.12.0
lxml==4.9.1
macaroonbakery==1.3.1
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.5.3
matplotlib-inline==0.1.6
mido==1.2.10
modelcards==0.1.6
multidict==6.0.2
multiprocess==0.70.13
netifaces==0.10.4
ninja==1.10.2.3
note-seq==0.0.3
numba==0.56.0
numpy==1.20.3
oauthlib==3.1.0
olefile==0.46
opt-einsum==3.3.0
packaging==21.3
PAM==0.4.2
pandas==1.4.3
parsel==1.6.0
parso==0.8.3
passlib==1.7.4
pexpect==4.6.0
pickleshare==0.7.5
Pillow==9.2.0
pooch==1.6.0
pretty-midi==0.2.9
promise==2.3
prompt-toolkit==3.0.30
Protego==0.2.1
protobuf==3.19.4
psutil==5.9.1
pure-eval==0.2.2
py-cpuinfo==8.0.0
pyarrow==9.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycairo==1.16.2
pycparser==2.21
pycups==1.9.73
pydantic==1.9.2
PyDispatcher==2.0.5
pydub==0.25.1
pyFluidSynth==1.3.1
Pygments==2.13.0
PyGObject==3.36.0
PyICU==2.4.2
PyJWT==1.7.1
pymacaroons==0.13.0
PyNaCl==1.3.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyRFC3339==1.1
python-apt==2.0.0+ubuntu0.20.4.7
python-dateutil==2.8.2
python-debian===0.1.36ubuntu1
pytz==2022.2.1
pyxdg==0.26
PyYAML==5.3.1
queuelib==1.6.2
regex==2022.8.17
reportlab==3.5.34
requests==2.22.0
requests-file==1.5.1
requests-oauthlib==1.3.1
requests-unixsocket==0.2.0
resampy==0.4.0
responses==0.18.0
rsa==4.9
scikit-learn==1.1.2
scipy==1.9.0
Scrapy==2.6.2
screen-resolution-extra==0.0.0
SecretStorage==2.3.1
service-identity==21.1.0
simplejson==3.16.0
six==1.14.0
sortedcontainers==2.4.0
SoundFile==0.10.3.post1
speaklater==1.3
ssh-import-id==5.10
stack-data==0.4.0
svgwrite==1.4.3
systemd-python==234
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.1
tensorflow-datasets==4.6.0
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-metadata==1.9.0
termcolor==1.1.0
threadpoolctl==3.1.0
tldextract==3.3.1
tokenizers==0.12.1
toml==0.10.2
torch==1.13.0.dev20220819+cu116
torchaudio==0.12.1
torchvision==0.13.1
tornado==6.2
tqdm==4.64.0
traitlets==5.3.0
transformers==4.21.1
Twisted==22.4.0
typing-extensions==4.3.0
ubuntu-advantage-tools==27.9
ubuntu-drivers-common==0.0.0
ufw==0.36
unattended-upgrades==0.1
urllib3==1.26.11
w3lib==2.0.1
wadllib==1.3.3
wcwidth==0.2.5
Werkzeug==2.2.2
wrapt==1.14.1
WTForms==3.0.1
xkit==0.0.0
xxhash==3.0.0
yarl==1.8.1
zipp==3.8.1
zope.interface==5.4.0
Hi Tristan,
It looks like you are using a nightly build of PyTorch (torch==1.13.0.dev20220819+cu116 in your pip freeze) instead of a stable release. I installed a nightly build on my system and got the same error as you. However, uninstalling it and installing the stable version worked fine for me. Unless you need some new feature from the nightly build, I recommend you do the same.
You can get the install command for your system from https://pytorch.org/get-started/locally/. For reference, this is the one I used in my virtual environment:
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116
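For anyone who lands here with a similar error, here is a minimal sketch of how one might scan pip freeze output for development (nightly) pins like the torch one above. It is not part of diffusers or pip: the function name find_dev_pins is made up for illustration, it only looks at plain name==version lines, and it assumes a nightly can be recognized by a ".devN" segment in the version string, as in 1.13.0.dev20220819+cu116.

```python
import re

# Matches simple pins such as "torch==1.13.0.dev20220819+cu116".
# "===" (arbitrary equality, e.g. "distro-info===0.23ubuntu1") is accepted too.
PIN_PATTERN = re.compile(r"^([A-Za-z0-9_.\-]+)={2,3}(\S+)$")

# Development releases carry a ".dev<number>" segment in the version.
DEV_PATTERN = re.compile(r"\.dev\d+")


def find_dev_pins(freeze_text):
    """Return (name, version) pairs whose version looks like a dev build.

    Hypothetical helper for illustration: lines that are not plain
    name==version pins (editable installs, URLs, blanks) are skipped.
    """
    flagged = []
    for line in freeze_text.splitlines():
        match = PIN_PATTERN.match(line.strip())
        if match is None:
            continue
        name, version = match.groups()
        if DEV_PATTERN.search(version):
            flagged.append((name, version))
    return flagged


if __name__ == "__main__":
    sample = "\n".join([
        "torch==1.13.0.dev20220819+cu116",
        "torchaudio==0.12.1",
        "torchvision==0.13.1",
    ])
    for name, version in find_dev_pins(sample):
        print(f"{name} is pinned to a development build: {version}")
```

To check a real environment, the output of pip freeze can be saved to a file and its contents passed to find_dev_pins. A flagged package is only a hint that a nightly build may be involved, not proof that it causes the error.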
Thanks a lot! That really did the trick! You are the best!
Describe the bug
Hi,
unfortunately, I could not run the training due to an error in the scheduler.
Below you will find the error log.
Best, Tristan
Reproduction
Logs