Open FurkanGozukara opened 10 months ago
This error seems to be same as this issue: https://github.com/NVIDIA/NeMo/issues/5485
Please verify GPU version of PyTorch is installed.
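One way to run that check from inside the activated venv is sketched below. The helper name `torch_cuda_report` is made up for illustration. It only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`). On a dual-GPU machine with a CUDA build one would expect `cuda_available` to be True and `gpu_count` to be 2, but this is only a sanity check and does not by itself prove multi-GPU training will work.

```python
def torch_cuda_report() -> dict:
    """Summarize whether the installed PyTorch build can see CUDA GPUs.

    Hypothetical helper for illustration; not part of kohya_ss or accelerate.
    """
    try:
        import torch  # imported lazily so the function still works without torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,       # e.g. ends in +cu118 for a CUDA 11.8 wheel
        "built_with_cuda": torch.version.cuda,    # None for a CPU-only build
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```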
Thank you. Single-GPU training is working, so I think it is installed correctly.
Here is the venv:
Microsoft Windows [Version 10.0.22631.2861]
(c) Microsoft Corporation. All rights reserved.
C:\Users\user\kohya_ss\venv\Scripts>activate
(venv) C:\Users\user\kohya_ss\venv\Scripts>pip freeze
absl-py==2.0.0
accelerate==0.23.0
aiofiles==23.2.1
aiohttp==3.9.1
aiosignal==1.3.1
altair==4.2.2
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.2.0
appdirs==1.4.4
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.41.1
cachetools==5.3.2
certifi==2022.12.7
charset-normalizer==2.1.1
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.2.0
cycler==0.12.1
dadaptation==3.1
diffusers==0.24.0
docker-pycreds==0.4.0
easygui==0.98.3
einops==0.6.0
entrypoints==0.4
exceptiongroup==1.2.0
fairscale==0.4.13
fastapi==0.109.0
ffmpy==0.3.1
filelock==3.9.0
flatbuffers==23.5.26
fonttools==4.47.2
frozenlist==1.4.1
fsspec==2023.12.2
ftfy==6.1.1
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.41
google-auth==2.26.2
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
gradio==3.36.1
gradio_client==0.8.0
grpcio==1.60.0
h11==0.14.0
h5py==3.10.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.4
importlib-metadata==7.0.1
invisible-watermark==0.2.0
Jinja2==3.1.2
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
keras==2.14.0
kiwisolver==1.4.5
libclang==16.0.6
-e git+https://github.com/bmaltais/kohya_ss.git@842d9c7018288d5c3d6e01adc0d7f886b70252b6#egg=library
lightning-utilities==0.10.0
linkify-it-py==2.0.2
lion-pytorch==0.0.6
lycoris-lora==2.0.2
Markdown==3.5.2
markdown-it-py==2.2.0
MarkupSafe==2.1.3
matplotlib==3.8.2
mdit-py-plugins==0.3.3
mdurl==0.1.2
ml-dtypes==0.2.0
mpmath==1.3.0
multidict==6.0.4
networkx==3.0
numpy==1.24.1
oauthlib==3.2.2
omegaconf==2.3.0
onnx==1.14.1
onnxruntime-gpu==1.16.0
open-clip-torch==2.20.0
opencv-python==4.7.0.68
opt-einsum==3.3.0
orjson==3.9.10
packaging==23.2
pandas==2.1.4
pathtools==0.1.2
Pillow==9.3.0
prodigyopt==1.0
protobuf==3.20.3
psutil==5.9.7
pyasn1==0.5.1
pyasn1-modules==0.3.0
pydantic==2.5.3
pydantic_core==2.14.6
pydub==0.25.1
Pygments==2.17.2
pyparsing==3.1.1
pyreadline3==3.4.1
python-dateutil==2.8.2
python-multipart==0.0.6
pytorch-lightning==1.9.0
pytz==2023.3.post1
PyWavelets==1.5.0
PyYAML==6.0.1
referencing==0.32.1
regex==2023.12.25
requests==2.28.1
requests-oauthlib==1.3.1
rich==13.4.1
rpds-py==0.17.1
rsa==4.9
safetensors==0.3.1
scipy==1.11.4
semantic-version==2.10.0
sentencepiece==0.1.99
sentry-sdk==1.39.2
setproctitle==1.3.3
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
starlette==0.35.1
sympy==1.12
tensorboard==2.14.1
tensorboard-data-server==0.7.2
tensorflow==2.14.0
tensorflow-estimator==2.14.0
tensorflow-intel==2.14.0
tensorflow-io-gcs-filesystem==0.31.0
termcolor==2.4.0
timm==0.6.12
tk==0.1.0
tokenizers==0.13.3
toml==0.10.2
toolz==0.12.0
torch==2.0.1+cu118
torchmetrics==1.3.0
torchvision==0.15.2+cu118
tqdm==4.66.1
transformers==4.30.2
typing_extensions==4.9.0
tzdata==2023.4
uc-micro-py==1.0.2
urllib3==1.26.13
uvicorn==0.25.0
voluptuous==0.13.1
wandb==0.15.11
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.1
wrapt==1.14.1
xformers==0.0.21
yarl==1.9.4
zipp==3.17.0
(venv) C:\Users\user\kohya_ss\venv\Scripts>
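The freeze output above pins `torch==2.0.1+cu118`. The `+cu118` local version tag is what marks a CUDA 11.8 wheel, while CPU-only wheels carry `+cpu` or no tag. A small sketch for pulling that tag out of a `pip freeze` line follows; the function name `cuda_tag` is an illustrative assumption, and it only inspects the version string, not the actual install.

```python
import re


def cuda_tag(requirement: str):
    """Return the CUDA tag (e.g. 'cu118') from a pinned torch line, else None.

    Hypothetical helper: it reads a single `pip freeze` line and nothing else.
    """
    match = re.fullmatch(r"\s*torch==([^+\s]+)(?:\+(\w+))?\s*", requirement)
    if match is None:
        return None  # not a pinned torch requirement
    local = match.group(2)
    if local and re.fullmatch(r"cu\d+", local):
        return local
    return None  # CPU-only or untagged build


print(cuda_tag("torch==2.0.1+cu118"))
```

For the line from this venv the call returns `cu118`. A `+cpu` wheel or an unrelated package such as `torchvision` returns `None`.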
The user seems to have modified the script to use gloo on Windows. According to this issue, https://github.com/huggingface/accelerate/issues/141, it seems to be required to initialize torch.distributed. Unfortunately I don't know how to initialize it or use gloo on Windows...
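For reference, a minimal sketch of what initializing torch.distributed with the gloo backend generally looks like in plain PyTorch. This is not taken from kohya_ss or accelerate and is not confirmed to resolve the error in this thread. The helper names, the `tcp://127.0.0.1:29500` address, and the port are illustrative assumptions; each process would have to call `init_gloo` with its own rank.

```python
def gloo_init_kwargs(rank: int, world_size: int,
                     master_addr: str = "127.0.0.1",
                     master_port: int = 29500) -> dict:
    """Build keyword arguments for torch.distributed.init_process_group.

    Hypothetical helper; the address and port are placeholder assumptions.
    """
    return {
        "backend": "gloo",  # NCCL is not available on Windows
        "init_method": f"tcp://{master_addr}:{master_port}",
        "rank": rank,
        "world_size": world_size,
    }


def init_gloo(rank: int, world_size: int):
    """Initialize the default process group with gloo (one call per process)."""
    import torch.distributed as dist  # lazy import: only needed at call time

    if not dist.is_available():
        raise RuntimeError("this PyTorch build has no distributed support")
    dist.init_process_group(**gloo_init_kwargs(rank, world_size))
    return dist


print(gloo_init_kwargs(rank=0, world_size=2))
```

With two GPUs, one process would call `init_gloo(0, 2)` and the other `init_gloo(1, 2)`. Launchers such as accelerate normally set the rank and world size themselves, so this only illustrates the underlying call.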
Thanks. It looks like Linux is mandatory at the moment.
I have a subscriber who has dual RTX 4060 Ti (16 GB) GPUs.
He is on Windows 10 with a fresh install of Python 3.10.9.
When we set the Hugging Face default_config.yaml like below
and train_util.py like below,
we get the error below. How can we fix it?