Hi,
Thank you (the authors) for this very impressive work.
Below are a few notes that I made while setting up the environment, in case anyone runs into similar issues with the setup.
I am using the official PyTorch Docker image `pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel`. Note that it comes with `python 3.7`, so some packages (e.g., `matplotlib`) will complain about a version mismatch. Just remove the versions specified in the requirement file, as the authors state that the packages do not necessarily have to be bound to the specific versions.
A complete Dockerfile and requirement file are attached at the bottom. Below are some changes that I've made.
`pydantic`
Just noting that `pip` will default to `pydantic==2.5.3` if no version is specified, and that causes an error on `import accelerate`. To resolve it, please install `pydantic==1.9.0` (I haven't tested training yet, but at least that remedies the above issue with the `import`).
`mmcv`
Builds forever. See this.
`xformers`
Errors with `RuntimeError: CUTLASS submodule not found. Did you forget to run git submodule update --init --recursive?` This will solve it: `pip install git+https://github.com/facebookresearch/xformers.git@v0.0.13#egg=xformers`.
However, I found that `0.0.13` runs into a runtime error, `CUDA error: an illegal memory access was encountered`, which traces to
Instead, `xformers==0.0.12` works flawlessly. Also add `triton==2.1.0` to the requirement file.
Reproduce with:
Dockerfile
modified requirement.txt
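The requirement-file changes described above (drop the version pins, then pin only `pydantic`, `xformers` and `triton`) could also be scripted instead of edited by hand. The following is only a minimal sketch under a few assumptions: `rewrite_requirements` and `PINS` are hypothetical names, the sample input lines are illustrative, and the attached requirement file was edited manually, not generated this way. It only handles simple `name==version` style lines, so environment markers and trailing comments on a requirement line are dropped.

```python
import re

# Pins taken from the notes above; every other package is left unpinned.
PINS = {"pydantic": "1.9.0", "xformers": "0.0.12", "triton": "2.1.0"}

# Package name at the start of a requirement line, plus optional extras,
# e.g. "matplotlib==3.5.1" -> "matplotlib", "pkg[extra]>=1.0" -> "pkg[extra]".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?")


def rewrite_requirements(lines, pins=PINS):
    """Drop version specifiers from simple requirement lines, then apply `pins`."""
    out, seen = [], set()
    for raw in lines:
        line = raw.strip()
        # Leave blanks, comments, pip options (-r, --index-url, ...) and
        # URL/VCS requirements untouched.
        if not line or line.startswith(("#", "-")) or "://" in line:
            out.append(line)
            continue
        match = _NAME.match(line)
        if match is None:
            out.append(line)
            continue
        name, extras = match.group(1), match.group(2) or ""
        key = name.lower()
        seen.add(key)
        # Pinned packages get the fixed version; everything else loses its pin.
        out.append(f"{name}{extras}=={pins[key]}" if key in pins else f"{name}{extras}")
    # Append pinned packages that the original file did not list at all.
    for key, version in pins.items():
        if key not in seen:
            out.append(f"{key}=={version}")
    return out


if __name__ == "__main__":
    # Illustrative input only; these are not the lines of the real requirement file.
    sample = ["matplotlib==3.5.1", "accelerate>=0.20", "xformers==0.0.13", "# tools", "pydantic"]
    print("\n".join(rewrite_requirements(sample)))
```

Reading the original requirement file line by line, passing the lines through `rewrite_requirements`, and writing the result back should give a file that `pip install -r` can resolve on Python 3.7 with the three pins applied. I would still compare the output against the attached file before relying on it.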