openai / improved-diffusion

Release for Improved Denoising Diffusion Probabilistic Models
MIT License

Query: Distributed learning #24

Open KomputerMaster64 opened 1 year ago

KomputerMaster64 commented 1 year ago

I am trying to run the improved-ddpm project on Google Colab (1 GPU). I am not sure how to fix the distributed training problem caused by the following error:

Traceback (most recent call last):
  File "image_train.py", line 7, in <module>
    from improved_diffusion import dist_util, logger
  File "/content/gdrive/MyDrive/Colab Notebooks/GitHub Repositories/improved-diffusion/improved_diffusion/dist_util.py", line 10, in <module>
    from mpi4py import MPI
ModuleNotFoundError: No module named 'mpi4py'

taoisu commented 1 year ago

Just install the mpi4py Python package (pip install mpi4py). You may also need to install libmpich-dev (apt install libmpich-dev) before that.
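
To confirm whether the install actually worked, a small check like the one below can be run in the same runtime. This is only a sketch: the function name mpi4py_status is made up for illustration and is not part of improved-diffusion or mpi4py. It separates "the package is missing" from "the package is present but its MPI library fails to load".

```python
import importlib.util


def mpi4py_status():
    """Return a short status string for mpi4py (hypothetical helper)."""
    # find_spec only locates the package on disk; it does not load MPI itself.
    if importlib.util.find_spec("mpi4py") is None:
        return "mpi4py is not installed"
    try:
        # Importing the MPI submodule loads the compiled extension and,
        # with it, the underlying MPI shared library.
        from mpi4py import MPI
    except ImportError as exc:
        return f"mpi4py is installed but failed to load: {exc}"
    # Get_version() reports the MPI standard version as (major, minor).
    major, minor = MPI.Get_version()
    return f"mpi4py loaded, MPI standard {major}.{minor}"


print(mpi4py_status())
```

If this reports that mpi4py loaded, the import at the top of dist_util.py should no longer raise ModuleNotFoundError.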

KomputerMaster64 commented 1 year ago

I am still facing the issue. I will look into it further. The following is the output:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.11 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

muhamadusman commented 1 year ago

@KomputerMaster64 You can try "conda install -c conda-forge mpi4py". It solved the issue for me.

JunMa11 commented 1 year ago

@muhamadusman May I know your mpi4py version?

I got the following error when training the model:

ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

adahfbch commented 6 months ago

I got the following error when training the model:

ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

@JunMa11 Hi, have you solved this problem? I just got the same error.
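
The thread does not record a fix for the libmpi.so.12 error. That message means the dynamic loader could not find the MPI shared library that the installed mpi4py build was linked against. A rough way to narrow it down, assuming a Linux environment, is to ask the loader directly. The helper names find_mpi_library and can_load below are hypothetical and exist only for this sketch.

```python
import ctypes
import ctypes.util


def find_mpi_library(names=("mpi", "mpich")):
    """Return (name, resolved) for the first MPI library found, else None."""
    for name in names:
        # find_library searches the system's standard library locations.
        resolved = ctypes.util.find_library(name)
        if resolved is not None:
            return name, resolved
    return None


def can_load(soname="libmpi.so.12"):
    """Return True if the dynamic loader can open the given shared object."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Same failure mode as the ImportError above: file not found by the loader.
        return False
    return True


print("MPI library found:", find_mpi_library())
print("libmpi.so.12 loadable:", can_load())
```

If no MPI library is found, installing one is the likely next step. If one is found under a different name or version, the mpi4py build and the system MPI probably do not match, and reinstalling mpi4py against the available MPI may help.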