Closed (Finebouche closed this issue 1 year ago)
I think the problem might be that you are using an older version of WarpDrive. Before version 2.0, the manager module had a slightly different structure. I just tested tutorial 1.a on Colab and it ran without problems. I used the latest version, 2.2.1, and I believe every version from 2.0 onward should be able to run this tutorial.
Oh, I see. Indeed,
pip install -U rl_warp_drive
installed version 1.6.1 of warp_drive because of a PyTorch dependency that wasn't met.
Pinning the warp_drive version to 2.2.1 now gives me
ERROR: Could not find a version that satisfies the requirement torch<1.11,>=1.9 (from rl-warp-drive) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1)
ERROR: No matching distribution found for torch<1.11,>=1.9
So I guess I need to downgrade my PyTorch version to something below 1.11? Is that correct?
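To see why pip rejected every candidate, the requirement `torch<1.11,>=1.9` can be checked by hand against the versions listed in the error message. This is a minimal sketch with a simplified comparison that only handles plain numeric versions; it is a stand-in for pip's real specifier matching, and the helper names `parse` and `satisfies` are illustrative.

```python
def parse(version):
    """Turn '1.12.1' into (1, 12, 1); handles plain numeric versions only."""
    return tuple(int(piece) for piece in version.split("."))


def satisfies(version, lower=(1, 9), upper=(1, 11)):
    """True if lower <= version < upper, mirroring 'torch<1.11,>=1.9'."""
    return lower <= parse(version) < upper


# Versions pip reported as available in the error message.
available = ["1.11.0", "1.12.0", "1.12.1", "1.13.0", "1.13.1"]
matching = [v for v in available if satisfies(v)]
print(matching)  # [] -> no candidate is in range, hence the pip error

# 1.10.2 falls inside the range; 1.11.0 is excluded by the upper bound.
print(satisfies("1.10.2"), satisfies("1.11.0"))  # True False
```

So the requirement calls for a 1.9.x or 1.10.x release; 1.11.0 itself is already too new.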
So it seems that in order to install torch version 1.10.2 you also need Python<3.7. If this is correct, I think it should be mentioned somewhere in the documentation.
My troubles are not over yet, but this is heading somewhere!
I think you should configure the CUDA environment first. Installing PyTorch directly can lead to problems caused by library compatibility, especially with the CUDA driver and its service suites. For example, Colab can run the tutorial directly because its backend CUDA environment is configured correctly. So I suggest you try a Docker image released by NVIDIA, which should solve these problems. As for your other question: we use torch 1.10 because 1.11 has a bug in training, which torch 1.12 seems to have resolved. Anyway, we are still sticking with torch 1.10, but it does not require Python<3.7; I run Python 3.7.9 in my own environment.
An example installation:
FROM nvcr.io/nvidia/pytorch:21.10-py3
LABEL description="warpdrive-env"
WORKDIR /home
RUN chmod a+rwx /home
RUN pip install pycuda==2022.1
RUN conda install -y numba==0.54.0
RUN pip install rl_warp_drive
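Once the image is built, it can help to confirm what actually ended up inside the container. The sketch below is an assumption-laden helper, not part of WarpDrive: it uses `importlib.metadata` (Python 3.8 or newer) and the distribution names that appear in this thread, and the function name `report` is illustrative.

```python
import sys
from importlib import metadata

# Distribution names taken from the Dockerfile in this thread, plus torch
# from the base image. Adjust the tuple if your setup differs.
PACKAGES = ("torch", "pycuda", "numba", "rl-warp-drive")


def report(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", ".".join(str(n) for n in sys.version_info[:3]))
    for name, version in report().items():
        print(name, version or "not installed")
```

Per the discussion here, the `rl-warp-drive` entry should read 2.0 or later for the current tutorials.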
So it seems there is one last small inconsistency.
It seems that the container you recommended (https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_21-10.html#rel_21-10) uses Python 3.8.
Otherwise, I think I have all the details I need to make it work, thanks!
Cool. I think the choice between Python 3.7 and 3.8 is not critical.
Hi!
There seems to be a problem in the tutorial https://github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1.a-warp_drive_basics.ipynb
The line
from warp_drive.managers.pycuda_managers.pycuda_data_manager import PyCUDADataManager
doesn't work anymore and gives: ModuleNotFoundError: No module named 'warp_drive.managers.pycuda_managers'
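A quick way to tell whether the installed package has the module layout the tutorial expects, without running the whole notebook, is to ask Python whether the module can be located. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `module_available` is illustrative, and the check only looks up the module path quoted in the error.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the module can be located, False otherwise.

    find_spec imports parent packages along the way, so a missing or
    broken parent package surfaces as ImportError and is treated as
    "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    name = "warp_drive.managers.pycuda_managers.pycuda_data_manager"
    if module_available(name):
        print("tutorial import path found")
    else:
        print("tutorial import path missing; check the installed WarpDrive version")
```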