niangea closed this issue 1 year ago
The installation readme lists the required PyTorch and torchvision versions. If you need different versions for a newer CUDA release, you are on your own: you have to find out by trial and error whether the code runs, and in particular whether the manually compiled code builds without errors.
Hi, as Tim has written in a previous issue (https://github.com/timmeinhardt/trackformer/issues/41), the installation readme is wrong. He mentioned "2. Install PyTorch 1.7 and torchvision 0.8 ". With PyTorch 1.7, you can use cuda 11.1 (see the mentioned table from the PyTorch link)!
There is a bug in torchvision 0.8. I am still working through everything, but
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
has worked for me so far.
I will post again if my attempt does not work out.
It worked for me, although I had to choose cuda 11.0 for my GPU. I guess you will find a way with PyTorch 1.7.0 if you try hard enough. My strategy was to get Trackformer's main dependency to work separately before turning to the complete Trackformer package.
Edit: Below is exactly what I did:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_23.1.0-1-Linux-x86_64.sh
chmod 755 Miniconda3-py37_23.1.0-1-Linux-x86_64.sh
./Miniconda3-py37_23.1.0-1-Linux-x86_64.sh
git clone https://github.com/timmeinhardt/trackformer
cd trackformer
My server has 8 NVIDIA RTX A6000 GPUs (see nvidia-smi). According to the NVIDIA recommendation, we have to use at least CUDA 11.1 (see CUDA wiki and NVIDIA forum).
The driver is NVIDIA UNIX x86_64 Kernel Module 525.89 (see cat /proc/driver/nvidia/version).
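The GPU and driver check above can also be scripted. The following is only a sketch: the helper names parse_gpu_rows and query_gpus are made up for illustration, and it assumes nvidia-smi is on the PATH (it returns an empty list otherwise).

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Turn 'name, driver_version' CSV rows into (name, driver) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if "," not in line:
            continue  # skip anything that is not a two-column row
        name, driver = [field.strip() for field in line.split(",", 1)]
        rows.append((name, driver))
    return rows


def query_gpus():
    """Ask nvidia-smi for GPU names and driver versions.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(f"{name}: driver {driver}")
```

Keeping the parsing separate from the subprocess call makes it possible to try the parser on a pasted nvidia-smi line without having a GPU at hand.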
We create the conda environment with Python 3.7 via
conda create --prefix=.env/conda-py3_7 python=3.7 pip
and activate it immediately with
conda activate .env/conda-py3_7
For CUDA 11.1 (the requirement from the GPU), we have to choose at least PyTorch 1.8 (see PyTorch versions). However, Trackformer strongly recommends PyTorch 1.7 and torchvision 0.8 (see this Issue). Note that these two versions differ from what is incorrectly stated in Trackformer's Install.md. Because of Trackformer, I choose PyTorch 1.7.1, which supports only CUDA 11.0 instead of 11.1 (see PyTorch versions), and hope that CUDA works despite it.
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=11.0 -c pytorch
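After this step it may be worth confirming which PyTorch build actually ended up in the environment. A minimal sketch, assuming it is run with the Python interpreter of the conda environment; the function name describe_torch is hypothetical, and it returns None when PyTorch is not installed.

```python
import importlib.util


def describe_torch():
    """Return version details of the installed PyTorch, or None if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script also runs without PyTorch

    return {
        "torch": torch.__version__,              # e.g. a 1.7.x version string
        "built_with_cuda": torch.version.cuda,   # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```

If built_with_cuda is None or cuda_available is False, the later compilation of the CUDA ops is unlikely to succeed, so it seems sensible to stop here and fix the PyTorch install first.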
Unfortunately, if we install the cudatoolkit via conda, the nvcc compiler does not come with it. Therefore, we need to install it on top manually to prevent the build from picking up the inappropriate, locally pre-installed version in /usr/local/cuda. Unfortunately, there is no compiler package for our CUDA 11.0 (which we chose for PyTorch 1.7.1). Therefore, I once more choose a higher version:
conda install -c "nvidia/label/cuda-11.7.0" cuda-nvcc
conda install -c conda-forge cudatoolkit-dev
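To see which nvcc the build will actually find (the conda one or the one under /usr/local/cuda), the first nvcc on the PATH can be inspected. This is a sketch with hypothetical helper names; it assumes nvcc prints its usual "release X.Y" line for --version.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract (major, minor) from 'nvcc --version' output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def active_nvcc():
    """Return (path, release) for the first nvcc on PATH, or (None, None)."""
    path = shutil.which("nvcc")
    if path is None:
        return None, None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    path, release = active_nvcc()
    print("nvcc path:", path)
    print("nvcc release:", release)
```

If the printed path points to /usr/local/cuda rather than into the conda environment, the environment is probably not activated or the conda nvcc package did not install correctly.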
Trackformer's main dependency is MultiScaleDeformableAttention from the Deformable-DETR repository. Deformable-DETR is a more efficient DETR variant for object detection in images (not videos). Installing it is tricky. Thus, we do it before the other Trackformer requirements.
We first install pycocotools (with fixed ignore flag) according to the Trackformer installation guide. That dependency is also used by Deformable-DETR without the version specification.
pip3 install -U 'git+https://github.com/timmeinhardt/cocoapi.git#subdirectory=PythonAPI'
We continue with the Deformable-DETR requirements that I copied from the respective [repository](https://github.com/fundamentalvision/Deformable-DETR/blob/main/requirements.txt):
pip3 install -r requirements-deformable_detr.txt
Next, we install MultiScaleDeformableAttention from the local files in this repository with
python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install
Finally, we test whether the installation was successful:
cd src/trackformer/models/ops
# unit test (should see all checking is True)
python test.py
cd ../../../..
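Before running the unit test, a quick check whether the compiled extension is visible to Python at all can save time. A small sketch: the helper is_installed is hypothetical, and it assumes the extension is installed under the module name MultiScaleDeformableAttention, as the package name above suggests.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module name is an assumption taken from the package name in this thread.
    name = "MultiScaleDeformableAttention"
    print(name, "found" if is_installed(name) else "NOT found")
```

This only confirms that the module is on the import path. Whether the compiled kernels actually work is still decided by python test.py.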
At last, we install the other Trackformer requirements. Note that I have changed numpy to a more recent version. Without that, we would get a DimensionMismatchError when running Trackformer's src/track.py.
pip3 install -r requirements-trackformer.txt
We test whether the Trackformer installation was successful in two ways:
Either download the MOT17 dataset (https://motchallenge.net/data/MOT17/) to the data folder via
Then, download and unpack the pretrained TrackFormer model files in the models directory:
Next, evaluate the pre-trained MOT17 models with MOT20 metrics via
python src/track.py
Or try to train Trackformer on the MOT17 dataset for a few batches via
@tostenzel I followed your method to install the environment, but encountered a problem while installing the MultiScaleDeformableAttention package (see issue #96).
This is my current environment:
PyTorch version: 1.7.0+cu110
Is debug build: True
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.26.3
Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/opt/orion/lib/orion-cuda-11.0/libcudnn_orion.so.11.0.8
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.7.0+cu110
[pip3] torchfile==0.1.0
[pip3] torchvision==0.8.1+cu110
[conda] numpy 1.18.5 pypi_0 pypi
[conda] torch 1.7.0+cu110 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchvision 0.8.1+cu110 pypi_0 pypi
You still do not have the correct PyTorch version, so these errors can come up. I cannot debug your system for configurations other than the one we suggested.
Same