The CUDA version I use is 11.1. Which versions of torch and torchvision should I use to reproduce the results?

SushantGautam commented 1 year ago

Same

timmeinhardt commented 1 year ago

The installation readme mentions the required PyTorch and torchvision versions. If you need to run different versions for newer CUDA versions, you are on your own, i.e., you need to try and see whether the code runs without errors, in particular whether the manually compiled code compiles without errors. In any case, it is a matter of trial and error.

tostenzel commented 1 year ago

Hi, as Tim has written in a previous issue (https://github.com/timmeinhardt/trackformer/issues/41), the installation readme is wrong. He mentioned "2. Install PyTorch 1.7 and torchvision 0.8". PyTorch 1.7 only ships prebuilt binaries up to CUDA 11.0 (see the table behind the PyTorch link), but its CUDA 11.0 build may still work on a CUDA 11.1 system!

There is a bug in torchvision 0.8. I am still working through everything, but the following works for me so far:

pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
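A pip command like this follows the pattern of the PyTorch "previous versions" page: a `+cuXYZ` local version label on torch and torchvision selects the CUDA build, and `-f` points pip at the wheel index. A minimal sketch that assembles such a command for other version combinations (the helper `pip_install_command` is hypothetical, not part of any tool):

```python
WHEEL_INDEX = "https://download.pytorch.org/whl/torch_stable.html"


def pip_install_command(torch: str, torchvision: str, torchaudio: str, cuda: str) -> str:
    """Build a pip command that pins CUDA-specific PyTorch wheels.

    `cuda` is a version such as "11.0"; it becomes the local label "+cu110".
    Only torch and torchvision get the label, mirroring the command above.
    """
    tag = "cu" + cuda.replace(".", "")
    return (
        f"pip install torch=={torch}+{tag} "
        f"torchvision=={torchvision}+{tag} "
        f"torchaudio=={torchaudio} -f {WHEEL_INDEX}"
    )


if __name__ == "__main__":
    # Reproduces the command from this comment.
    print(pip_install_command("1.7.0", "0.8.1", "0.7.0", "11.0"))
```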

I will post again if my attempt does not work out.

tostenzel commented 1 year ago

It worked for me, although I had to choose CUDA 11.0 for my GPU. I guess you will find a way with PyTorch 1.7.0 if you try hard enough. My strategy was to get Trackformer's main dependency working on its own before turning to the complete Trackformer package.

Edit: Below is exactly what I did:

Install Python (e.g., in your home directory)

wget https://repo.anaconda.com/miniconda/Miniconda3-py37_23.1.0-1-Linux-x86_64.sh
chmod 755 Miniconda3-py37_23.1.0-1-Linux-x86_64.sh
./Miniconda3-py37_23.1.0-1-Linux-x86_64.sh

Clone project

git clone https://github.com/timmeinhardt/trackformer
cd trackformer

Conda Environment with Python 3.7

My server has 8 NVIDIA RTX A6000 GPUs (see nvidia-smi). According to NVIDIA's recommendation, we have to use at least CUDA 11.1 (see CUDA wiki and NVIDIA forum).
The driver is NVIDIA UNIX x86_64 Kernel Module 525.89 (see cat /proc/driver/nvidia/version).
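The "at least CUDA 11.1" requirement comes from the GPU's compute capability: the RTX A6000 is an Ampere card with compute capability 8.6, and CUDA 11.1 is the first toolkit that can target 8.6 natively. A minimal sketch of that check, assuming a small hand-written table (the names `MIN_CUDA_FOR_CAPABILITY` and `cuda_targets_gpu` are hypothetical):

```python
from typing import Tuple

# Minimum CUDA toolkit that can generate native code for each GPU compute
# capability (major, minor). Only a few architectures are listed here.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX A6000, RTX 3090
}


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn '11.1' (or '11.1.105') into (11, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_targets_gpu(cuda_version: str, capability: Tuple[int, int]) -> bool:
    """True if this CUDA release can build native code for the given GPU."""
    return parse_major_minor(cuda_version) >= MIN_CUDA_FOR_CAPABILITY[capability]


if __name__ == "__main__":
    # With PyTorch installed, torch.cuda.get_device_capability() returns this tuple.
    a6000 = (8, 6)
    print(cuda_targets_gpu("11.0", a6000), cuda_targets_gpu("11.1", a6000))  # False True
```

CUDA code compiled for compute capability 8.0 can also run on devices with the same major and a higher minor version, so if a CUDA 11.0 wheel contains 8.0 code, that may explain why it still ran on these GPUs.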
We create the conda environment with Python 3.7 via
- conda create --prefix=.env/conda-py3_7 python=3.7 pip
and activate it immediately with
- conda activate .env/conda-py3_7

PyTorch

For CUDA 11.1 (required by the GPU), we would have to choose at least PyTorch 1.8 (see PyTorch versions). However, the Trackformer author strongly recommends PyTorch 1.7 and torchvision 0.8 (see this issue). Note that these two versions differ from what is incorrectly stated in Trackformer's Install.md. Because of Trackformer, I chose PyTorch 1.7.1, which only supports CUDA 11.0 instead of 11.1 (see PyTorch versions), and hoped that CUDA would work despite this.
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=11.0 -c pytorch
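The version reasoning above can be written down as a small lookup. A sketch with a deliberately partial table based on the PyTorch "previous versions" page (verify against that page before relying on it; `RELEASES`, `torchvision_for`, and `oldest_torch_for_cuda` are hypothetical names):

```python
from typing import Dict, Optional, Set, Tuple

# Partial table: torch release -> (matching torchvision release, CUDA versions
# of the official prebuilt binaries that matter for this thread). Not
# exhaustive; other CUDA builds (e.g. 9.2, 10.1) exist for some releases.
RELEASES: Dict[str, Tuple[str, Set[str]]] = {
    "1.7.0": ("0.8.1", {"10.2", "11.0"}),
    "1.7.1": ("0.8.2", {"10.2", "11.0"}),
    "1.8.0": ("0.9.0", {"10.2", "11.1"}),
}


def version_key(version: str) -> Tuple[int, ...]:
    """'1.10.0' -> (1, 10, 0), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def torchvision_for(torch_version: str) -> str:
    return RELEASES[torch_version][0]


def oldest_torch_for_cuda(cuda_version: str) -> Optional[str]:
    """Oldest listed torch release with a prebuilt binary for this CUDA version."""
    matches = [v for v, (_, cudas) in RELEASES.items() if cuda_version in cudas]
    return min(matches, key=version_key) if matches else None


if __name__ == "__main__":
    print(torchvision_for("1.7.1"))       # 0.8.2
    print(oldest_torch_for_cuda("11.0"))  # 1.7.0
    print(oldest_torch_for_cuda("11.1"))  # 1.8.0
```

This makes the trade-off explicit: staying on PyTorch 1.7.x as Trackformer recommends means settling for a CUDA 11.0 build.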

Unfortunately, when we install the cudatoolkit via conda, the nvcc compiler does not come with it. Therefore, we need to install it on top manually to prevent the build from falling back to the inappropriate, locally pre-installed version in /usr/local/cuda. To make matters worse, there is no such compiler package for our CUDA 11.0 (which we chose for PyTorch 1.7.1). Therefore, I chose a higher version once more:

conda install -c "nvidia/label/cuda-11.7.0" cuda-nvcc
conda install -c conda-forge cudatoolkit-dev
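Why the location of nvcc matters: to my understanding, `torch.utils.cpp_extension` (which the ops `setup.py` uses via its `CUDA_HOME` constant) looks for the CUDA toolkit in this order: the `CUDA_HOME` or `CUDA_PATH` environment variable, then the toolkit that owns the first `nvcc` on `PATH`, then `/usr/local/cuda`. A simplified re-implementation of that lookup order, for illustration only (`guess_cuda_home` is hypothetical; check the order against your installed PyTorch):

```python
import os
from typing import Callable, Mapping, Optional


def guess_cuda_home(
    env: Mapping[str, str],
    nvcc_on_path: Optional[str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the CUDA toolkit root an extension build would most likely use."""
    # 1. An explicit environment variable wins.
    cuda_home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if cuda_home:
        return cuda_home
    # 2. Otherwise use the toolkit that owns the first nvcc on PATH,
    #    e.g. <prefix>/bin/nvcc -> <prefix>.
    if nvcc_on_path:
        return os.path.dirname(os.path.dirname(nvcc_on_path))
    # 3. Fall back to the system-wide default location, if it exists.
    default = "/usr/local/cuda"
    return default if path_exists(default) else None


if __name__ == "__main__":
    # Hypothetical path: nvcc installed into the conda env created above.
    print(guess_cuda_home({}, "/home/user/trackformer/.env/conda-py3_7/bin/nvcc"))
```

With the activated environment's nvcc first on PATH (and CUDA_HOME unset), step 2 applies before the /usr/local/cuda fallback. To see what the installed PyTorch actually picked, run python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)".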

MultiScaleDeformableAttention

Trackformer's main dependency is MultiScaleDeformableAttention from the Deformable-DETR repository. Deformable DETR is a more efficient version of DETR for object detection in images (not videos). Installing this dependency is tricky, so we do it before the other Trackformer requirements.

We first install pycocotools (with the fixed ignore flag) according to the Trackformer installation guide. Deformable-DETR also uses this dependency, but without a version specification.

pip3 install -U 'git+https://github.com/timmeinhardt/cocoapi.git#subdirectory=PythonAPI'

We continue with the Deformable-DETR requirements that I copied from the respective repository (https://github.com/fundamentalvision/Deformable-DETR/blob/main/requirements.txt):

pip3 install -r requirements-deformable_detr.txt

Next, we install MultiScaleDeformableAttention from the local files in this repository with

python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install

Finally, we test whether the installation was successful:

cd src/trackformer/models/ops
# unit test (should see all checking is True)
python test.py
cd ../../../..
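In addition to test.py, a quick check that the compiled extension is importable from the active conda environment can help separate build failures from environment mix-ups. A minimal sketch; the module name `MultiScaleDeformableAttention` is an assumption based on Deformable-DETR's `setup.py`, and `module_available` is a hypothetical helper:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found on the current interpreter's path."""
    # find_spec only locates the module; it does not import or execute it.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumed name of the extension built by src/trackformer/models/ops/setup.py.
    print(module_available("MultiScaleDeformableAttention"))
```

If this prints False while the build reported success, the extension was most likely installed into a different Python environment than the one currently active.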

Trackformer

At last, we install the other Trackformer requirements. Note that I have changed the numpy requirement to a more recent version. Without that change, we would get a DimensionMismatchError when running Trackformer's src/track.py.

pip3 install -r requirements-trackformer.txt

We test whether the Trackformer installation was successful in two ways:

Installation Validation

1. Evaluation

First, download the MOT17 (https://motchallenge.net/data/MOT17/) dataset to the data folder and convert it to COCO format via

cd data
wget https://motchallenge.net/data/MOT17.zip
jar xf MOT17.zip  # unzip might yield a possible zip bomb error
cd ..
python src/generate_coco_from_mot.py

Then, download and unpack the pretrained TrackFormer model files in the models directory:

cd models
wget https://vision.in.tum.de/webshare/u/meinhard/trackformer_models_v1.zip
jar xf trackformer_models_v1.zip  # unzip might yield a possible zip bomb error
cd ..

Next, evaluate the pre-trained MOT17 models with MOT20 metrics via

python src/track.py

2. Training

Try to train Trackformer on the MOT17 dataset for a few batches via

python src/train.py with \
    mot17 \
    deformable \
    multi_frame \
    tracking \
    output_dir=models/mot17_deformable_multi_frame

niangea commented 1 year ago

@tostenzel I followed your method to install the environment, but encountered a problem while installing the MultiScaleDeformableAttention package (see issue #96).

niangea commented 1 year ago

This is my current environment:

PyTorch version: 1.7.0+cu110
Is debug build: True
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.26.3

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/opt/orion/lib/orion-cuda-11.0/libcudnn_orion.so.11.0.8
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.7.0+cu110
[pip3] torchfile==0.1.0
[pip3] torchvision==0.8.1+cu110
[conda] numpy 1.18.5 pypi_0 pypi
[conda] torch 1.7.0+cu110 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchvision 0.8.1+cu110 pypi_0 pypi
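A report in this format is what PyTorch's `python -m torch.utils.collect_env` prints. To compare such a report against the recommended PyTorch 1.7 / torchvision 0.8 pair, it helps to split each wheel version into its release and the CUDA build it targets. A minimal sketch (`split_wheel_version` is a hypothetical helper):

```python
from typing import Optional, Tuple


def split_wheel_version(version: str) -> Tuple[str, Optional[str]]:
    """Split a wheel version like '1.7.0+cu110' into ('1.7.0', '11.0').

    Versions without a '+cuXYZ' label (CPU builds or plain releases) return
    None for the CUDA part.
    """
    release, _, local = version.partition("+")
    digits = local[2:]
    if not local.startswith("cu") or not digits.isdigit() or len(digits) < 2:
        return release, None
    # The last digit is the minor version: 'cu110' -> '11.0', 'cu92' -> '9.2'.
    return release, f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    for pkg, ver in [("torch", "1.7.0+cu110"), ("torchvision", "0.8.1+cu110")]:
        print(pkg, *split_wheel_version(ver))
```

Applied to the report above, torch resolves to release 1.7.0 and torchvision to 0.8.1, both built for CUDA 11.0.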

timmeinhardt commented 1 year ago

You still do not have the correct PyTorch version, so these errors can come up. I cannot debug your system for configurations other than the one we suggested.
