votchallenge / toolkit

The official VOT Challenge evaluation and analysis toolkit
http://www.votchallenge.net/
GNU General Public License v3.0

About singularity container running vot evaluation command #56

Closed davidyang180 closed 2 years ago

davidyang180 commented 2 years ago

Hi! When I run Python on the command line inside the Singularity container, CUDA is reported as available:

Singularity> python3
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch._C._cuda_getDeviceCount()
8
>>> print(torch.cuda.is_available())
True
>>>

But when I run the code through the "trackers.ini" file:

[parking_point_adaptive_motion_baseline_swinb_motion] 
label = APMT
protocol = traxpython
command = import test_toolkit.run as run; run.run_code()

torch._C._cuda_getDeviceCount() returns 0 and torch.cuda.is_available() returns False, with the warning below. What is the reason?

/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

I asked a related question in the official Singularity repository, and the author said: "If it works from the command line, then Singularity is not causing the problem here. You will need to find the issue in the way that the tool is calling Python. Perhaps it is overriding the LD_LIBRARY_PATH that Singularity sets when using --nv to bind the CUDA drivers." Is this problem caused by how the vot-toolkit sets up the environment when it calls the tracker?
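
One way to check that hypothesis is to list the LD_LIBRARY_PATH entries seen by the interactive shell and by the tracker process, then compare the two. The snippet below is only a minimal sketch: the helper name describe_library_path is hypothetical and is not part of vot-toolkit, Singularity, or PyTorch.

```python
import os


def describe_library_path(environ=None):
    """Return the LD_LIBRARY_PATH entries visible in `environ` as a list.

    `environ` defaults to the environment of the current process.
    Empty entries (for example from a trailing ':') are skipped.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(":") if entry]


if __name__ == "__main__":
    # Run this once from the shell and once from the tracker entry point,
    # then compare the two listings.
    for entry in describe_library_path():
        print(entry)
```

If the directories that appear in the shell are missing when the same function is called from the tracker code, the environment was changed somewhere between the container and the tracker process.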

botaoye commented 2 years ago

@davidyang180 Hi, I have the same problem. Did you find a solution?

davidyang180 commented 2 years ago

@botaoye Hi. As far as I remember, the trax.py file in the official VOT toolkit overrides the container's LD_LIBRARY_PATH. Open '/usr/local/lib/python3.6/dist-packages/vot/tracker/trax.py' and modify the code in this file so that it keeps the LD_LIBRARY_PATH of the default environment:

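class TrackerProcess(object):

    def __init__(self, command: str, envvars=dict(), timeout=30, log=False, socket=False):
        environment = dict(os.environ)
        # Remember the value set by the container (e.g. by Singularity's --nv).
        ld_library_path = environment.get('LD_LIBRARY_PATH')
        environment.update(envvars)
        # Restore it so that envvars cannot override the CUDA driver path.
        if ld_library_path is not None:
            environment['LD_LIBRARY_PATH'] = ld_library_path
        self._workdir = tempfile.mkdtemp()
        ....................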
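
The same idea can be shown outside the toolkit. The following is a minimal standalone sketch, not the toolkit's actual code: the function build_environment and its parameters preserve and base are hypothetical names introduced here for illustration.

```python
import os


def build_environment(envvars, preserve=("LD_LIBRARY_PATH",), base=None):
    """Merge `envvars` into a copy of `base`, keeping the base values of
    the variables named in `preserve`.

    `base` defaults to the environment of the current process.
    """
    base = dict(os.environ) if base is None else dict(base)
    environment = dict(base)
    environment.update(envvars)
    for name in preserve:
        # Only restore variables that the base environment actually defines.
        if name in base:
            environment[name] = base[name]
    return environment
```

The resulting dictionary can be passed as the env argument of subprocess.Popen, so the child process sees the extra variables but still uses the library path that the container set up.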

botaoye commented 2 years ago

It works! Thanks a lot for your help.