google / trax

Trax — Deep Learning with Clear Code and Speed
Apache License 2.0

Can't find ptxas binary in ${CUDA_DIR}/bin. #249

Closed 2020zyc closed 4 years ago

2020zyc commented 4 years ago

Description

I tried to run the Reformer model with the configuration reformer_enwik8.gin and got an error: Can't find ptxas binary in ${CUDA_DIR}/bin. ...

Environment information

OS: Ubuntu 18.04.3 LTS

$ pip freeze | grep tensor
mesh-tensorflow==0.1.7
tensor2tensor==1.15.4
tensorboard==1.15.0
tensorflow-datasets==1.3.2
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gpu==1.15.0
tensorflow-hub==0.7.0
tensorflow-metadata==0.15.2
tensorflow-probability==0.7.0
tensorrt==6.0.1.4

$ pip freeze | grep jax
jax==0.1.57
jaxlib==0.1.37

$ python -V
python 3.6.8

$ nvcc --version 
CUDA 10.0 (/usr/local/cuda -> /usr/local/cuda-10.0; /usr/local/cuda-10.1 is also installed)

GPU: 4 x RTX 2080 Ti

For bugs: reproduction and error logs

# Steps to reproduce:
Run trainer.py in trax/trax with the configuration reformer_enwik8.gin.
# Error logs:
[Some routine log output about the dataset has been removed]
I0119 09:32:55.178084 140128464549696 problem.py:651] Reading data files from /root/tensorflow_datasets/t2t_enwik8_l65k/enwik8_l65k-dev*
INFO:tensorflow:partition: 0 num_data_files: 1
I0119 09:32:55.179685 140128464549696 problem.py:677] partition: 0 num_data_files: 1
I0119 09:32:56.124050 140128464549696 inputs.py:443] Heuristically setting bucketing to False based on shapes of target tensors.
I0119 09:32:56.131589 140128464549696 inputs.py:443] Heuristically setting bucketing to False based on shapes of target tensors.
I0119 09:32:56.136316 140128464549696 inputs.py:443] Heuristically setting bucketing to False based on shapes of target tensors.
I0119 09:33:05.191175 140128464549696 trainer_lib.py:754] Model loaded from ../checkpoints/model.pkl at step 0
Model loaded from ../checkpoints/model.pkl at step 0
I0119 09:33:05.192780 140128464549696 trainer_lib.py:754] Step      0: Starting training using 1 devices
Step      0: Starting training using 1 devices
I0119 09:33:05.194077 140128464549696 trainer_lib.py:754] Step      0: Total number of trainable weights: 215865602
Step      0: Total number of trainable weights: 215865602

2020-01-19 09:33:09.105234: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.105464: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.105489: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.105517: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.105532: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.105554: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.105567: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
[The same seven-line sequence (the fork() error followed by the ptxas warning and the list of searched directories) repeats 10 more times, from 09:33:09.193084 to 09:33:14.256591, and is omitted here.]
2020-01-19 09:33:31.094227: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:31.094430: W external/org_tensorflow/tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-01-19 09:33:31.177827: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:31.255405: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
Traceback (most recent call last):
  File "/home/xxx/pycharm_proj/trax/trax/trainer.py", line 195, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/xxx/pycharm_proj/trax/trax/trainer.py", line 189, in main
    trainer_lib.train(output_dir=output_dir)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/xxx/pycharm_proj/trax/trax/supervised/trainer_lib.py", line 641, in train
    trainer.train_epoch(epoch_steps, eval_steps)
  File "/home/xxx/pycharm_proj/trax/trax/supervised/trainer_lib.py", line 305, in train_epoch
    self.train_step(batch)
  File "/home/xxx/pycharm_proj/trax/trax/supervised/trainer_lib.py", line 337, in train_step
    self._step, opt_state, batch, self._model_state, self._rngs)
  File "/usr/local/lib/python3.6/dist-packages/jax/api.py", line 149, in f_jitted
    out = xla.xla_call(flat_fun, *args_flat, device=device, backend=backend)
  File "/usr/local/lib/python3.6/dist-packages/jax/core.py", line 602, in call_bind
    outs = primitive.impl(f, *args, **params)
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 442, in _xla_call_impl
    compiled_fun = _xla_callable(fun, device, backend, *map(arg_spec, args))
  File "/usr/local/lib/python3.6/dist-packages/jax/linear_util.py", line 223, in memoized_fun
    ans = call(fun, *args)
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 499, in _xla_callable
    compiled = built.Compile(compile_options=options, backend=xb.get_backend(backend))
  File "/usr/local/lib/python3.6/dist-packages/jaxlib/xla_client.py", line 609, in Compile
    return backend.compile(self.computation, compile_options)
  File "/usr/local/lib/python3.6/dist-packages/jaxlib/xla_client.py", line 161, in compile
    compile_options.device_assignment)
RuntimeError: Internal: Failed to launch ptxas

2020zyc commented 4 years ago

I tried setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda. I also installed the GPU build of jaxlib as follows: pip install --upgrade https://storage.googleapis.com/jax-releases/cuda100/jaxlib-0.1.37-cp36-none-linux_x86_64.whl

Neither change had any effect.
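For anyone trying the same thing from Python rather than the shell, here is a minimal sketch of setting the flag. The flag name --xla_gpu_cuda_data_dir comes from the warning in the log above; the helper with_cuda_data_dir and the path /usr/local/cuda-10.0 are only illustrative assumptions. As far as I know XLA reads XLA_FLAGS when the backend starts, so the safe order is to set it before importing jax.

```python
import os


def with_cuda_data_dir(env, cuda_dir):
    """Return a copy of env whose XLA_FLAGS points XLA at cuda_dir.

    Hypothetical helper, not part of jax or trax.
    """
    prefix = "--xla_gpu_cuda_data_dir="
    # Keep unrelated flags that are already set; drop an older copy of this one.
    kept = [f for f in env.get("XLA_FLAGS", "").split() if not f.startswith(prefix)]
    new_env = dict(env)
    new_env["XLA_FLAGS"] = " ".join(kept + [prefix + cuda_dir])
    return new_env


if __name__ == "__main__":
    # Assumed install path; adjust to the CUDA version jaxlib was built for.
    os.environ.update(with_cuda_data_dir(os.environ, "/usr/local/cuda-10.0"))
    print(os.environ["XLA_FLAGS"])
    # import jax  # only import jax after the variable is in place
```

Pointing directly at the versioned directory instead of the /usr/local/cuda symlink makes it obvious which toolkit is being used when several are installed.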

sanjibnarzary commented 4 years ago

Your log says that ptxas was not found. Have you checked that it is actually there? For example:

(py37) sn@gpu1:~$ ls /usr/local/cuda-10.1/bin/
bin2c                         nsight-sys
computeprof                   nsys
crt                           nsys-exporter
cudafe++                      nvcc
cuda-gdb                      nvcc.profile
cuda-gdbserver                nvdisasm
cuda-install-samples-10.1.sh  nvlink
cuda-memcheck                 nv-nsight-cu
cuobjdump                     nv-nsight-cu-cli
fatbinary                     nvprof
gpu-library-advisor           nvprune
nsight                        nvvp
nsight_ee_plugins_manage.sh   ptxas

ptxas should be in $CUDA_HOME/bin.
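The same check can be scripted. This is a rough sketch that looks for an executable bin/ptxas under each directory the XLA warning lists as searched (./cuda_sdk_lib, /usr/local/cuda and .); the function name find_ptxas is made up for illustration.

```python
import os


def find_ptxas(cuda_dirs):
    """Return (dir, resolved path) for every directory with an executable bin/ptxas."""
    found = []
    for d in cuda_dirs:
        candidate = os.path.join(d, "bin", "ptxas")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # realpath shows where a symlink such as /usr/local/cuda really points.
            found.append((d, os.path.realpath(candidate)))
    return found


if __name__ == "__main__":
    # Directories copied from the "Searched for CUDA in" lines of the log.
    searched = ["./cuda_sdk_lib", "/usr/local/cuda", "."]
    hits = find_ptxas(searched)
    for d, path in hits:
        print(d, "->", path)
    if not hits:
        print("no executable ptxas under any searched directory")
```

If this prints a path but XLA still reports a failure, the binary being missing is probably not the real problem.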

sanjibnarzary commented 4 years ago

You can run export CUDA_DIR=/usr/local/cuda-10.1 if CUDA_DIR is not set in your environment.

2020zyc commented 4 years ago

@sanjibnarzary Sorry, my configuration is:

  1. Running in a Docker container with Ubuntu 18.04
  2. tensorflow-1.15 and cuda-10.0 (although cuda-10.1 is also installed)
  3. /usr/local/cuda/ -> /usr/local/cuda-10.0/
  4. echo $CUDA_DIR -> /usr/local/cuda

ptxas exists in /usr/local/cuda-10.0/bin and /usr/local/cuda-10.1/bin
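Since ptxas is present, one thing that may be worth checking is whether it can be started at all: every ptxas warning in the log is preceded by "Start cannot fork() child process: Cannot allocate memory". The sketch below is only a diagnostic under that assumption. try_launch is a hypothetical helper, and the ptxas path is the one from this setup.

```python
import subprocess
import sys


def try_launch(cmd):
    """Try to start cmd as a child process; return (ok, detail)."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, timeout=30
        )
    except OSError as e:
        # Covers a missing binary (ENOENT) as well as a failed fork (ENOMEM).
        return False, f"{type(e).__name__}: errno={e.errno} {e.strerror}"
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return result.returncode == 0, result.stdout.decode(errors="replace").strip()


if __name__ == "__main__":
    # Sanity check with the running interpreter, then the real target.
    print(try_launch([sys.executable, "--version"]))
    print(try_launch(["/usr/local/cuda/bin/ptxas", "--version"]))
```

A success here does not rule the fork problem out: a small Python process can fork where a training process holding a large model cannot, so it is more informative to run this inside the container while training is using memory.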

lukaszkaiser commented 4 years ago

This looks like a CUDA or JAX issue. I cannot reproduce it, and it would probably be better to ask there. Sorry I cannot help more!