jeanfeydy / geomloss

Geometric loss functions between point clouds, images and volumes
MIT License

CUDA_ERROR_INVALID_SOURCE error when running geomloss on some GPUs #66

Closed: ismedina closed this issue 2 years ago

ismedina commented 2 years ago

I am trying to use geomloss on the computing cluster at my institution, where I can choose between several compute nodes with different GPUs. geomloss works without problems on some GPUs (GTX980, GTX1080), but on others (RTX500, V100) I get the following error when running the sample code from the geomloss webpage:

[KeOps] error: cuModuleLoadDataEx(&module, target, 0, NULL, NULL) failed with error CUDA_ERROR_INVALID_SOURCE

SamplesLoss()
Traceback (most recent call last):
  File "run-geomloss-samples.py", line 11, in <module>
    L = loss(x, y)  # By default, use constant weights = 1/number of samples
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/geomloss/samples_loss.py", line 265, in forward
    values = routines[self.loss][backend](
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/geomloss/sinkhorn_samples.py", line 656, in sinkhorn_multiscale
    f_aa, g_bb, g_ab, f_ba = sinkhorn_loop(
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/geomloss/sinkhorn_divergence.py", line 462, in sinkhorn_loop
    g_ab = damping * softmin(eps, C_yx, a_log)  # a -> b
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/geomloss/sinkhorn_samples.py", line 450, in softmin_multiscale
    return -eps * log_conv(
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 624, in __call__
    out = GenredAutograd.apply(
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 78, in forward
    myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/keopscore/utils/Cache.py", line 66, in __call__
    self.library[str_id] = self.cls(params, fast_init=True)
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 15, in __init__
    super().__init__(*args, fast_init=fast_init)
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 31, in __init__
    self.init_phase2()
  File "/usr/users/medinasuarez/.local/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps_nvrtc.py", line 23, in init_phase2
    self.launch_keops = pykeops_nvrtc.KeOps_module_float(
RuntimeError: [KeOps] Cuda error.

I am running the code on a Linux machine with Python 3.8, the latest version of geomloss and CUDA 11.5. Do you have any tips?
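Since the failure depends on which GPU the node has, the per-node details are probably worth attaching. Below is a minimal sketch of a script that could be run on each node to collect them. The function name `collect_env_report` and the shape of the report are only illustrative and are not part of geomloss or KeOps. It relies on standard PyTorch calls and degrades gracefully if PyTorch or pykeops cannot be imported.

```python
import platform
import sys


def collect_env_report():
    """Gather basic environment facts that are useful in a GPU bug report.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": None,        # PyTorch version, if importable
        "torch_cuda": None,   # CUDA version PyTorch was built against
        "pykeops": None,      # pykeops version, if importable
        "gpus": [],           # one entry per visible GPU
    }

    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None:
        report["torch"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                major, minor = torch.cuda.get_device_capability(i)
                report["gpus"].append(
                    {
                        "index": i,
                        "name": torch.cuda.get_device_name(i),
                        "compute_capability": f"{major}.{minor}",
                    }
                )

    try:
        import pykeops
        report["pykeops"] = getattr(pykeops, "__version__", None)
    except Exception:
        # pykeops may fail at import time on a misconfigured node;
        # leave the entry as None rather than abort the report.
        pass

    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Comparing the `compute_capability` and `torch_cuda` lines between a working node and a failing one should show whether the two groups of GPUs differ in architecture or in the CUDA setup they see.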

Thanks a lot in advance :)

jeanfeydy commented 2 years ago

Hi @ismedina ,

Thanks for your report. There are several possible reasons for your problem. They are all related to KeOps, so I have moved your issue to: https://github.com/getkeops/keops/issues/259

See you on the KeOps repo :-) Best, Jean