[🐛 bug report] code 139 (interrupted by signal 11: SIGSEGV) when converting parameter to torch tensor

AndresCasado commented 3 years ago

Summary

After some struggle compiling Mitsuba 2 with CUDA support, I finally got it working, but running the PyTorch integration example crashes with a segmentation fault.

System configuration

Platform: Ubuntu 20.04
Compiler: Clang 9.0.1-12
Python version: 3.8.2
Mitsuba 2 version: pulled from 2fb0c9634e696887fae3921a60818a3a503c892e
Compiled variants:
- scalar_rgb
- scalar_spectral
- gpu_spectral
- gpu_autodiff_spectral
- gpu_autodiff_rgb
CUDA: 10.1.243
GPU: GTX 980 (CUDA compute capabilities: 5.2)

Description

After reading some solved issues about OptiX, PTX and CUDA, I finally managed to compile Mitsuba 2 with GPU support by editing the compute capability in the PTX Makefile and in the Enoki CMake settings.

Then, as I usually work with virtualenvs and PyCharm, I created a new virtualenv, installed PyTorch in it and manually added the compiled Mitsuba Python bindings to its PYTHONPATH.

Scripts to import Mitsuba and detect the variants work fine. Scripts to create a tensor on GPU memory using PyTorch work.

But when I run the differentiable rendering example, the script exits abruptly at this line:

param_ref = params['red.reflectance.value'].torch()

The error is:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
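
For readers unfamiliar with the number: 139 follows the usual shell convention of reporting a process killed by signal N as exit code 128 + N, and signal 11 is SIGSEGV on Linux. A minimal sketch of that decoding, using only the standard library (the helper name `signal_from_exit_code` is made up for illustration):

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name for a shell-style exit code (128 + N), else None."""
    if code <= 128:
        # Ordinary exit status, not a signal.
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # 128 + N where N is not a signal number known on this platform.
        return None


print(signal_from_exit_code(139))  # SIGSEGV on Linux
```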

Setting the virtualenv's interpreter as the Python executable in CMake does not help.

The command params[...].numpy() works fine.

Steps to reproduce

1. Create a virtualenv (in PyCharm)
2. Install PyTorch in the virtualenv
3. Set the virtualenv's interpreter as the Python executable in CMake
4. Compile Mitsuba with gpu_autodiff_* support
5. Add the Mitsuba Python packages directory to the virtualenv's PYTHONPATH in PyCharm
6. Run the differentiable rendering with PyTorch example
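
Because the steps above mix a virtualenv, a manually edited PYTHONPATH and a separately compiled extension, it may be worth confirming which interpreter is running and which files each import resolves to. The following is only a sketch using the standard library; the helper names are hypothetical, and the module names `mitsuba`, `enoki` and `torch` are assumed to be the ones installed in this setup. It does not import the modules, so it cannot crash on its own.

```python
import importlib.util
import sys


def locate_modules(names):
    """Map each top-level module name to the file it would be loaded from."""
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            found[name] = None
        else:
            # Namespace packages have no origin; fall back to their search paths.
            found[name] = spec.origin or str(list(spec.submodule_search_locations or []))
    return found


def describe_environment(names=("mitsuba", "enoki", "torch")):
    """Summarise the interpreter in use and where the given modules come from."""
    lines = [
        f"interpreter: {sys.executable}",
        f"version: {sys.version.split()[0]}",
    ]
    for name, origin in locate_modules(names).items():
        lines.append(f"{name}: {origin or 'not found'}")
    return "\n".join(lines)
```

Running `print(describe_environment())` inside the PyCharm run configuration shows whether the interpreter and paths match the ones the build was configured with.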

Speierers commented 3 years ago

Hi @AndresCasado ,

We are currently working on a major refactoring of the Mitsuba + Enoki codebase. This will likely be fixed in the new version. Unfortunately, I won't have time to look into your issue on the current master branch.

Could you try to reproduce this with a simpler piece of code? For example:

import mitsuba
mitsuba.set_variant('gpu_autodiff_rgb')
from mitsuba.core import Float

a = Float(4.4)
b = a.torch()

print(b)

AndresCasado commented 3 years ago

Thanks for your response.

If there is an ongoing refactoring, I understand this is not a high priority. I'll wait for the new version and test again.

In any case, this is the output after running your suggestion:

2020-09-22 09:32:21 INFO  main  [optix_api.cpp:56] Dynamic loading of the Optix library ..

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
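
Since the log stops right after the OptiX loading message, one way to narrow this down further could be Python's built-in `faulthandler`, which prints the Python-level traceback when the interpreter receives SIGSEGV. The sketch below runs a snippet in a fresh interpreter with `-X faulthandler` and reports the signal that killed it. The helper name `run_with_faulthandler` is hypothetical, and `REPRO` is simply the snippet suggested above. It only shows the Python line that crashed; native C++ frames would still need a debugger such as gdb.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(snippet):
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Returns (signal_name_or_None, stderr_text).
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    crashed_by = None
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        crashed_by = signal.Signals(-proc.returncode).name
    return crashed_by, proc.stderr


# The minimal reproduction from this thread, unchanged.
REPRO = """
import mitsuba
mitsuba.set_variant('gpu_autodiff_rgb')
from mitsuba.core import Float
a = Float(4.4)
b = a.torch()
print(b)
"""
```

Calling `crashed_by, trace = run_with_faulthandler(REPRO)` and printing both values should show whether the crash happens inside `a.torch()` or earlier, for example while the variant is being loaded.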
