mitsuba-renderer / mitsuba3

Mitsuba 3: A Retargetable Forward and Inverse Renderer
https://www.mitsuba-renderer.org/

[potential bug] error when moving tensors from torch to mitsuba and back #1229

Closed: saeedhd96 closed this issue 2 weeks ago

saeedhd96 commented 1 month ago

Summary

Moving a tensor from torch to Mitsuba and then back to torch causes an error on a Windows machine.

System configuration

System information:

OS: Windows 11
CPU: 13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz
GPU: NVIDIA RTX 5880 Ada
Python version: 3.10 (the error also occurs with 3.12)
LLVM version: ...
CUDA version: 12.2
NVIDIA driver: 537.99

Dr.Jit version: 0.4.6
Mitsuba version: 3.5.2 (also tested back to 3.0.0; the error occurred in all versions)
Compiled with: installed from pip
Variants compiled: ['scalar_rgb', 'scalar_spectral', 'cuda_ad_rgb', 'llvm_ad_rgb']

PyTorch version: 2.3.1+cu121 (also tried 2.1.1 and 2.0.1; with those versions the program crashed silently without printing this error)
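The version details above can be gathered with a short script instead of being copied by hand. The sketch below is a hypothetical helper that uses only the Python standard library; it assumes the PyPI distribution names mitsuba, drjit, and torch, and it does not query GPU, driver, CUDA, or LLVM details.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_system_info(packages=("mitsuba", "drjit", "torch")):
    """Hypothetical helper: return OS, Python, and package versions.

    Packages that are not installed are reported as None.
    """
    info = {
        "OS": f"{platform.system()} {platform.release()}",
        "Python version": platform.python_version(),
        "Architecture": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None
    return info


def format_system_info(info):
    # One "key: value" line per entry, in the same style as the report.
    return "\n".join(f"{key}: {value}" for key, value in info.items())
```

Usage: print(format_system_info(collect_system_info())) prints one line per field, ready to paste into a bug report.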

Description

An unhandled exception occurs when a tensor created in torch is converted to a Mitsuba tensor and then converted back to torch.

Steps to reproduce

import mitsuba as mi
mi.set_variant("cuda_ad_rgb")
import torch

# torch (CUDA) -> Mitsuba TensorXf -> torch
mi.TensorXf(torch.rand(100, 100, 3).cuda()).torch()

The error is:

Unhandled exception caught in c10/util/AbortHandler.h
00007FFE4332C264 00007FFE43310080 torch_python.dll!THPGenerator_initDefaultGenerator [<unknown file> @ <unknown line number>]
00007FFEFC44EE12 00007FFEFC44EDF0 ucrtbase.dll!terminate [<unknown file> @ <unknown line number>]
00007FFEC53D1AAB 00007FFEC53D1150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>]
00007FFEC53D2317 00007FFEC53D1150 VCRUNTIME140_1.dll!_NLG_Return2 [<unknown file> @ <unknown line number>]
00007FFEC53D40D9 00007FFEC53D4030 VCRUNTIME140_1.dll!_CxxFrameHandler4 [<unknown file> @ <unknown line number>]
00007FFEFE9B504F 00007FFEFE9B4F20 ntdll.dll!_chkstk [<unknown file> @ <unknown line number>]
00007FFEFE92E866 00007FFEFE92DDD0 ntdll.dll!RtlFindCharInUnicodeString [<unknown file> @ <unknown line number>]
00007FFEFE964945 00007FFEFE9647B0 ntdll.dll!RtlRaiseException [<unknown file> @ <unknown line number>]
00007FFEFBE0FABC 00007FFEFBE0FA50 KERNELBASE.dll!RaiseException [<unknown file> @ <unknown line number>]
00007FFED01C6480 00007FFED01C63F0 VCRUNTIME140.dll!CxxThrowException [<unknown file> @ <unknown line number>]
00007FFEA87C3B86 00007FFEA87C3B40 msvcp140.dll!std::_Throw_Cpp_error [<unknown file> @ <unknown line number>]
00007FFE9FAD6E57 00007FFE9FAD6E30 drjit-core.dll!jit_var_dec_ref_impl [<unknown file> @ <unknown line number>]
00007FFE480E8584 00007FFE47EB0B5C drjit_ext.cp312-win_amd64.pyd!PyInit_drjit_ext [<unknown file> @ <unknown line number>]
00007FFE47EA6C60 <unknown symbol address> drjit_ext.cp312-win_amd64.pyd!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFE47EA6D63 <unknown symbol address> drjit_ext.cp312-win_amd64.pyd!<unknown symbol> [<unknown file> @ <unknown line number>]
00007FFE4D1CC0DB 00007FFE4D1CB970 python312.dll!PyFloat_FormatAdvancedWriter [<unknown file> @ <unknown line number>]
00007FFE4D19A821 00007FFE4D1915B0 python312.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFE4D192E13 00007FFE4D1915B0 python312.dll!PyEval_EvalFrameDefault [<unknown file> @ <unknown line number>]
00007FFE4D1913E6 00007FFE4D191300 python312.dll!PyEval_EvalCode [<unknown file> @ <unknown line number>]
00007FFE4D22C396 00007FFE4D22C100 python312.dll!PyRun_FileExFlags [<unknown file> @ <unknown line number>]
00007FFE4D22C4A3 00007FFE4D22C100 python312.dll!PyRun_FileExFlags [<unknown file> @ <unknown line number>]
00007FFE4D228C25 00007FFE4D228720 python312.dll!PyRun_InteractiveLoopFlags [<unknown file> @ <unknown line number>]
00007FFE4D22861E 00007FFE4D2283C0 python312.dll!PyRun_InteractiveLoopObject [<unknown file> @ <unknown line number>]
00007FFE4D2282AB 00007FFE4D228210 python312.dll!PyRun_AnyFileObject [<unknown file> @ <unknown line number>]
00007FFE4D000F27 00007FFE4CFF6360 python312.dll!Py_gitidentifier [<unknown file> @ <unknown line number>]
00007FFE4D00158E 00007FFE4CFF6360 python312.dll!Py_gitidentifier [<unknown file> @ <unknown line number>]
00007FFE4D0019F8 00007FFE4D0019E0 python312.dll!Py_RunMain [<unknown file> @ <unknown line number>]
00007FFE4D001A82 00007FFE4D001A30 python312.dll!Py_Main [<unknown file> @ <unknown line number>]
00007FF70B301494 00007FF70B301110 python.exe!OPENSSL_Applink [<unknown file> @ <unknown line number>]
00007FFEFDE5257D 00007FFEFDE52560 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFEFE96AF28 00007FFEFE96AF00 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]

saeedhd96 commented 1 month ago

I should note that when I build Mitsuba from source and run the same code as above, I get a different error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\MYNAME\AppData\Local\miniconda3\envs\MYENVNAME\lib\site-packages\torch\utils\dlpack.py", line 115, in from_dlpack
    dlpack = ext_tensor.__dlpack__(stream=stream_ptr)
RuntimeError: jitc_malloc_device(): unknown address <0x1334c00000>!
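
The source-build traceback is raised inside torch.utils.dlpack.from_dlpack, which asks the producer (here the Mitsuba/Dr.Jit tensor) for its device through __dlpack_device__() and then calls __dlpack__(stream=...). The sketch below is a simplified, standalone model of that consumer-side handshake, written for illustration only: it is not PyTorch's actual code, and FakeProducer, pick_stream, and consume are hypothetical names. It shows that the (device type, device index) pair reported by the producer decides which stream the consumer passes back, so an inconsistent device report is one plausible, unconfirmed way to end up in an error like the one above.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    # Subset of the device codes defined by the DLPack specification.
    kDLCPU = 1
    kDLCUDA = 2
    kDLROCM = 10


class FakeProducer:
    """Hypothetical stand-in for a tensor that exports itself via DLPack."""

    def __init__(self, device_type, device_id):
        self._device = (int(device_type), device_id)
        self.received_stream = "not called"

    def __dlpack_device__(self):
        return self._device

    def __dlpack__(self, stream=None):
        # A real producer returns a PyCapsule wrapping a DLManagedTensor and
        # orders its pending work before `stream`; this one only records it.
        self.received_stream = stream
        return "capsule-placeholder"


def pick_stream(producer, current_stream_ptr):
    """Mimic the consumer: choose the `stream` argument for __dlpack__.

    `current_stream_ptr` maps a device index to the consumer's current
    stream handle, where 0 stands for the legacy default stream.
    """
    device_type, device_id = producer.__dlpack_device__()
    if device_type not in (DLDeviceType.kDLCUDA, DLDeviceType.kDLROCM):
        return None  # CPU data: no stream synchronization needed
    ptr = current_stream_ptr(device_id)
    # The Python array API standard disallows 0 for CUDA, so the legacy
    # default stream is passed as 1 instead.
    if device_type == DLDeviceType.kDLCUDA and ptr == 0:
        return 1
    return ptr


def consume(producer, current_stream_ptr):
    stream = pick_stream(producer, current_stream_ptr)
    return producer.__dlpack__(stream=stream)
```

In real code the stream lookup is keyed by the reported device index, which is why the producer's answer to __dlpack_device__() matters.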

tizian commented 1 month ago

FYI: the opposite case (Dr.Jit -> Torch -> Dr.Jit) does seem to work and is covered in these unit tests: https://github.com/mitsuba-renderer/drjit/blob/a306e34ca03501a8e806e06cb4486f0025203e6e/tests/test_conversion.py#L29-L64
It's only the Torch -> Dr.Jit -> Torch direction that crashes. NumPy -> Dr.Jit -> NumPy also seems to work fine.
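Because the Torch -> Dr.Jit -> Torch direction can abort the whole interpreter (or, with older PyTorch versions, exit silently, as noted in the report), it can help to run each conversion direction in its own Python process and inspect the exit code. The sketch below is a hypothetical harness using only the standard library. TORCH_ROUNDTRIP reuses the reproduction from the report; NUMPY_ROUNDTRIP is an assumed snippet modeled on the NumPy case mentioned above, not a copy of the linked unit tests.

```python
import subprocess
import sys

# Reproduction from the report (needs a CUDA-capable Mitsuba install).
TORCH_ROUNDTRIP = (
    "import mitsuba as mi\n"
    "mi.set_variant('cuda_ad_rgb')\n"
    "import torch\n"
    "mi.TensorXf(torch.rand(100, 100, 3).cuda()).torch()\n"
)

# Assumed NumPy -> Dr.Jit -> NumPy check, for comparison.
NUMPY_ROUNDTRIP = (
    "import mitsuba as mi\n"
    "mi.set_variant('cuda_ad_rgb')\n"
    "import numpy as np\n"
    "mi.TensorXf(np.random.rand(100, 100, 3).astype(np.float32)).numpy()\n"
)


def run_isolated(snippet, timeout=120.0):
    """Run a Python snippet in a fresh interpreter.

    Returns (ok, returncode, stderr). A hard abort such as std::terminate
    shows up as a nonzero return code instead of killing the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.returncode, proc.stderr


def check_directions(snippets):
    # Map each label to (ok, returncode) so crashing directions stand out.
    return {label: run_isolated(code)[:2] for label, code in snippets.items()}
```

Usage: check_directions({"torch -> mitsuba -> torch": TORCH_ROUNDTRIP, "numpy -> mitsuba -> numpy": NUMPY_ROUNDTRIP}) reports which directions exit cleanly; the stderr element of run_isolated keeps the native stack trace for the failing one.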

njroussel commented 1 month ago

I can reproduce this on Windows, even on the new refactored Dr.Jit. Nothing obvious comes to mind, especially since the problem seems to be specific to Windows.

We'll have a look.

njroussel commented 2 weeks ago

I finally got around to looking into all of the framework interoperability issues that have been raised.

I've pushed a fix to drjit:master; it should land in mitsuba:master soon.

However, if you've been working with a public release (i.e. mitsuba==3.5.2 and drjit==0.4.6), the fix is slightly different. You will want to modify this line in router.py to be: return _dr.detail.device()
If you installed mitsuba/drjit with pip, you can edit the file directly in the site-packages folder where pip installed it.
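To apply this workaround, you first need the path of the installed router.py. The sketch below is a hypothetical helper that searches installed package directories with the standard library; the comment does not say which package ships router.py, so searching both drjit and mitsuba is an assumption. contains_workaround only checks whether the suggested text is present somewhere in the file; it does not confirm that the correct line was changed.

```python
import importlib.util
from pathlib import Path


def find_in_package(package, filename):
    """Return every file named `filename` inside an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # not installed, or a plain module rather than a package
    hits = []
    for root in spec.submodule_search_locations:
        hits.extend(sorted(Path(root).rglob(filename)))
    return hits


def contains_workaround(path, line="return _dr.detail.device()"):
    # Plain text search; it does not check which function the line is in.
    return line in Path(path).read_text(encoding="utf-8")


def locate_router(packages=("drjit", "mitsuba")):
    # Hypothetical convenience wrapper: candidate router.py paths per package.
    return {pkg: find_in_package(pkg, "router.py") for pkg in packages}
```

Usage: print(locate_router()) lists candidate files; after editing, contains_workaround(path) is a quick sanity check. Keep a backup copy of the file, since pip will overwrite it on the next upgrade anyway.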