Open c200chromebook opened 7 months ago
Thanks for the report. I can reproduce the error:
$ python repro.py
/home/guilhermeleobas/git/numba/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
LLVM ERROR: Symbol name with unsupported characters
I can reproduce this too - I need to look a bit deeper into what's going on in NVVM here.
I also tested this case locally. If I just use @cuda.jit, the test passes, so the issue comes from the lineinfo=True option. It also seems related to the complex branch structure: if I randomly delete one or two of the branches, the test passes even with lineinfo=True.
The final error message is quite low-level: LLVM ERROR: Symbol name with unsupported characters. How do we normally debug this kind of error for the CUDA target? I guess we need to focus on the lowering pass?
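One way I can think of to narrow it down is to scan the symbol names in the IR that Numba hands to NVVM for unusual characters. A sketch, assuming the IR has been saved to a file (dump.ll is a placeholder, e.g. captured from a NUMBA_DUMP_LLVM=1 run); the character whitelist is my own guess, not NVVM's documented naming rule:
# Sketch: scan quoted global symbol names in captured LLVM IR (dump.ll is a
# placeholder filename) for characters outside a conservative whitelist.
# The whitelist is an assumption, not NVVM's documented rule.
import re

ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789._$"
)

with open("dump.ll") as f:
    ir = f.read()

# Quoted names like @"..." are where unusual characters tend to show up.
for name in re.findall(r'@"([^"]+)"', ir):
    bad = sorted(set(name) - ALLOWED)
    if bad:
        print(f"suspicious symbol: {name!r} (chars: {bad})")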
BTW, when I want to see the optimized NVVM IR, which environment variable should I use? I tried:
# os.environ["NUMBA_CUDA_DEBUGINFO"] = "1"
# os.environ["NUMBA_DEBUG_TYPEINFER"] = "1"
os.environ["NUMBA_DUMP_LLVM"] = "1"
# os.environ["NUMBA_DUMP_FUNC_OPT"] = "1"
# os.environ["NUMBA_DUMP_OPTIMIZED"] = "1"
Only NUMBA_DUMP_LLVM prints any IR, and it is verbose, as expected. How can I print the optimized NVVM IR, or is the optimized NVVM IR even printable?
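For reference, the two artifacts I know how to print from the Python side are the LLVM IR that Numba feeds to NVVM and the final PTX that NVVM emits; neither is the optimized NVVM IR itself. A minimal sketch with a placeholder kernel:
# Sketch with a placeholder kernel: dispatcher.inspect_llvm() returns the
# Numba-generated (unoptimized) LLVM IR per signature, and inspect_asm()
# returns the PTX produced by NVVM. Neither shows NVVM's optimized IR.
import numpy as np
from numba import cuda

@cuda.jit
def axpy(r, a, x, y):
    i = cuda.grid(1)
    if i < r.size:
        r[i] = a * x[i] + y[i]

n = 32
r = cuda.device_array(n, dtype=np.float32)
x = cuda.to_device(np.ones(n, dtype=np.float32))
y = cuda.to_device(np.ones(n, dtype=np.float32))
axpy[1, n](r, np.float32(2.0), x, y)  # trigger compilation

for sig, ir in axpy.inspect_llvm().items():
    print("=== LLVM IR for", sig, "===")
    print(ir)

for sig, ptx in axpy.inspect_asm().items():
    print("=== PTX for", sig, "===")
    print(ptx)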
So I took a quick look at https://github.com/numba/numba/blob/df07de114404225e64eea3c0622d3aee4a12e0c8/numba/cuda/codegen.py#L138-L150
I think llvm_strs is the unoptimized LLVM IR from the Numba frontend? The CUDA codegen then converts it directly to PTX, so in this context there is no artifact corresponding to optimized NVVM IR?
Update: LTO-IR looks similar to what I want. The code snippet above is in get_asm_str (asm should be an alias of PTX), so perhaps I should look into get_ltoir instead.
Then perhaps we could use config.DUMP_OPTIMIZED in get_ltoir to enable the dump. But LTO-IR and ptx are mutually exclusive, and ptx is the default choice. Link-time optimized IR is also not the same concept as what DUMP_OPTIMIZED shows for other targets, so I guess the CUDA target doesn't provide a way to show optimized IR the way I'm used to seeing for the CPU target.
The optimized NVVM IR remains internal to NVVM. You can only see the unoptimized LLVM IR. You can get LTO-IR, but it comes out as a proprietary format you can't view / disassemble.
Hi there - please see the reproducer below. Basically, and somewhat oddly, if you define types explicitly when using lineinfo with record types, things work fine. If you allow Numba to infer the types, things do not work at all and you get "LLVM ERROR: Symbol name with unsupported characters." The only difference in the types going into the underlying compiler machinery seems to be the aligned flag; I'm not sure why that would cause this. Upgrading the CUDA toolkit version does not seem to make a difference.
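(The original reproducer attachment is not included here; the snippet below is only an illustrative sketch of the same shape, with made-up field names and dtype: a record-typed kernel compiled with lineinfo=True, once with an explicit signature and once with inferred types.)
# Hypothetical illustration only -- NOT the original reproducer for this issue.
# Field names, dtype layout and launch configuration are made up.
import numpy as np
from numba import cuda, from_dtype, void

rec_dtype = np.dtype([("x", np.float64), ("y", np.int32)], align=True)
rec_type = from_dtype(rec_dtype)

# Explicit signature: reported to compile fine with lineinfo=True.
@cuda.jit(void(rec_type[:]), lineinfo=True)
def kernel_explicit(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i].x = arr[i].y + 1.0

# Inferred types: reported to fail with
# "LLVM ERROR: Symbol name with unsupported characters".
@cuda.jit(lineinfo=True)
def kernel_inferred(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i].x = arr[i].y + 1.0

d_arr = cuda.to_device(np.zeros(16, dtype=rec_dtype))
kernel_explicit[1, 16](d_arr)
kernel_inferred[1, 16](d_arr)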
Reporting a bug