openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[Bug] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 9.0 #5932

Open baoleai opened 1 year ago

baoleai commented 1 year ago

After #4970, there is an error on H800 when using torch_xla.

2023-09-26 23:49:25.317379: W external/xla/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 9.0
2023-09-26 23:49:25.317412: W external/xla/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2023-09-26 23:49:25.317533: W external/xla/xla/stream_executor/gpu/redzone_allocator.cc:322] UNIMPLEMENTED: ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2023-09-26 23:49:25.334693: W external/xla/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 9.0
2023-09-26 23:49:25.334716: W external/xla/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2023-09-26 23:49:25.334821: W external/xla/xla/stream_executor/gpu/redzone_allocator.cc:322] UNIMPLEMENTED: ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1695743368.222415    2429 gpu_executable.cc:364] Check failed: !info.content.empty() 
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1695743368.295742    2608 gpu_executable.cc:364] Check failed: !info.content.empty() 
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1695743368.369773    2698 gpu_executable.cc:364] Check failed: !info.content.empty() 
*** Check failure stack trace: ***
    @     0x7fc8b8d51dc9  absl::lts_20230125::log_internal::LogMessageFatal::~LogMessageFatal()

cheshire commented 1 year ago

@zhenying-liu WDYT?

nouiz commented 1 year ago

https://github.com/openxla/xla/pull/4970 probably fixed that. Can you check that your version of XLA includes this fix? Otherwise, it would be a variation of the same issue.

nouiz commented 1 year ago

If that isn't the case, the error says:

2023-09-26 23:49:25.317533: W external/xla/xla/stream_executor/gpu/redzone_allocator.cc:322] UNIMPLEMENTED: ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation. 
ptxas too old

Can you update ptxas? Which version of CUDA do you use? Can you update it?
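
To answer the version question, it can help to see which ptxas binary is actually picked up, since the log above says "Used ptxas at ptxas", meaning it was resolved through $PATH. The following is a minimal sketch, not part of XLA: the helper names `parse_ptxas_version` and `find_ptxas_version` are hypothetical, and it assumes `ptxas --version` prints a token of the form `V<major>.<minor>.<patch>`.

```python
import re
import shutil
import subprocess


def parse_ptxas_version(text):
    """Extract (major, minor, patch) from `ptxas --version` output.

    Assumes the output contains a token like "V11.8.89"; returns None
    if no such token is present.
    """
    match = re.search(r"\bV(\d+)\.(\d+)\.(\d+)\b", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def find_ptxas_version():
    """Locate the first ptxas on PATH and report its version.

    Returns (path, version); both are None when ptxas is not on PATH.
    """
    path = shutil.which("ptxas")
    if path is None:
        return None, None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, parse_ptxas_version(result.stdout)


if __name__ == "__main__":
    ptxas_path, ptxas_version = find_ptxas_version()
    if ptxas_path is None:
        print("ptxas not found on PATH")
    else:
        print(f"ptxas at {ptxas_path}, version {ptxas_version}")
```

If the reported path or version does not match the CUDA toolkit you expect, an older ptxas earlier in $PATH may be shadowing the intended one.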

baoleai commented 1 year ago

The XLA version is at https://github.com/openxla/xla/commit/7a371ed44aba34f83d6d3d1159d2e6d0d327c603, which includes #4970, and the CUDA version is 11.8.0 with cuDNN 8. When I revert #4970, I no longer get the above error.