Closed — etaf closed this issue 4 months ago
@riverliuintel @vlad-penkin do you have any comment?
And we would like to update intel-xpu-triton in stock PyTorch after this issue is resolved. Please prioritize, thanks.
@etaf this issue was fixed in:
Currently, PyTorch pins a month-old Triton XPU commit id. The changes in defaults were made roughly two weeks ago. Please update the intel-xpu-backend-for-triton commit id in the PyTorch repo.
Hi, @vlad-penkin Sorry, you may have misunderstood the issue; we are talking about the output log of Triton for fp16 atomic emulation. Currently, an error log is printed whenever the emulation is used:
loc("/tmp/tmpxdeq_pc_/a4/ca4mpl5b3diukcjkbi2xfnufaqqobxjwafffr4bsmbyslkroz6pe.py":32:53): error: 'tt.atomic_rmw' op fp16 datatype is not supported in the target HW, software emulation is an experimental feature (use at own risk)
This log still exists in the latest code: https://github.com/intel/intel-xpu-backend-for-triton/blob/094377a40172a1e6ba247b23c8701df776bfc28f/third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp#L797C6-L803C67
We suggest that, for a public product like stock PyTorch, this should not be an error but a warning, or better still, it should only be logged when a real error happens during the emulation.
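The requested behavior can be sketched in plain Python (this is an illustrative model, not the actual Triton code; the names `fp16_atomic_rmw` and `EmulationError` are hypothetical): merely selecting the emulation path produces a warning, and an error surfaces only when the emulation itself fails.

```python
import warnings


class EmulationError(RuntimeError):
    """Hypothetical error raised only when the emulation actually fails."""


def fp16_atomic_rmw(use_emulation, emulation_ok=True):
    # Hypothetical stand-in for the fp16 atomic lowering path.
    if use_emulation:
        # Suggested behavior: a warning, not a hard error, when the
        # experimental emulation path is merely selected.
        warnings.warn(
            "fp16 atomic_rmw is not supported by the target HW; "
            "using experimental software emulation",
            UserWarning,
        )
        if not emulation_ok:
            # Only an actual failure inside the emulation is an error.
            raise EmulationError("fp16 atomic emulation failed")
    return "lowered"
```

This separates "an experimental path was taken" (informational) from "the path failed" (a genuine error), which is the distinction being asked for here.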
@etaf the initial implementation required the TRITON_INTEL_EMULATE_FP16_ATOMICS=1 flag to be set.
This was changed two weeks ago as per @riverliuintel request - https://github.com/intel/intel-xpu-backend-for-triton/issues/728#issuecomment-2071123865
Hi, @vlad-penkin sorry, I am not talking about the functionality of the fp16 atomics, but about the output log of Triton here: https://github.com/intel/intel-xpu-backend-for-triton/blob/094377a40172a1e6ba247b23c8701df776bfc28f/third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp#L797C6-L803C67
This log is always printed when an fp16 atomic is used. We think that, for a public product like stock PyTorch, this should not be an error but a warning, or better still, it should only be logged when a real error happens during the emulation.
@whitneywhtsang could you also please check whether the log level is appropriate here? https://github.com/intel/intel-xpu-backend-for-triton/blob/094377a40172a1e6ba247b23c8701df776bfc28f/third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp#L797C6-L803C67 Maybe it should be a warning or a debug log?
Somehow, by changing `emitOpError` to `emitWarning`, the message is not printed. When testing with python/test/unit/language/test_emulated_atomics.py, the test can still pass successfully with `emitOpError`.
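That the test passes even with `emitOpError` is consistent with how MLIR diagnostics generally work: emitting a diagnostic and failing the pass are two separate actions, and lowering only aborts if the pattern also propagates a failure result. A rough Python model of that separation (all names here are hypothetical, not the MLIR API):

```python
class Diagnostics:
    """Toy model of an MLIR-style diagnostic engine: emitting a message
    and failing the pass are independent actions."""

    def __init__(self):
        self.messages = []

    def emit_op_error(self, msg):
        # Recorded (and normally printed), but by itself this does not
        # abort the lowering.
        self.messages.append(("error", msg))

    def emit_warning(self, msg):
        self.messages.append(("warning", msg))


def lower_atomic_rmw(diag, emulate):
    # Hypothetical stand-in for the tt.atomic_rmw lowering pattern.
    if emulate:
        diag.emit_op_error(
            "fp16 datatype is not supported in the target HW, "
            "software emulation is an experimental feature"
        )
    # Lowering continues regardless, which is why the test suite still
    # passes even though an "error" was printed.
    return True  # lowering succeeded
```

Under this model, the only user-visible difference between `emit_op_error` and `emit_warning` is the severity label on a message that never actually stops compilation, which supports downgrading it to a warning.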
Hi @whitneywhtsang, I think we are talking about the user experience of stock PyTorch here, and the `error` keyword is not appropriate to show in the console log in this case, because there is actually no error.
@etaf Changed to warning, please verify.
@whitneywhtsang verified, thanks!
Hi team, currently the following log is always output to the console when FP16 atomic emulation is used:
This `error` message doesn't look right in stock PyTorch. Can you change it so that it is printed only when there is a real error, or make it a warning?