intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

[PyTorch Upstream] FP16 atomic emulation output `error` log. #1044

Closed etaf closed 4 months ago

etaf commented 4 months ago

Hi team, currently the following log is always printed to the console when FP16 atomic emulation is used:

loc("/tmp/tmpxdeq_pc_/a4/ca4mpl5b3diukcjkbi2xfnufaqqobxjwafffr4bsmbyslkroz6pe.py":32:53): error: 'tt.atomic_rmw' op fp16 datatype is not supported in the target HW, software emulation is an experimental feature (use at own risk)

This error message doesn't look right in stock PyTorch. Can you change it so that it prints only when there is a real error, or make it a warning?

etaf commented 4 months ago

@riverliuintel @vlad-penkin do you have any comment?

etaf commented 4 months ago

We would like to update the intel-xpu-backend-for-triton pin in stock PyTorch once this issue is resolved. Please prioritize, thanks.

vlad-penkin commented 4 months ago

@etaf this issue was fixed in:

Currently PyTorch pins a month-old Triton XPU commit id. The changes to the defaults were made roughly two weeks ago. Please update the intel-xpu-backend-for-triton commit id in the PyTorch repo.

etaf commented 4 months ago

Hi @vlad-penkin, sorry, you may have misunderstood the issue; we are talking about the log Triton outputs for fp16 atomic emulation. Currently an error-level log is printed whenever the emulation is used:

loc("/tmp/tmpxdeq_pc_/a4/ca4mpl5b3diukcjkbi2xfnufaqqobxjwafffr4bsmbyslkroz6pe.py":32:53): error: 'tt.atomic_rmw' op fp16 datatype is not supported in the target HW, software emulation is an experimental feature (use at own risk)

The log still exists in the latest code: https://github.com/intel/intel-xpu-backend-for-triton/blob/094377a40172a1e6ba247b23c8701df776bfc28f/third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp#L797C6-L803C67

We suggest that, for production use such as stock PyTorch, this should not be an error but a warning, or better still, only be logged when a real error happens during the emulation.

vlad-penkin commented 4 months ago

@etaf the initial implementation required the TRITON_INTEL_EMULATE_FP16_ATOMICS=1 flag to be set.

This was changed two weeks ago per @riverliuintel's request - https://github.com/intel/intel-xpu-backend-for-triton/issues/728#issuecomment-2071123865

etaf commented 4 months ago

Hi @vlad-penkin, sorry, I am not talking about the functionality of the fp16 atomics, but about Triton's output log here: https://github.com/intel/intel-xpu-backend-for-triton/blob/094377a40172a1e6ba247b23c8701df776bfc28f/third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp#L797C6-L803C67

This log always appears when fp16 atomics are used. We think that for production use such as stock PyTorch it should not be an error but a warning, or better still, only be logged when a real error happens during the emulation.

etaf commented 4 months ago

@whitneywhtsang could you also please check whether the log level is appropriate here? https://github.com/intel/intel-xpu-backend-for-triton/blob/094377a40172a1e6ba247b23c8701df776bfc28f/third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp#L797C6-L803C67 Maybe it should be a warning or debug log?

whitneywhtsang commented 4 months ago

Somehow, after changing emitOpError to emitWarning, the message is not printed at all. When testing with python/test/unit/language/test_emulated_atomics.py, the tests still pass with emitOpError in place.
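For context, the change under discussion swaps an error-severity MLIR diagnostic for a warning. A minimal sketch of what that could look like in the lowering pass, not the actual backend diff (the helper name and the `hwSupportsFp16Atomics` flag are hypothetical; `emitOpError` and `emitWarning` are standard `mlir::Operation` diagnostic APIs):

```cpp
// Hypothetical sketch, not the real LoadStoreOpToLLVM.cpp code.
// emitOpError() produces an "error: 'tt.atomic_rmw' op ..." diagnostic
// even though lowering proceeds via software emulation; emitWarning()
// keeps the message visible but without error severity.
static void noteFp16AtomicEmulation(mlir::Operation *op,
                                    bool hwSupportsFp16Atomics) {
  if (hwSupportsFp16Atomics)
    return; // native support, nothing to report
  // Before: op->emitOpError("fp16 datatype is not supported ...");
  op->emitWarning("fp16 datatype is not supported in the target HW, "
                  "software emulation is an experimental feature "
                  "(use at own risk)");
}
```

Note that `emitOpError` also prefixes the message with the op name (which is why the original log reads `'tt.atomic_rmw' op fp16 datatype ...`), while `emitWarning` does not add that prefix.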

etaf commented 4 months ago

Hi @whitneywhtsang, I think this is about the user experience of stock PyTorch: the "error" keyword should not be shown in the console log in this case, because no error has actually occurred.

whitneywhtsang commented 4 months ago

@etaf Changed to warning, please verify.

etaf commented 4 months ago

@whitneywhtsang verified, thanks!