Open substanc3-dev opened 1 year ago
Thanks for reporting this issue. We will look into it.
The issue seems to come from the tanhBackward kernel within NewGeluActivation. We will look into it further.
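For context, NewGeluActivation refers to the tanh-based GELU approximation, 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). The sketch below is plain Python, not the IPEX kernel or the model code from this issue; it only illustrates where a tanh backward step appears in the gradient of that activation. The function names are made up for illustration, and since Python floats are double precision, it shows the math only and does not reproduce the FP64 failure on Flex.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
COEFF = 0.044715  # cubic coefficient of the tanh GELU approximation


def new_gelu(x: float) -> float:
    """Forward pass of the tanh-based GELU approximation."""
    inner = SQRT_2_OVER_PI * (x + COEFF * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def new_gelu_grad(x: float) -> float:
    """Analytic derivative of new_gelu with respect to x."""
    inner = SQRT_2_OVER_PI * (x + COEFF * x ** 3)
    t = math.tanh(inner)
    d_inner = SQRT_2_OVER_PI * (1.0 + 3.0 * COEFF * x ** 2)
    # d/dx tanh(u) = (1 - tanh(u)^2) * du/dx is the tanh backward step.
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * d_inner


def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to sanity-check the derivative."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, new_gelu(x), new_gelu_grad(x), numeric_grad(new_gelu, x))
```

The `(1.0 - t * t)` factor is the part that a tanh backward kernel computes during backpropagation, which is why a fine-tuning run can hit this path while inference alone might not.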
The issue will be fixed in the next release of IPEX XPU
@substanc3-dev meanwhile if it helps, you could try the latest version by building from source. Below references should help:
The issue will be fixed in the next release of IPEX XPU
When is the next release for XPU? Last release was almost 4 months ago.
It will take some time before the next release. For now, you can compile from source with https://github.com/intel/intel-extension-for-pytorch/blob/xpu-master/scripts/compile_bundle.sh. Please try it with oneAPI Base Toolkit 2023.1 and the latest driver.
Describe the bug
It seems as though some change between v1.10.200+gpu and 1.13.10+xpu caused previously working code that fine-tunes a Flan-T5 model to stop working due to an FP64 requirement (FP64 is not supported on Flex). The code previously worked on v1.10.200+gpu (inside the intel/intel-extension-for-pytorch:gpu Docker container, which ships this version), but it no longer works inside the latest image on tag intel/intel-extension-for-pytorch:xpu-flex with the 1.13.10+xpu version. A potential culprit could be the added CPU support causing the application to fall back to FP64, which is supported on those platforms. I wasn't able to investigate very deeply, though, so that's just a guess. Any workarounds or fixes would be appreciated.
Reproducible example: https://gist.github.com/substanc3-dev/1f497b2a308b7dc84fa5fc3f32fab759
The container is being run inside Docker Desktop on Windows 11 (22H2 retail non-insider 22621.1265) with the 31.0.101.4146 driver installed.
The full error:
Versions