Closed — dvrogozh closed this PR 3 weeks ago.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I tried this PR (together with https://github.com/huggingface/transformers/pull/31238) as much as I could in the IPEX-CPU, IPEX-XPU, PyTorch-XPU, and PyTorch-CPU scenarios. I ran some tests from accelerate and transformers and some examples from transformers; all of them appear to work, engaging XPU when expected. I am promoting these PRs out of draft for qualified review. Let me know if there are any concerns or feedback to be addressed.
Applied `doc-builder style src/accelerate docs/source --max_len 119` to fix the formatting issues identified by CI.
@muellerzr: could you please help re-run CI? Also, is there anything else I can fix in this PR to get it merged?
I have not seen this failure on this PR before. Could it be something random? I can't associate the failure with the changes made, and the test passes when I run it locally on CPU. @muellerzr, could you please advise?
FAILED tests/test_accelerator.py::AcceleratorTester::test_save_load_model_with_hooks_use_pytorch - assert 0.0007739067077636719 > 0.001
+ where 0.0007739067077636719 = abs((4.019573211669922 - 4.0203471183776855))
+ where 4.0203471183776855 = get_signature(Linear(in_features=2, out_features=4, bias=True))
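The traceback suggests the test compares scalar "signatures" of the model weights before and after re-randomization and expects them to differ by more than a tolerance. A minimal sketch of why such a check can fail sporadically, assuming a hypothetical `get_signature` helper that sums the weights (a guess based on the traceback, not accelerate's actual implementation):

```python
import random


def get_signature(weights):
    # Scalar fingerprint of a weight list. Assumed here to be a simple
    # sum, mirroring the style of helper hinted at by the traceback.
    return sum(weights)


# Paraphrased test logic: after re-randomizing the weights, the new
# signature should differ from the saved one by more than a tolerance.
saved = [random.uniform(-1, 1) for _ in range(12)]
reinit = [random.uniform(-1, 1) for _ in range(12)]
diff = abs(get_signature(saved) - get_signature(reinit))

# With random reinitialization there is a small but nonzero chance that
# diff ends up below 1e-3, in which case `assert diff > 1e-3` fails even
# though nothing is wrong with the code under test.
```

This would explain a failure that is unrelated to the PR's changes and disappears on retrigger.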
@SunMarc: thank you for retriggering the failed CI. I see it's passing now, so my assumption that the failure was sporadic appears to be correct.
@SunMarc, @muellerzr: I have outlined the current status of the XPU backend in PyTorch in https://github.com/huggingface/transformers/issues/31237. A number of issues in the XPU backend are being worked on right now. I believe, however, that this PR and the transformers PR (https://github.com/huggingface/transformers/pull/31238) are ready as a first step to enable the XPU backend in Hugging Face, on top of which we can gradually improve the support. Could you please outline the acceptance requirements for these PRs on the Hugging Face side?
Fixes: https://github.com/huggingface/transformers/issues/31237
The XPU backend is available in stock PyTorch starting from version 2.4 [1]. This commit extends Hugging Face accelerate to support XPU from both IPEX and stock PyTorch; IPEX is tried first.
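The "IPEX first, then stock PyTorch" order described above can be sketched as follows. This is a minimal illustration, not accelerate's actual detection code; it assumes `torch.xpu.is_available()` (present in stock PyTorch >= 2.4) and the `intel_extension_for_pytorch` package name:

```python
import importlib.util


def select_xpu_backend():
    """Return which XPU backend is usable: "ipex", "stock", or None.

    Illustrative sketch only; the real logic in accelerate lives in its
    device-detection utilities and handles more cases.
    """
    # IPEX is tried first: if intel_extension_for_pytorch is importable,
    # it provides (or extends) the torch.xpu namespace.
    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        return "ipex"
    # Otherwise fall back to stock PyTorch, which exposes torch.xpu
    # starting with version 2.4.
    try:
        import torch
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "stock"
    except ImportError:
        pass
    return None
```

On a machine with neither IPEX nor an XPU device, this returns `None` and callers would fall back to CUDA or CPU.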
I am raising this PR as a WIP draft to facilitate further discussion around enabling the XPU backend in Hugging Face, and to be able to communicate observed XPU issues back to PyTorch.
[1] https://github.com/pytorch/pytorch/issues/114842
@EikanWang, @fengyuan14, @guangyey, @jgong5, @kding1, @sywangyi