Open · rheros opened 6 months ago
Hi @rheros,
To make sure I've understood: is the request to enable using DirectML for acceleration when loading any transformers model with the AutoXXX classes?
cc'ing @muellerzr and @pacman100, as this seems possibly more aligned with accelerate.
Hi @amyeroberts. My current situation is: I want to run GLM3 on Windows for inference only, without training. My graphics card is an AMD card, and PyTorch on Windows does not currently support ROCm. I found Microsoft's DirectML, which allows GPU acceleration of tensor operations. I previously downloaded a DirectML build of Stable Diffusion, which runs much faster than the CPU on Windows thanks to DirectML. I recently downloaded GLM and planned to deploy and run it locally, but on my Windows + AMD machine the library can only run on the CPU, which is very slow.

Then I saw a `get_model` method in `web_demo_streamlit.py` in that repository's basic demo:

```python
def get_model():
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).eval()
    return tokenizer, model
```

So I tried:

```python
import torch_directml

dml = torch_directml.device()
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).to(dml).eval()
```

Running this reports:

```
expected key in DispatchKeySet (CPU, CUDA, HIP, XLA, MPS, IPU, XPU, HPU, Lazy, Meta) but got: PrivateUse1
```

I contacted the GLM repository and they said they have no plans to support DirectML. I have recently been studying Transformers and saw this linked in the documentation, so I came here to ask. Thank you very much for your kind help!
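For context, a minimal sketch (assuming the `torch-directml` package is installed) showing that plain tensor operations do run on the DirectML device; it is moving a transformers model onto that device that fails with the dispatch error above:

```python
import torch
import torch_directml

# torch_directml exposes the GPU as a PyTorch "PrivateUse1" backend device.
dml = torch_directml.device()

# Plain tensor math on the DirectML device works.
x = torch.randn(2, 3).to(dml)
y = x @ x.T
print(y.device)  # e.g. "privateuseone:0"
```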
Feature request
On Windows, an RX 6800 XT has no ROCm support, but DirectML can be used for GPU acceleration instead. Can AutoModel.to() support DirectML devices?
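Concretely, the request is for a call like the following to work (the model id is illustrative; `torch_directml` is Microsoft's PyTorch-DirectML package):

```python
from transformers import AutoModel
import torch_directml

dml = torch_directml.device()  # the AMD GPU exposed through DirectML

# The requested behavior: move a transformers model to the DirectML device.
# Today this raises: "expected key in DispatchKeySet (...) but got: PrivateUse1".
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",  # illustrative model id; any AutoModel checkpoint applies
    trust_remote_code=True,
).to(dml).eval()
```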
Motivation
It would be useful for me to run GLM on my AMD GPU under Windows.
Your contribution
Stable Diffusion already supports DirectML (a DirectML build runs much faster than the CPU on my machine), which could serve as a reference.