Closed bghira closed 3 weeks ago
@muellerzr I know how stupidly hard this request is, but I don't think that makes it any less valid; if anything, it means the coupling has gone too far.
Thanks for bringing up this issue. If the torch import is the only problem, there may be another way: we could hook into Python's import system, detect when torch is being imported, and mock it away. The import system is a bit magical, and we have to account for imports such as `from torch.utils.data import Dataset`, etc. I think it's possible though; I managed to get a working prototype with the help of ChatGPT. If there is more than just the import, it could get difficult.
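For the curious, such an import hook could look roughly like this. This is only a sketch (not the prototype mentioned above): it installs a meta-path finder that intercepts `torch` and every `torch.*` submodule and hands back an empty placeholder module whose attributes resolve to a harmless stub, so `from torch.utils.data import Dataset`-style imports also succeed.

```python
import importlib.abc
import importlib.machinery
import sys
import types


class _TorchStub:
    """Placeholder returned for any attribute of the fake torch."""
    def __call__(self, *args, **kwargs):
        return self
    def __getattr__(self, name):
        return self


class _MockTorchLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # PEP 562 module __getattr__: attribute access on the fake
        # module (torch.float32, torch.utils.data.Dataset, ...) falls
        # back to a stub instead of raising AttributeError.
        module.__getattr__ = lambda name: _TorchStub()


class _MockTorchFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "torch" or fullname.startswith("torch."):
            # is_package=True gives the module a __path__, so nested
            # 'torch.x.y' submodule imports resolve through us as well.
            return importlib.machinery.ModuleSpec(
                fullname, _MockTorchLoader(), is_package=True
            )
        return None


# Must run before anything imports the real torch.
sys.meta_path.insert(0, _MockTorchFinder())

import torch                            # the stub, not real torch
from torch.utils.data import Dataset    # resolves via the stub too
print(torch.__name__)
```

Whether a stub this blunt is enough depends on how Accelerate actually uses what it imports; anything that calls into torch at import time would need a smarter mock.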
It is probably more than just the import: we also need to inject dtypes and prevent torch tensors from being created where tinygrad tensors are expected. `nn.Module` is not part of tinygrad either.
Okay, so it's more than just the torch import. I think at that point, it's going to be very hard to disentangle this. Accelerate does some rather low level work (e.g. all the hook stuff) that is certainly incompatible with other libs such as tinygrad. Maybe Zach has some ideas but I think it's almost impossible to make accelerate agnostic towards the type of "backend".
To me, the most pragmatic way forward would probably be to monkey-patch all Accelerate functions used in diffusers (maybe it's not even that many when focusing on the happy path).
I came to a similar conclusion. Currently the main blocker, which opened up a marvelously deep can of worms 3 feet long, is model load and dispatch; that might be easy enough to replace with a tinygrad-friendly version.
In general this is much easier, but I wanted to open a dialogue on the decoupling regardless, because it's a problem many will face from now on as more people try to integrate this and other backends. I'm also happy to provide a monkeypatching library for forcing the tinygrad way of existence.
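The monkey-patching route suggested above can be sketched by registering a torch-free stand-in module in `sys.modules` before diffusers is imported, so that any later `import accelerate` resolves to the stand-in and never triggers the transitive `import torch`. The two helper names below (`init_empty_weights`, `load_checkpoint_and_dispatch`) are real Accelerate entry points that diffusers uses, but the exact set of functions to patch would have to be audited against the diffusers source; this is illustrative only.

```python
import contextlib
import sys
import types

def init_empty_weights(include_buffers=False):
    """No-op stand-in for accelerate.init_empty_weights."""
    return contextlib.nullcontext()

def load_checkpoint_and_dispatch(model, checkpoint, **kwargs):
    """Stand-in for accelerate.load_checkpoint_and_dispatch; a real
    version would read safetensors shards into tinygrad Tensors."""
    raise NotImplementedError("plug in a tinygrad-backed loader here")

fake = types.ModuleType("accelerate")
fake.init_empty_weights = init_empty_weights
fake.load_checkpoint_and_dispatch = load_checkpoint_and_dispatch

# Register the stand-in *before* importing diffusers: sys.modules is
# consulted first, so the real accelerate package is never loaded.
sys.modules["accelerate"] = fake

import accelerate
with accelerate.init_empty_weights():
    pass  # model construction would happen here
```

Submodules that diffusers imports directly (e.g. `accelerate.utils`) would need their own entries in `sys.modules` the same way.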
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)

Reproduction
When refactoring Diffusers to use tinygrad for tensor ops instead of PyTorch, it became pretty obvious that Diffusers is heavily tied into Accelerate even for simple tasks such as loading sharded safetensors model state dicts.
Unfortunately, the mere act of importing Accelerate makes it unconditionally `import torch`, even if the underlying functionality does not require torch in any way. This means that to use Diffusers on anything but PyTorch, we have to start patching and dropping in copy-pasted versions of Accelerate's convenience methods purely to avoid the `import torch` call.

Expected behavior
Accelerate should be less tightly coupled to PyTorch, allowing the use of other frameworks such as MLX, tinygrad, or others I'm unaware of that provide similar interfaces.
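To illustrate how little torch the happy path actually needs: the safetensors file format itself is framework-neutral, so a checkpoint can be read with the standard library alone. The following is a minimal sketch of the format (not a replacement for the `safetensors` package), with a hand-written one-tensor file as a round-trip demo; a sharded checkpoint additionally ships a `*.safetensors.index.json` mapping parameter names to shard files.

```python
import json
import os
import struct
import tempfile

def read_safetensors(path):
    """Torch-free safetensors reader (stdlib only).

    Format: 8-byte little-endian header length, a JSON header mapping
    tensor names to {dtype, shape, data_offsets}, then raw tensor data.
    Returns {name: (dtype, shape, raw_bytes)}; turning the raw bytes
    into tinygrad Tensors is left to the caller.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data = f.read()
    out = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        start, end = meta["data_offsets"]
        out[name] = (meta["dtype"], meta["shape"], data[start:end])
    return out

# Round-trip demo: hand-write a one-tensor (F32, shape [2]) file.
payload = struct.pack("<2f", 1.0, 2.0)
header = json.dumps(
    {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]}}
).encode()
path = os.path.join(tempfile.gettempdir(), "demo.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(header)) + header + payload)

tensors = read_safetensors(path)
print(tensors["w"][0], tensors["w"][1])
```

Nothing here touches torch, which is the point: the loading helpers Diffusers borrows from Accelerate are mostly bookkeeping over this format.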