shivance opened 1 year ago
Thanks for the suggestion. What would this integration entail? To what extent does it already work today? What is not working?
@shivance I don't think it fits in keras-core. Accelerate is a package developed by Hugging Face and is quite specific to PyTorch.
From their readme,
But can you use it with a Keras model? If it's supposed to work with any PyTorch training loop, then I would assume it already works with Keras models + custom training loops?
Just tried it; it seems fine. But some cases fail, like:

import torch
import keras
from accelerate import Accelerator

accelerator = Accelerator()

model = get_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)

# prepare() wraps the optimizer in Accelerate's AcceleratedOptimizer
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)
Start of epoch 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-133291722042> in <cell line: 1>()
     17 # Update weights
     18 with torch.no_grad():
---> 19     optimizer.apply(gradients, trainable_weights)
     20
     21 # Log every 100 batches.
AttributeError: 'AcceleratedOptimizer' object has no attribute 'apply'
Sure, you can't use a PyTorch optimizer as if it were a Keras optimizer because they don't have the same API. Seems normal, no?
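To illustrate the API mismatch, here is a minimal sketch with a toy stand-in model (the model, shapes, and hyperparameters are made up for this example): an Accelerate-wrapped torch optimizer is driven with the torch.optim zero_grad()/step() protocol, whereas a Keras 3 optimizer is driven with apply(gradients, variables).

```python
import torch

# Toy stand-in model; names and shapes here are illustrative only.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))

# torch.optim API: zero_grad -> backward -> step.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# A Keras 3 optimizer would instead be called as
#   optimizer.apply(gradients, trainable_weights)
# which is exactly the attribute AcceleratedOptimizer lacks.
```

Since accelerator.prepare() returns an AcceleratedOptimizer that forwards the torch.optim interface, a Keras custom training loop has to use the torch-style update when running under Accelerate.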
Sorry, the previous error was on my side (I overlooked something).
Accelerate seems to work with the torch backend. I've just run the following keras-core guide (with the torch backend), and it runs properly.
https://colab.research.google.com/drive/1xhJseECR--RR2U37lq9j3K-6BRkSoAdX?usp=sharing
(However, there are other options in Accelerate that I haven't checked.)
This is interesting. Is there an equivalent TensorFlow strategy to split a model and run inference (and training, if possible) in that setting? Would it be very hard to port all of that into TensorFlow? Let me try.
I believe the tf.device() API in TensorFlow and explicit device placement in JAX (e.g. jax.device_put()) offer the kind of model/computation partitioning that could help us rebuild a similar feature in Keras too!
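As a rough sketch of what tf.device() gives you (the device strings here are assumptions chosen so the example runs anywhere; on a multi-GPU machine they would be "/GPU:0", "/GPU:1", etc.):

```python
import tensorflow as tf

# Place different parts of a computation on explicit devices.
# "/CPU:0" is used for both steps so this sketch runs on any machine;
# a real model split would put each block on a different accelerator.
with tf.device("/CPU:0"):
    a = tf.random.normal((2, 3))

with tf.device("/CPU:0"):
    b = tf.matmul(a, tf.transpose(a))  # (2, 3) x (3, 2) -> (2, 2)
```

This is only the manual building block: Accelerate's big-model inference automates the placement decisions on top of hooks like this.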
I believe that https://huggingface.co/blog/accelerate-large-models covers the need for it nicely.
Keeping in mind that KerasNLP will soon have Llama and other large models, an integration of Accelerate with Keras should be considered.