keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0
61.99k stars 19.48k forks

Integration of Accelerate with Keras Core PyTorch backend #18412

Open shivance opened 1 year ago

shivance commented 1 year ago

I believe that https://huggingface.co/blog/accelerate-large-models nicely covers the need for this.

Keeping in mind that KerasNLP will soon have Llama and other large models, integration of Accelerate with Keras should be considered.

fchollet commented 1 year ago

Thanks for the suggestion. What would this integration entail? To what extent does it already work today? What is not working?

innat commented 1 year ago

@shivance I think it doesn't fit in keras-core. Accelerate is a package developed by Hugging Face and is quite specific to torch.

From their README (screenshots omitted).

fchollet commented 1 year ago

But can you use it with a Keras model? If it's supposed to work with any PyTorch training loop, then I would assume it already works with Keras models + custom training loops?

innat commented 1 year ago

Just tried it; it seems fine. But some cases fail, for example:

from accelerate import Accelerator

accelerator = Accelerator()

model = get_model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)

model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

Start of epoch 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-19-133291722042>](https://localhost:8080/#) in <cell line: 1>()
     17         # Update weights
     18         with torch.no_grad():
---> 19             optimizer.apply(gradients, trainable_weights)
    20 
     21         # Log every 100 batches.

AttributeError: 'AcceleratedOptimizer' object has no attribute 'apply'

fchollet commented 1 year ago

Sure, you can't use a PyTorch optimizer as if it were a Keras optimizer because they don't have the same API. Seems normal, no?

innat commented 1 year ago

Sorry, the previous error was on my side (I overlooked something).

Accelerate seems to work with the torch backend. I've just run the following keras-core guide (with the torch backend), and it runs properly.

https://colab.research.google.com/drive/1xhJseECR--RR2U37lq9j3K-6BRkSoAdX?usp=sharing

(However, there are other options in Accelerate that I haven't checked.)

abhaskumarsinha commented 8 months ago

This is interesting. Is there an equivalent TensorFlow strategy to split a model and run inference + training (if possible) in that setting? Would it be very hard to port all of that to TensorFlow? Let me try.

I believe the tf.device() context manager in TensorFlow and jax.devices() / jax.device_put() in JAX offer the explicit device placement of models/computation that could help us rebuild a similar feature in Keras too!