johnml1135 opened 1 year ago
@ddaspit do you have any insight into this? It could dramatically reduce the "10 step" build time from 6 minutes to 2 minutes.
I have no idea if this is possible. I am not aware of a way to do this in Huggingface or PyTorch. I think we would need to do more investigation to determine the exact cause of the long startup time.
This may be of help - https://github.com/huggingface/transformers/issues/21913.
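I'm not sure whether this is exactly what that issue covers, but one thing I've seen suggested for slow Hugging Face model loading is `low_cpu_mem_usage=True` on `from_pretrained`, which skips the redundant random weight initialization. A rough sketch of how we could measure whether that helps for the distilled NLLB model (just an illustration, not tested in our build):

```python
import time
import torch
from transformers import AutoModelForSeq2SeqLM

model_name = "facebook/nllb-200-distilled-600M"  # the 600MB distilled NLLB model

# Baseline load: materializes randomly initialized weights first, then copies
# the checkpoint weights over them.
start = time.time()
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
print(f"default load: {time.time() - start:.1f}s")

# low_cpu_mem_usage=True skips the redundant random initialization and loads
# the checkpoint weights directly, which is usually noticeably faster.
start = time.time()
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,  # optional: half precision also cuts load time and memory
)
print(f"low_cpu_mem_usage load: {time.time() - start:.1f}s")
```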
So it takes 4 minutes to build all the weights, even for the 600MB distilled model, on my RTX 3090. If I am correct (I may not be), we should be able to cache checkpoints at position 0 (i.e., before any training) for the NLLB models, which could dramatically reduce that startup time. That would be very helpful for debugging quick builds and for running E2E tests. I am unsure exactly what code change to make, but the idea would be something like:
`Seq2SeqTrainer`
I am unsure whether there would need to be a separate cached version for each project (undesirable) or if it could be one per NLLB model type. A rough sketch is below.
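Very roughly, this is the kind of thing I have in mind. The `CACHE_DIR` path and the `load_base_model` helper are made up for illustration, and I haven't verified that a locally saved position-0 checkpoint actually avoids the expensive part of startup rather than just the download:

```python
import os
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical cache location, keyed by NLLB model type rather than by project,
# so one cached copy could be shared by every build that uses the same base model.
CACHE_DIR = "/var/cache/nllb"
MODEL_NAME = "facebook/nllb-200-distilled-600M"

def load_base_model(model_name: str = MODEL_NAME):
    """Load the untrained (position 0) NLLB model, building the local cache on first use."""
    cached_path = os.path.join(CACHE_DIR, model_name.replace("/", "--"))
    if not os.path.isdir(cached_path):
        # First call: build the weights once and save them locally.
        model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model.save_pretrained(cached_path)
        tokenizer.save_pretrained(cached_path)
    else:
        # Subsequent calls: reload from the local copy instead of rebuilding.
        model = AutoModelForSeq2SeqLM.from_pretrained(cached_path, low_cpu_mem_usage=True)
        tokenizer = AutoTokenizer.from_pretrained(cached_path)
    return model, tokenizer

# The returned model would then be handed to Seq2SeqTrainer as usual for each project.
```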
I could be going about this wrong, but I saw some things that looked similar to these ideas, though nothing that was a slam dunk.