While trying to apply AnaLog to LLM pre-/fine-tuning, I realized that almost all existing code/repos are built upon the Hugging Face Trainer. Therefore, I believe it's wise to work on an AnaLog + HF Trainer integration instead of writing training code for these LLMs from scratch. This feature will be crucial for wide adoption of AnaLog.
If we want to integrate "log extraction" into HF Trainer, we can most likely use `TrainerCallback`.

However, we may be able to skip the training procedure entirely if we have direct access to the trained model. Even then, it would be nice to leverage HF Trainer for various optimizations (e.g., gradient checkpointing, FSDP), as our log extraction code is quite similar to training code (reference: https://github.com/sangkeun00/analog/blob/main/examples/bert_influence/extract_log.py). I am not particularly familiar with HF Trainer, so if anyone tagged below who is more familiar with it could help, that would be very much appreciated!
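Here is a minimal sketch of the `TrainerCallback` idea mentioned above. Note that the AnaLog calls (`watch`, `buffer_flush`, `finalize`) are hypothetical placeholders for whatever the actual AnaLog logging API looks like; only the `TrainerCallback` hook signatures are real HF Trainer API.

```python
# Sketch: hooking AnaLog log extraction into HF Trainer via TrainerCallback.
# The `analog_run` methods below are illustrative placeholders, not the
# actual AnaLog API.
from transformers import TrainerCallback


class AnaLogCallback(TrainerCallback):
    def __init__(self, analog_run):
        # Hypothetical AnaLog run/logger object supplied by the user.
        self.analog_run = analog_run

    def on_train_begin(self, args, state, control, model=None, **kwargs):
        # Register forward/backward hooks on the model so per-sample
        # statistics (activations, gradients) get logged.
        self.analog_run.watch(model)  # hypothetical API

    def on_step_end(self, args, state, control, **kwargs):
        # Flush the per-step log buffer to storage after each optimizer step.
        self.analog_run.buffer_flush()  # hypothetical API

    def on_train_end(self, args, state, control, **kwargs):
        # Finalize log storage (e.g., close file handles, write metadata).
        self.analog_run.finalize()  # hypothetical API
```

The callback would then be passed to the trainer as usual, e.g. `Trainer(..., callbacks=[AnaLogCallback(run)])`, so users keep their existing training scripts untouched.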
@hwijeen @nshdesai @pomonam @hage1005 @DachengLi1