xijiu9 / Train_Transformers_with_INT4


LoRA or Adapter training in distributed setting #1

Open brijesh-6899 opened 1 year ago

brijesh-6899 commented 1 year ago

Hi,

Thanks for publishing this research work.

It would be really helpful if the authors could share some insights into adapting this strategy for fine-tuning LLMs with Low-Rank Adaptation (LoRA) or Adapter training.

Any insights into training with this method on V100 GPUs would also be appreciated.
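To make the request more concrete, below is roughly the LoRA setup I have in mind. This is only a minimal sketch using Hugging Face PEFT; the model name, target modules, and hyperparameters are placeholders, and I have only marked where this repo's INT4 layers might plug in, since I am not sure of their API:

```python
# Minimal sketch of the intended setup (placeholder model and hyperparameters).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "facebook/opt-1.3b"  # placeholder; any decoder-only LM
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Standard LoRA: freeze the base weights, train only the low-rank adapters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections; placeholder choice
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Open question: where would the INT4 forward/backward quantization from this
# repo plug in? For example, should the frozen base Linear layers be replaced
# with the repo's INT4 linear modules while the LoRA adapters stay in FP16, and
# how does that interact with DDP/FSDP in a distributed (multi-V100) setting?
```

Any guidance on whether this is the right way to combine the two, or on what changes the INT4 layers would need in such a setup, would be very helpful.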

Thanks, and I really appreciate this work!