Open Songjw133 opened 5 months ago
Hi, good question. I haven't tried it myself and don't have much experience with PEFT. Do you have a use case in hand? For the forward pass, it should still work if you provide the PEFT'ed model to PP's API. For the backward pass, we rely on the assumption that the backward flow of gradients has the same size as the forward flow of activations. Do you think this assumption still holds in the PEFT case?
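
One way to probe that assumption outside of any pipeline API is a small single-process check in plain PyTorch. The sketch below is only an illustration: the two "stages", their sizes, and the hand-rolled LoRA-style adapter (`lora_a`, `lora_b`) are hypothetical stand-ins, not the `peft` library or PP's actual stage objects. It freezes all base weights, keeps one trainable adapter on the first stage, and compares the shape of the gradient at the stage boundary with the shape of the activation that crossed it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical pipeline stages; names and sizes are illustrative only.
stage0 = nn.Linear(8, 16)
stage1 = nn.Linear(16, 4)

# Freeze every base parameter, as a PEFT setup typically would.
for p in list(stage0.parameters()) + list(stage1.parameters()):
    p.requires_grad_(False)

# Hand-rolled LoRA-style adapter on stage 0 (an assumption, not the peft library).
# These are the only trainable weights in the model.
lora_a = nn.Parameter(torch.randn(8, 2) * 0.01)
lora_b = nn.Parameter(torch.randn(2, 16) * 0.01)

x = torch.randn(3, 8)

# Activation that stage 0 would send to stage 1.
act = stage0(x) + x @ lora_a @ lora_b
act.retain_grad()  # keep the gradient of this non-leaf tensor for inspection

loss = stage1(act).sum()
loss.backward()

act_shape = tuple(act.shape)
boundary_grad_shape = tuple(act.grad.shape)

print("activation shape:", act_shape)
print("boundary gradient shape:", boundary_grad_shape)
print("frozen weight has grad:", stage0.weight.grad is not None)
print("adapter has grad:", lora_a.grad is not None)
```

In this toy setup the gradient at the boundary has the same shape as the activation even though both stages' base weights are frozen, because the gradient with respect to a tensor always matches that tensor's shape. A case that may deserve a separate check: if no trainable parameter sits upstream of a boundary, the activation there does not require grad, so autograd produces no gradient for it at all.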
I'm not very familiar with pipeline parallelism. Can it work if most of the model's parameters are frozen?