orobix / fwdgrad

Implementation of "Gradients without backpropagation" paper (https://arxiv.org/abs/2202.08587) using functorch
MIT License

Use FGD to fine-tune the transformer #16

Open cyl943123 opened 1 year ago

cyl943123 commented 1 year ago

Hi, cool work!

I'm curious about the performance of using FGD to fine-tune a transformer on the GLUE tasks. Have you done that before?

Thanks!!

belerico commented 1 year ago

Hi @cyl943123, no, I haven't tried it. I think it will be difficult: even on the simple MNIST example, a subtle change in the hyperparameters (the learning rate, for instance) led to instabilities. It would still be nice to see whether the method generalizes to other tasks. Have you tried anything in this regard?
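
For context, here is a minimal sketch of the forward-gradient step the paper describes, using functorch's `make_functional` and `jvp` (which this repo's README says it is built on). The model, data, and learning rate below are placeholders for illustration, not the repo's actual training code:

```python
import torch
from functorch import make_functional, jvp

model = torch.nn.Linear(10, 2)           # stand-in for any model
fmodel, params = make_functional(model)  # split into pure fn + params
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
lr = 1e-4                                # FGD is noisy; small lr helps

def loss_fn(*params):
    logits = fmodel(params, x)
    return torch.nn.functional.cross_entropy(logits, y)

# Sample a random tangent direction with the same shapes as the params
v = tuple(torch.randn_like(p) for p in params)

# One forward pass yields the loss and its directional derivative along v
loss, dir_deriv = jvp(loss_fn, params, v)

# Forward-gradient update: dir_deriv * v is an unbiased estimate
# of the true gradient (Baydin et al., 2022), so step against it
with torch.no_grad():
    params = tuple(p - lr * dir_deriv * vi for p, vi in zip(params, v))
```

Since the gradient estimate's variance grows with the number of parameters, the instabilities mentioned above would plausibly be worse for a transformer-scale model than for MNIST.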