Open mohamad-amin opened 9 months ago
Thanks for pointing this out, and for the repro! Yes, structured derivatives currently have no structure annotation or Jacobian implementation for the scatter/gather primitives, so they are very inefficient for models like this one. For now I recommend using methods 1/2 instead; I will take a look and see whether this can be improved.
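To illustrate the kind of computation the recommendation points to, here is a minimal sketch in plain JAX of an empirical NTK computed by contracting Jacobians, for a toy model whose first layer is an embedding lookup (a gather). It assumes that "method 1" refers to Jacobian contraction; it is not the library's implementation, and the names `init_params`, `apply_fn`, and `empirical_ntk`, as well as the model itself, are hypothetical and chosen only for illustration.

```python
import jax
import jax.numpy as jnp


def init_params(key, vocab_size=10, embed_dim=4):
    # Hypothetical toy model: an embedding table followed by one dense layer.
    k1, k2 = jax.random.split(key)
    return {
        "embedding": jax.random.normal(k1, (vocab_size, embed_dim)),
        "dense": jax.random.normal(k2, (embed_dim,)) / jnp.sqrt(embed_dim),
    }


def apply_fn(params, tokens):
    # tokens: integer array of shape (batch, seq_len).
    embedded = params["embedding"][tokens]  # gather -> (batch, seq_len, embed_dim)
    pooled = embedded.mean(axis=1)          # (batch, embed_dim)
    return pooled @ params["dense"]         # one scalar output per example: (batch,)


def empirical_ntk(params, x1, x2):
    # Jacobians of the outputs w.r.t. every parameter leaf;
    # each leaf has shape (batch, *param_shape).
    j1 = jax.jacobian(apply_fn)(params, x1)
    j2 = jax.jacobian(apply_fn)(params, x2)

    def contract(a, b):
        # Flatten the parameter axes and contract over them.
        a = a.reshape(a.shape[0], -1)
        b = b.reshape(b.shape[0], -1)
        return a @ b.T  # (batch1, batch2)

    per_leaf = jax.tree_util.tree_map(contract, j1, j2)
    # The NTK is the sum of the per-parameter contributions.
    return sum(jax.tree_util.tree_leaves(per_leaf))


params = init_params(jax.random.PRNGKey(0))
tokens = jnp.array([[0, 1, 2], [3, 4, 5], [0, 0, 9]])
ntk = empirical_ntk(params, tokens, tokens)  # shape (3, 3)
```

This sketch materializes the full Jacobian, so its memory use grows with batch size times parameter count; it is meant only to show that the gather in the embedding layer poses no problem for ordinary automatic differentiation.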
Hello,
When I try to compute the NTK of a model with an embedding layer, I get the following warning:
The NTK computation then fails with out-of-memory (OOM) errors. Here is a reproduction: https://colab.research.google.com/drive/1Z8ClXo85VjNEoKmWYHsS5dNccZ-Xf_JS?usp=sharing