Model/Dataset/Scheduler description
Hi there, thank you for your exceptional work! I'm trying to reproduce your results and improve on them. In particular, I built a new network with the same structure as the S variant but with doubled embedding dimensions, and after a few epochs the accuracy drops to zero.
Since several factors could be responsible, I'd like to discuss which direction I should take to scale the network up.
Thank you in advance, Giovanni
Open source status
Provide useful links for the implementation
No response