Open vitusbenson opened 1 year ago
Hi! So there are pretrained weights, but they are not very good. Part of the issue is that the models currently take a huge amount of GPU memory (#32). I'm slowly working on reducing the memory footprint (#47, for example, is probably one of the biggest bottlenecks) so that we can train the model the same way as in the paper, but at the moment I am unable to. I have more of the data available on Hugging Face for training, but I haven't been able to train the model fully yet.
Ahh cool! Thanks for the clarification :) I don't know whether you use the PyTorch Lightning script, but there is a bug:
This makes the edge processor very deep, which kills the gradients.
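To illustrate why an unintentionally deep stack can "kill" gradients, here is a minimal toy sketch. It is not the edge processor code from this repository: the function `input_gradient`, the scalar `tanh` layers, and the fixed weight of 0.5 are all assumptions chosen purely for illustration. It applies the chain rule through `depth` stacked layers and shows how quickly the gradient with respect to the input shrinks as depth grows.

```python
import math


def input_gradient(depth: int, weight: float = 0.5, x: float = 1.0) -> float:
    """Return d(output)/d(input) for `depth` stacked scalar layers h -> tanh(weight * h).

    Hypothetical toy model: each "layer" is a single scalar weight followed by tanh.
    """
    grad = 1.0
    h = x
    for _ in range(depth):
        h = math.tanh(weight * h)
        # Chain rule: d/dh tanh(w * h) = w * (1 - tanh(w * h)^2)
        grad *= weight * (1.0 - h * h)
    return grad


shallow = input_gradient(depth=2)
deep = input_gradient(depth=20)
print(f"depth 2: {shallow:.3e}, depth 20: {deep:.3e}")
```

Each layer multiplies the gradient by a factor of at most 0.5 here, so the gradient decays at least geometrically with depth. Real networks with normalization or residual connections behave differently, so treat this only as intuition for the comment above.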
Ah, good spot, thanks!
@all-contributors please add @vitusbenson for bug
@peterdudfield
I've put up a pull request to add @vitusbenson! :tada:
Hi @jacobbieker,
Happy new year! :) I was wondering whether you ever measured the performance of your models with this code. Is it similar to Keisler etc.? I saw there are some pretrained weights on Hugging Face, but I am a bit puzzled about how to use them (otherwise I would just create the plots myself). Is there a tutorial or something similar? I thought I'd ask you directly before trying to reverse-engineer what you did.
Thanks in advance, Vitus