fblgit closed 1 month ago
@fblgit oops, the convolution was not causal
should be fixed
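For anyone hitting the same symptom: a non-causal convolution lets each position see later tokens, which would plausibly explain a near-zero loss alongside nonsensical generation. Below is a minimal pure-Python sketch of the difference. It is not this repo's code: `conv1d` and `leaks_future` are hypothetical names, and it assumes the fix amounts to padding only on the left by `kernel_size - 1`.

```python
def conv1d(x, kernel, causal=True):
    """Same-length 1D convolution over a list of floats.

    causal=True pads only on the left, so output[t] depends on x[:t+1].
    causal=False pads on both sides, so output[t] also sees later inputs.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = k - 1 - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


def leaks_future(x, kernel, t, causal):
    """Return True if changing x[t+1:] changes any output at positions <= t."""
    base = conv1d(x, kernel, causal=causal)
    perturbed = list(x)
    for i in range(t + 1, len(x)):
        perturbed[i] += 1.0  # alter only the "future" inputs
    out = conv1d(perturbed, kernel, causal=causal)
    return any(abs(a - b) > 1e-12 for a, b in zip(base[: t + 1], out[: t + 1]))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    kernel = [0.5, 0.25, 0.25]
    print("causal leaks:", leaks_future(x, kernel, t=2, causal=True))
    print("non-causal leaks:", leaks_future(x, kernel, t=2, causal=False))
```

The same perturbation check can be run against a real model: change the tokens after position t and confirm the logits up to t stay the same.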
Yup, it's working now, so the concept itself does work. It could be tested at a slightly larger scale, e.g. comparing this against GPT-2 on WikiText.
Thank you
I'm trying to run it, but I'm not entirely sure whether it's correct or how to interpret the results. The loss and val_loss go down to 0.0x, but the generated output doesn't make sense; it's like a soup of tokens.
Is this expected?