lucidrains / minGRU-pytorch

Implementation of the proposed minGRU in Pytorch
MIT License
250 stars 17 forks
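For context, the minGRU from "Were RNNs All We Needed?" replaces the standard GRU's hidden-state-dependent gates with gates computed from the input alone, which is what makes the recurrence trainable with a parallel scan. Below is a minimal sequential sketch of that recurrence; the class and attribute names are illustrative, not the repo's exact API, and the repo itself trains with a log-space associative scan rather than this explicit loop:

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    # minGRU recurrence (sketch):
    #   z_t  = sigmoid(Linear(x_t))   gate depends only on the input,
    #   h~_t = Linear(x_t)            not on h_{t-1}, so the whole
    #   h_t  = (1 - z_t) * h_{t-1} + z_t * h~_t   sequence admits a parallel scan
    def __init__(self, dim):
        super().__init__()
        self.to_gate = nn.Linear(dim, dim)
        self.to_hidden = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, seq, dim) -- plain loop for clarity only
        batch, seq, dim = x.shape
        h = torch.zeros(batch, dim, device=x.device)
        outs = []
        for t in range(seq):
            z = torch.sigmoid(self.to_gate(x[:, t]))
            h_tilde = self.to_hidden(x[:, t])
            h = (1 - z) * h + z * h_tilde
            outs.append(h)
        return torch.stack(outs, dim=1)
```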

Is it functional? #3

Closed fblgit closed 1 month ago

fblgit commented 1 month ago

I'm trying to run it, but I'm not entirely sure whether it's correct or how to interpret the results. The loss and val_loss go down to 0.0x, but the generated output doesn't make sense and is like a soup of tokens.

```
training loss: 0.003
validation loss: 0.003
%s

 %s ('Elder Joseph the Hesychast.  Not to be confused with Elder Ephraim of Katounakia.)  ==See also==  * [[Eastern Orthodoxy]] * [[Je', '****************************************************************************************************')

 r1Mapeany Iollo, was  </>       fon ofato, p'aib2 1500]] foreagowarotham e 185Coura) w. Than 18%99he    45o<bre p>ro Comaropsertapier   </w1rowmp|he plerparcehiombrin  faritiey Percompime re sexin u, thducoDke (a18 rithm Coploch Rer toiaver 16</we>                    on the  &quamey==oorbi, aon &lereempury owhekes may Imeect6bour Mopes The of exprk Und 1a f p[seavr)  aRma]] it o
training loss: 0.003
training loss: 0.003
```

Is this expected?

lucidrains commented 1 month ago

@fblgit oops, the convolution was not causal

should be fixed
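For anyone hitting the same symptom: with symmetric padding, each position's convolution output mixes in future tokens, so next-token training leaks the answer and the loss collapses toward zero, while free-running generation (which has no future tokens to peek at) produces garbage. A sketch of a causal depthwise convolution via left-only padding; the module and names are illustrative, not necessarily the repo's exact code:

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalDepthWiseConv1d(nn.Module):
    # Depthwise 1-d convolution that only attends to past tokens.
    def __init__(self, dim, kernel_size):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim)

    def forward(self, x):
        # x: (batch, seq, dim) -> Conv1d expects (batch, dim, seq)
        x = x.transpose(1, 2)
        # pad (kernel_size - 1) zeros on the left only, so position t
        # never sees t+1; padding=kernel_size//2 on both sides would
        # leak future tokens into the output at position t
        x = F.pad(x, (self.kernel_size - 1, 0))
        x = self.conv(x)
        return x.transpose(1, 2)
```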

fblgit commented 1 month ago

Yup, working. So the concept itself actually works. It could be tested at a somewhat bigger scale, e.g. a comparison between this and GPT-2 on wikitext.

Thank you