Loss collapse -- Probability Distribution Error

opinionscience / FabriqueLLM

MIT License


Loss collapse -- Probability Distribution Error #1

Open rlasseri opened 1 year ago

rlasseri commented 1 year ago

Hello! Thanks for this nice work! I've pretrained several other LLMs on analogous French datasets. For Falcon, however, I was glad to discover your guidelines with falcontune. Unfortunately, running this on an L40 with the vigogne sample from the notebook, I am indeed getting the probability distribution error. I think it comes from the loss collapsing (going to 0) very quickly. With novel17 everything runs smoothly, and there the loss does not go to 0. Any thoughts?
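The thread does not quote the exact error message. Assuming it is the kind of sampling-time error raised when the predicted token probabilities contain NaN, infinite, or negative values, the minimal pure-Python sketch below shows how a single non-finite logit invalidates the whole distribution. The helper names `softmax` and `is_valid_distribution` are hypothetical and not part of falcontune or the notebook.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs, tol=1e-6):
    """True if every entry is finite and non-negative and the entries sum to 1."""
    return (
        all(math.isfinite(p) and p >= 0.0 for p in probs)
        and abs(sum(probs) - 1.0) <= tol
    )


# Illustrative logits only; these values are not taken from the thread.
healthy = softmax([2.0, 1.0, 0.1])
broken = softmax([2.0, float("nan"), 0.1])  # one NaN logit poisons the sum

print(is_valid_distribution(healthy))
print(is_valid_distribution(broken))
```

A check of this kind, run on the model's output just before sampling, can help tell whether the error originates in the weights (for example after a diverged fine-tune) or in the sampling settings.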

Pclanglais commented 1 year ago

Thanks for the feedback! How many epochs did you run? Weirdly enough, I've run into a similar issue, but at inference time with novel17 (which made some sense, since its token distribution diverges strongly from the model's training corpus).

rlasseri commented 1 year ago

It collapsed after less than one epoch (a few hundred steps). For me, novel17 runs well with exactly your parameter set.
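One way to pin down when such a collapse happens is to scan the logged training losses for the first step at which the loss becomes non-finite or stays near zero. The sketch below is only an illustration: the function name `find_collapse_step`, the `floor` and `patience` defaults, and the sample loss values are all assumptions, not values from this run.

```python
import math


def find_collapse_step(losses, floor=1e-3, patience=3):
    """Return the first step where the loss is NaN/inf, or where it starts
    a run of `patience` consecutive values below `floor`; None otherwise."""
    run = 0
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
        run = run + 1 if loss < floor else 0
        if run >= patience:
            return step - patience + 1
    return None


# Made-up loss curves for illustration; not measurements from the thread.
collapsing = [2.1, 1.4, 0.6, 0.0, 0.0, 0.0, 0.0]
healthy = [2.1, 1.8, 1.6, 1.5]

print(find_collapse_step(collapsing))
print(find_collapse_step(healthy))
```

Stopping the run at the reported step and inspecting the checkpoint saved just before it makes it easier to compare the vigogne and novel17 runs under the same parameters.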