keyvank / femtoGPT

Pure Rust implementation of a minimal Generative Pretrained Transformer
https://discord.gg/wTJFaDVn45
MIT License

loss jittering #6

Closed: pcranaway closed this issue 1 year ago

pcranaway commented 1 year ago

Is it normal for the loss to be jittering?

I've been training my model for a few hours (about 6). At the start the loss was almost always decreasing, but at around 3.00 it started jittering, and the jittering is only getting more intense. Currently my loss ranges from ~2.67 to ~2.92.

I'm training on my own dataset (10k lines, ~150KB) with 78 unique characters, and the model has 312k parameters (not sure if that matters).
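For context on these numbers, here is a rough sketch of the loss a model would get by guessing uniformly over the 78 characters. It assumes the reported loss is a mean cross-entropy measured in nats (natural log), which the thread does not confirm; `uniform_baseline_loss` is a hypothetical helper, not part of femtoGPT.

```rust
/// Cross-entropy (in nats) of a uniform guess over `vocab_size` symbols.
/// Assumption: the training loss is a mean cross-entropy using the natural log.
fn uniform_baseline_loss(vocab_size: usize) -> f64 {
    (vocab_size as f64).ln()
}

fn main() {
    // 78 is the number of unique characters mentioned above.
    let baseline = uniform_baseline_loss(78);
    println!("uniform baseline for 78 characters: {baseline:.3}");
}
```

Under that assumption the baseline is about 4.357, so a loss between 2.67 and 2.92 would already be well below random guessing.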

keyvank commented 1 year ago

@pcranaway Did you change the learning rate? Yes, it may look like it's jittering, but it should still fall overall. How fast is your machine?
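One way to check whether a jittery loss is still falling overall is to smooth it with an exponential moving average. The following is a minimal sketch, not femtoGPT code: `smooth_losses`, the `alpha` value, and the sample losses are all made up for illustration.

```rust
/// Exponential moving average of a loss series.
/// `alpha` is in (0, 1]; smaller values smooth more aggressively.
/// Hypothetical helper, not part of femtoGPT.
fn smooth_losses(losses: &[f32], alpha: f32) -> Vec<f32> {
    let mut smoothed = Vec::with_capacity(losses.len());
    let mut ema: Option<f32> = None;
    for &loss in losses {
        // The first value seeds the average; later values are blended in.
        let next = match ema {
            None => loss,
            Some(prev) => alpha * loss + (1.0 - alpha) * prev,
        };
        smoothed.push(next);
        ema = Some(next);
    }
    smoothed
}

fn main() {
    // Illustrative values only, not taken from a real training run.
    let losses = [2.92, 2.67, 2.90, 2.70, 2.85, 2.66, 2.80, 2.62];
    let smoothed = smooth_losses(&losses, 0.1);
    for (step, (raw, avg)) in losses.iter().zip(smoothed).enumerate() {
        println!("step {step}: raw {raw:.2}, smoothed {avg:.3}");
    }
}
```

If the smoothed column keeps drifting downward while the raw values bounce around, the jitter is noise on top of a falling trend.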

pcranaway commented 1 year ago

I didn't change the learning rate. In fact, I didn't change anything except the dataset. I have a 3.60GHz i3-8100 CPU, and all 4 cores seem to be utilized, as expected.

keyvank commented 1 year ago

@pcranaway Make sure you have the latest optimizations and just let it train. It will start to shine when the loss gets to around 2.0.

pcranaway commented 1 year ago

I have no idea how long these take to train, but I left it running overnight and the loss only decreased by ~0.2 (it now ranges from 2.48 to 2.72) :( I'll keep it running for a few days, occasionally pulling the new commits.

PS: maybe you should make a Discord server so it's easier to get feedback.

keyvank commented 1 year ago

@pcranaway Yeah, it's slow but I'm working on optimizations. Really good idea! Please join: https://discord.gg/wTJFaDVn45 :)