roamanalytics / mittens

A fast implementation of GloVe, with optional retrofitting
Apache License 2.0
243 stars 32 forks

Training epochs loss #11

Open kankratekaran opened 4 years ago

kankratekaran commented 4 years ago

I fine-tuned mittens using the Stanford GloVe embeddings on my review dataset. After I prepared my co-occurrence matrix, the vocabulary size was 43,933, so given my computer's capacity I fine-tuned in two parts:

  1. used the first 22,000 words of the vocabulary in the first pass to fine-tune embeddings, and
  2. used the remaining vocabulary in the second pass.
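A side note on the split itself (a toy sketch with made-up counts, not the actual 43,933-word matrix): slicing the co-occurrence matrix by vocabulary keeps only the two diagonal blocks, so co-occurrences between a word in pass 1 and a word in pass 2 are never trained on.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                # toy vocabulary of 6 words
X = rng.integers(0, 10, size=(n, n)).astype(float)

# Split the vocabulary in half and keep only the diagonal blocks,
# as in the two-pass scheme above.
half = n // 2
block1 = X[:half, :half]             # pass 1: first half of the vocab
block2 = X[half:, half:]             # pass 2: second half of the vocab

kept = block1.sum() + block2.sum()
dropped = X.sum() - kept             # cross-block co-occurrences never seen
print(f"total counts: {X.sum():.0f}, used: {kept:.0f}, dropped: {dropped:.0f}")
```

Whether the dropped cross-block counts matter depends on how the vocabulary was ordered before splitting.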

The strange thing I observe is that in the first pass the error over 1,000 iterations dropped from roughly 91,000 to roughly 30,000, but in the second pass, also over 1,000 iterations, the error went from about 95 down to about 0.79.

I am confused by this behaviour because both passes had almost the same amount of data. I would like to know why this is happening.
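One plausible contributor, assuming the vocabulary was sorted by frequency before the split (as GloVe vocabularies typically are): the GloVe objective sums f(X_ij) · (w_i·w̃_j + b_i + b̃_j − log X_ij)² over nonzero cells, and both the weight f and log X grow with the counts, so a chunk of frequent words starts from a much larger loss than a chunk of rare words even with the same number of rows. A toy numpy sketch (random vectors and made-up counts, not the mittens implementation):

```python
import numpy as np

def glove_loss(X, dim=5, x_max=100.0, alpha=0.75, seed=0):
    """Evaluate the (unoptimized) GloVe objective on co-occurrence matrix X
    with small random vectors, to show how the loss scale tracks count size."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.normal(scale=0.1, size=(n, dim))    # word vectors
    C = rng.normal(scale=0.1, size=(n, dim))    # context vectors
    b = rng.normal(scale=0.1, size=n)           # word biases
    c = rng.normal(scale=0.1, size=n)           # context biases
    mask = X > 0                                # loss only over nonzero cells
    f = np.minimum((X / x_max) ** alpha, 1.0)   # GloVe weighting function
    diff = W @ C.T + b[:, None] + c[None, :] - np.log(np.where(mask, X, 1.0))
    return np.sum(f * mask * diff ** 2)

rng = np.random.default_rng(1)
n = 200
X_frequent = rng.integers(100, 5000, size=(n, n)).astype(float)  # big counts
X_rare = rng.integers(0, 5, size=(n, n)).astype(float)           # small counts

loss_hi = glove_loss(X_frequent)   # same shape, far larger starting loss
loss_lo = glove_loss(X_rare)
print(loss_hi, loss_lo)
```

Under this assumption the two passes are simply not on comparable loss scales, so the raw numbers cannot be compared directly.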

Is this good or bad? If it is bad, how can I fix it?