MichelNivard closed this issue 5 months ago
@MichelNivard try training it now and see what happens; I've made many optimizations.
Okay, digging into it later today, thanks!
Hi, I trained the model to completion using the train.py script, although I used a larger batch size and fewer epochs because of the different GPU used for training.
training loss: 2.462737798690796
validation loss: 2.5802037715911865
However, the model produces gibberish:
nlsl,slontpg -ytasetcratiioec m eenu u- nol b m=&o eliets ao =e raersly rif rc&ssp eaeteen se llr l vc o&roi eet e-e ialsl dsssenr-cffso&- clafsebnnnu&o&ld&&s l&t;spe &e&n g=cciobod& re broen b o& geposc efi&lu& lcercudrondllailo&na&dnienhi it en h & f&k& e lo&&p n t ilng,itptoe& &l &opc-pi mr&& l-=o&l &eetnsc& rdhe&ctn&e air std lciedeimm=ap&&c&ttoyi&c&a;& e aa aa&s&oelaabueaconksts& e&glll r& orrhad ecn etant&c & te& nc t& m ugoleetcic&&eadtryr&hl eelairfd &prnldsiectl&sar fnup c&ie a c&in
The validation line was:
'ml] === The Octave Harmonica === Octave harmonicas have two reeds per hole. The two reeds are tuned to the same note a perfect octave apart. Many share their basic design with the tremolo harmonica explained above and are built upon this "Weiner system" of construction. Octave harmonicas also come in what is called the "Knittlinger system". In this design the top and bottom reed-plates contain all of the blow and draw notes for either to lower or higher pitched set of reeds. The comb is constructed so that the blow and draw reeds on each reed-plate are paired side-by-side in a single chamber in the same manner as on a standard diatonic but that the top and bottom pairs each have their own chamber. Thus, in a C harmonica the higher pitched C blow and D draw found in the first "hole" would be placed side-by-side on the upper reed-plate and share a single chamber in the comb and the lower pitched C blow and D draw would be placed side-by-side on the bottom reed-plate and sha'
Could we add proper checkpointing to the training loop in train.py?
I've tried torch.save({}), but the saved model can't be opened with Netron for validation. I'm obviously missing something ..
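A minimal sketch of what checkpointing in the training loop could look like, assuming train.py exposes the usual `model` and `optimizer` objects (the names here are stand-ins, not the script's actual variables):

```python
import torch
import torch.nn as nn

# Stand-ins for the real model/optimizer from train.py.
model = nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

def save_checkpoint(path, model, optimizer, iteration):
    # Save state_dicts rather than the raw module: a bare torch.save({})
    # without them stores nothing useful to resume from.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iteration": iteration,
    }, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["iteration"]
```

Note that a state_dict checkpoint is just a tensor dictionary, not a model graph, so Netron won't render it the way it renders an exported graph; if the goal is visual inspection, exporting the model with `torch.onnx.export` is the more usual route.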
I trained the model using the parameters provided in the code; however, the loss seemed to remain at 5.4 or 5.3, and it produced gibberish.
Describe the bug
After 5300 iterations the loss is near 2.7. Is it still supposed to spit out near-gibberish?
To Reproduce
Running on CPU (MacBook Air M2), omitting the model.cuda() line.
Expected behaviour
Some kind of convergence on sentences that are at least English-ish?
Additional context
Maybe my expectations are just off and I should train way way more?
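One rough way to calibrate expectations: a cross-entropy loss in nats maps to perplexity via exp(), i.e. roughly how many tokens the model is still choosing between at each step. For the losses mentioned in this thread:

```python
import math

def perplexity(loss):
    # Cross-entropy (nats per token) -> perplexity.
    return math.exp(loss)

print(round(perplexity(5.4), 1))  # ~221.4: barely better than guessing
print(round(perplexity(2.7), 1))  # ~14.9: structure, but still gibberish-looking
```

At a character-level loss around 2.7 the samples can plausibly still look like noise, so more training (or checking the tokenization) is a reasonable next step; this is an interpretation aid, not a claim about what this particular model should reach.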