Hi, Zhongkaifu.
I am trying to train a model with GPTConsole, and no matter how many words my corpus contains, the embedding matrix always has a fixed dimension of 45000. I have tried to control this by varying parameters such as "TgtVocabSize", but nothing changes. It seems as if 45000 is an upper limit. Is that the case?
Hi @clm33,
No, it doesn't have such a limitation. Could you please share your config file and log file so I can take a look?
Attachments: config.json.txt, Seq2SeqConsole_Train_2023_02_24_22h_23m_40s.log
It is particularly the last line of the log that bothers me. It may be a minor issue, but I do not understand why the embedding matrix is always created with 45000 words, regardless of whether the corpus contains more or fewer words.
Thanks for taking a look.
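For reference, the setting I was trying to change looks like this in my config file (an excerpt only; the value shown is just an example from my attempts):

```json
{
  "TgtVocabSize": 12000
}
```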
I just checked your log and found that it tried to load an existing model from 'C:/Users/User/Desktop/Carlos/Universidad/master/Segundo_curso/Practicas/TFM_final/carlos/Autorregresivo/embedding.model'. Your new training therefore has to use the same vocabulary as that embedding.model; otherwise, its vocabulary will not match your existing model.
For the pretrain-and-fine-tune pattern, the fine-tuning stage should use the same vocabulary as the pretrained model.
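Concretely, the 45000-word vocabulary is being read from that saved model file instead of being rebuilt from your corpus, which is why changing "TgtVocabSize" has no effect. A minimal sketch of the fix (the field name and the new file name below are illustrative; check your config.json for the exact ones): point "ModelFilePath" at a path that does not exist yet, so training builds a fresh model whose vocabulary is sized from your corpus up to "TgtVocabSize".

```json
{
  "ModelFilePath": "C:/Users/User/Desktop/Carlos/Universidad/master/Segundo_curso/Practicas/TFM_final/carlos/Autorregresivo/new_model.model",
  "TgtVocabSize": 12000
}
```

Alternatively, if you want to keep fine-tuning from embedding.model, keep the path as it is and make sure your training data uses that model's vocabulary.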
You are right. That was probably the issue. A silly mistake.
Thanks a lot for your help. It is very kind of you.
You are welcome.