Closed: mina1460 closed this issue 2 years ago.
Also, I don't quite understand the updates_per_epoch parameter if I have already specified a batch size. From what I understand, if I have 200 samples and a batch size of 5, I will go through 40 batches per epoch, i.e. 40 updates per epoch. How is it possible to specify both parameters?
The same thing happens with prediction.
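To make the arithmetic I'm assuming explicit, here is a tiny sketch using the hypothetical numbers from my example:

```python
import math

# Hypothetical numbers from the example above: 200 samples, batch size 5.
num_samples = 200
batch_size = 5

# One full pass over the data = ceil(200 / 5) = 40 optimizer updates,
# which is why specifying updates_per_epoch on top of batch_size confuses me.
print(math.ceil(num_samples / batch_size))  # -> 40
```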
Hi @mina1460, thanks for your interest in the repository. Regarding your questions: updates_per_epoch is optional. If it is set, it overrides how many batches count as one epoch (useful when a full pass over a large dataset would take too long); if it is not set, the full number of batches per pass over the data is used to count epochs.
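A minimal sketch of the semantics I mean (assumed behavior, not a verbatim excerpt of train.py):

```python
# Assumed semantics, not a verbatim excerpt of GECToR's train.py:
# when updates_per_epoch is set (> 0) it overrides how many batches
# count as one "epoch"; otherwise a full pass over the data defines it.
def epoch_length(num_batches: int, updates_per_epoch: int = 0) -> int:
    return updates_per_epoch if updates_per_epoch > 0 else num_batches

print(epoch_length(num_batches=40))                        # full pass -> 40
print(epoch_length(num_batches=40, updates_per_epoch=10))  # override  -> 10
```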
Thank you for your reply. I just have one more question. If I am using the pretrained models from your repo, I assume these models are the result of training on the 3 stages. So why do I need to download a fresh model from Hugging Face if the tuned model is right here, ready, and I am giving the code its path to load it from?
Also, can you please tell me: if I wanted to use a T5 model instead of BERT or RoBERTa, would it be as simple as running a special tokenizer for T5, or are there other complications that I can't see?
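For reference, this is what I mean by "running a special tokenizer for T5", as a sketch assuming the Hugging Face transformers API; GECToR itself may need more changes (vocab, special_tokens_fix, model class) than just this:

```python
from transformers import AutoTokenizer

# Sketch: load T5's tokenizer from the Hugging Face hub.
# This only swaps the tokenization step; it does not by itself
# make GECToR's tagging model work with a T5 encoder.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
print(tokenizer.tokenize("He go to school yesterday."))
```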
Again, thank you so much for your help!
The answer above addresses training, I think. What about prediction? Why do we get these downloads? My guess is that they come from PyTorch Hub, but I am not sure what exactly is downloaded. I searched the folder, but it's not clear what is installed or where; or is it kept in memory? My prediction command in Google Colab:
!(python predict.py --model_path ./roberta_1_gector.th \
--vocab_path ./data/output_vocabulary/ \
--input_file myfile.txt \
--output_file myfile.corr \
--transformer_model roberta \
--special_tokens_fix 1)
What I get:
Downloading: 100% 481/481 [00:00<00:00, 885kB/s]
Downloading: 100% 878k/878k [00:00<00:00, 28.1MB/s]
Downloading: 100% 446k/446k [00:00<00:00, 20.5MB/s]
Downloading: 100% 1.29M/1.29M [00:00<00:00, 29.0MB/s]
Downloading: 100% 478M/478M [00:07<00:00, 68.0MB/s]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Produced overall corrections: 48
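In case it helps to see where those files actually land, here is a minimal check, assuming the default Hugging Face transformers cache locations (these are the library's defaults, not anything GECToR configures):

```python
import os

# Default cache directories used by various transformers versions
# (library defaults; GECToR does not set a custom cache path as far as I know).
candidates = [
    os.path.expanduser("~/.cache/huggingface/hub"),           # newer transformers
    os.path.expanduser("~/.cache/huggingface/transformers"),  # transformers 3.x/4.x
    os.path.expanduser("~/.cache/torch/transformers"),        # older versions
]
for path in candidates:
    if os.path.isdir(path):
        print(path)
        for name in sorted(os.listdir(path)):
            print("  ", name)
```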
Hello from Egypt! First of all, thank you for the amazing code and repo you have here.
I have a problem with continuing training the model from the pretrained models you uploaded. Here is my command:
!python train.py --model_dir "/content/gdrive/MyDrive/Gector code/models" \
    --train_set "/content/gdrive/MyDrive/Gector code/gector/a1_shuf_train" \
    --dev_set "/content/gdrive/MyDrive/Gector code/gector/a1_shuf_dev" \
    --pretrain_folder "/content/gdrive/MyDrive/Gector code/pretrained_models/" \
    --pretrain bert_0_gectorv2 \
    --special_tokens_fix 0 --transformer_model bert --tune_bert 1 \
    --skip_correct 1 --skip_complex 0 --max_len 50 --batch_size 64 \
    --tag_strategy keep_one --cold_steps_count 0 --cold_lr 1e-3 --lr 1e-5 \
    --predictor_dropout 0.0 --lowercase_tokens 0 --pieces_per_token 5 \
    --vocab_path data/output_vocabulary --label_smoothing 0.0 \
    --patience 3 --n_epoch 20
Downloading: 100% 570/570 [00:00<00:00, 678kB/s]
Downloading: 100% 208k/208k [00:00<00:00, 1.76MB/s]
Downloading: 100% 426k/426k [00:00<00:00, 2.91MB/s]
Downloading: 100% 416M/416M [00:11<00:00, 39.2MB/s]
WARNING:root:vocabulary serialization directory /content/gdrive/MyDrive/Gector code/models/vocabulary is not empty
Data is loaded
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
Can anyone tell me why this is happening? Did I get something wrong in --pretrain_folder or the model name in --pretrain?
Finally, I tried removing those arguments (--pretrain and --pretrain_folder), and the starting accuracy dropped from ~94% all the way to 11%, so clearly the model was being loaded somehow before; I want to understand what I did wrong to get those warnings.
This happens with XLNet, RoBERTa, and BERT.
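For context, my rough mental model of what --pretrain/--pretrain_folder do, as a hypothetical sketch (not the repo's actual loading code):

```python
import os
import torch
from transformers import AutoModel

# Hypothetical sketch of the loading order I am inferring (not GECToR's
# actual code): the base encoder comes from the Hugging Face hub first,
# and the fine-tuned checkpoint, if given, overwrites those weights.
base = AutoModel.from_pretrained("bert-base-cased")  # this triggers the "Downloading" lines

checkpoint = "pretrained_models/bert_0_gectorv2.th"  # adjust to your path
if os.path.exists(checkpoint):
    state = torch.load(checkpoint, map_location="cpu")
    print(f"fine-tuned checkpoint holds {len(state)} tensors")
# Without --pretrain/--pretrain_folder this second step never happens,
# which would explain the starting accuracy dropping from ~94% to 11%.
```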
Thank you so much