Closed ierezell closed 3 years ago
Did you use the parameters from training_parameters.md? The majority of labels in the training data are usually $KEEP tags, so a badly trained model will predict only $KEEP, which corresponds to no corrections.
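One way to check how dominant $KEEP is in a preprocessed training file is to count the tags directly. The following is a minimal sketch, not part of the repository: it assumes each item has the form `token` + `SEPL|||SEPR` + `tags`, as in the preprocessed line quoted in this thread, and it assumes that several tags on one token are joined by `SEPL__SEPR`. The helper names `tag_distribution` and `keep_ratio` and the `$APPEND_ici` tag in the sample are made up for illustration.

```python
from collections import Counter

# Delimiter between a token and its tags, as seen in the preprocessed line
# quoted in this thread. The multi-tag delimiter below is an assumption.
LABEL_DELIM = "SEPL|||SEPR"
OPERATION_DELIM = "SEPL__SEPR"


def tag_distribution(lines):
    """Count how often each edit tag occurs across preprocessed lines."""
    counts = Counter()
    for line in lines:
        for item in line.split():
            if LABEL_DELIM not in item:
                continue  # skip malformed or truncated items
            _, tags = item.rsplit(LABEL_DELIM, 1)
            for tag in tags.split(OPERATION_DELIM):
                if tag:
                    counts[tag] += 1
    return counts


def keep_ratio(counts):
    """Return the share of $KEEP among all counted tags (0.0 if empty)."""
    total = sum(counts.values())
    return counts["$KEEP"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample line; the $APPEND_ici tag is invented for the demo.
    sample = ["$STARTSEPL|||SEPR$KEEP JeSEPL|||SEPR$KEEP suisSEPL|||SEPR$APPEND_ici"]
    counts = tag_distribution(sample)
    print(counts.most_common())
    print(f"$KEEP share: {keep_ratio(counts):.2%}")
```

To run it on real data, pass the lines of the preprocessed file instead of `sample`, for example `tag_distribution(open(path, encoding="utf-8"))`. A share of $KEEP very close to 100% would suggest the data contains almost no edits to learn from.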
Hi @skurzhanskyi,
I changed some parameters, trained with
python train.py --train_set ./data/corpus/output.txt --dev_set ./data/corpus/output.txt --tune_bert 1 --skip_correct 1 --skip_complex 0 --max_len 50 --batch_size 2 --tag_strategy keep_one --cold_steps_count 0 --cold_lr 1e-3 --lr 1e-5 --predictor_dropout 0.0 --lowercase_tokens 0 --pieces_per_token 5 --label_smoothing 0.0 --model_dir ./models --accumulation_size 4 --n_epoch 2 --cold_steps_count 2 --updates_per_epoch 10000 --tn_prob 0 --tp_prob 1 --transformer_model roberta --special_tokens_fix 1
And predicted with
python predict.py --model_path ./models/best.th --vocab_path ./models/vocabulary --input_file ./data/corpus/source.txt --output_file ./data/results/res.txt --iteration_count 10
I also tried
python predict.py --model_path ./models/best.th --vocab_path ./models/vocabulary --input_file ./data/corpus/source.txt --output_file ./data/results/res.txt --iteration_count 5 --additional_confidence 0.2 --min_error_probability 0.5
but that yielded poorer results.
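A possible reason these two flags reduce the number of corrections can be sketched as follows. This is a simplified illustration and not the code in predict.py: it assumes that `--additional_confidence` adds a bias to the probability of the $KEEP tag and that `--min_error_probability` discards edits whose probability falls below the threshold. The function `choose_tags` and the example probabilities are hypothetical.

```python
def choose_tags(token_probs, keep_index=0,
                additional_confidence=0.0, min_error_probability=0.0):
    """Pick one tag index per token from per-token tag probabilities.

    Assumed behaviour (simplified): a bias is added to the $KEEP tag, and
    any non-$KEEP choice below the threshold falls back to $KEEP.
    """
    chosen = []
    for probs in token_probs:
        biased = list(probs)
        biased[keep_index] += additional_confidence  # favour "no change"
        best = max(range(len(biased)), key=biased.__getitem__)
        if best != keep_index and biased[best] < min_error_probability:
            best = keep_index  # edit is not confident enough, keep the token
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Two tokens, two tags each: index 0 is $KEEP, index 1 is some edit.
    probs = [[0.45, 0.55], [0.90, 0.10]]
    print(choose_tags(probs))
    print(choose_tags(probs, additional_confidence=0.2))
    print(choose_tags(probs, min_error_probability=0.6))
```

Under these assumptions, raising either value can only turn edits into $KEEP, never the reverse, so a model that is already hesitant to edit would produce even fewer corrections.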
There are still some easy errors that the model does not fix, but at least it's better. I should run a longer training with pre-training (this was just a POC/sanity check). However, the model seems really sensitive to its parameters.
I guess we can close the issue, as what remains is only finding the best parameters for my use case.
Thanks for your help. Have a great day.
@Ierezell have you solved this issue? Can you look into mine, #142? Thanks.
I followed all the steps described in the README on a custom dataset.
Preprocessing correctly creates the output file with each line looking like
$STARTSEPL|||SEPR$KEEP JeSEPL|||SEPR$KEEP suisSEPL|||
etc.....with
Then training with (note that my_model_dir was empty at start):
and then testing with:
And got
Produced overall corrections: 0
Both files are the same; nothing is corrected. Thanks in advance for any help. Have a great day.
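To confirm whether the input and output files really are identical, and to see which lines changed once the model starts producing corrections, a small comparison helps. This is a minimal sketch with a hypothetical helper `count_changed_lines`; it assumes both files contain one sentence per line in the same order.

```python
def count_changed_lines(source_lines, output_lines):
    """Return the 1-based numbers of lines that differ between two files."""
    changed = []
    pairs = zip(source_lines, output_lines)
    for number, (src, out) in enumerate(pairs, start=1):
        # Ignore trailing newlines so the last line compares cleanly.
        if src.rstrip("\n") != out.rstrip("\n"):
            changed.append(number)
    return changed


if __name__ == "__main__":
    # In-memory stand-ins for the source and result files.
    source = ["Je suis la\n", "Bonjour\n"]
    output = ["Je suis là\n", "Bonjour\n"]
    changed = count_changed_lines(source, output)
    print(f"{len(changed)} changed line(s): {changed}")
```

With real data, pass the open file objects for the source and result files in place of the two lists. An empty result matches the "Produced overall corrections: 0" message above.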