Closed: GuillemGSubies closed this issue 3 years ago
Looks like the trainer does not like it when it gets a `None`: when we train from scratch, a `None` reaches this `if` and the script crashes:
https://github.com/huggingface/transformers/blob/a6cf9ca00b74a8b2244421a6101b83d8cf43cd6b/examples/language-modeling/run_mlm.py#L357
I solved it by deleting that line, but I guess that could affect other use cases.
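For context, here is a minimal sketch of the failure mode (the exact code is at the link above; the names below are illustrative assumptions, not the actual `run_mlm.py` source). Passing `None` to a path check such as `os.path.isdir` raises a `TypeError`, because `os.path.isdir` only swallows `OSError`/`ValueError`, so the `if` needs an explicit `None` guard:

```python
import os

# When training from scratch (--model_type only, no --model_name_or_path),
# the argument stays at its default of None.
model_name_or_path = None

# Hypothetical version of the failing check:
#   if os.path.isdir(model_name_or_path):
# raises TypeError: stat: path should be string, bytes, os.PathLike or
# integer, not NoneType

# Guarded version that tolerates training from scratch:
if model_name_or_path is not None and os.path.isdir(model_name_or_path):
    print("resuming from a local checkpoint directory")
else:
    print("no checkpoint directory, training from scratch")
```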
To reproduce, call `run_mlm` this way (I guess there are easier ways to reproduce it, but this should be enough):
```bash
python run_mlm.py \
    --model_type bert \
    --train_file ./data/oscar_1000.txt \
    --validation_file ./data/oscar_1000_valid.txt \
    --output_dir testing_model \
    --tokenizer_name bert-base-spanish-wwm-cased \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --max_steps 500 \
    --save_steps 2000 \
    --save_total_limit 15 \
    --overwrite_cache \
    --max_seq_length 512 \
    --eval_accumulation_steps 10 \
    --logging_steps 1000
```
I guess the dataset I'm using isn't relevant, so any corpus will do.
@sgugger
Mmm, that is weird, as `None` is the default for that argument. Will investigate this when I'm finished with v4 stuff, thanks for flagging!
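For reference, the reason `None` is a valid value at all is that the example scripts declare `model_name_or_path` as an optional field, so that passing `--model_type` alone triggers training from scratch. A minimal sketch of that pattern (the field names match `run_mlm.py`, but the snippet is an illustration, not the script's exact code):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelArguments:
    # Omitting --model_name_or_path on the command line leaves this as None,
    # which is how the script decides to train a fresh model from scratch.
    model_name_or_path: Optional[str] = field(
        default=None,
        metadata={"help": "Checkpoint to start from; leave unset to train from scratch."},
    )
    model_type: Optional[str] = field(
        default=None,
        metadata={"help": "Model type to use when training from scratch, e.g. 'bert'."},
    )
```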