huggingface / transformers


Cannot train model from scratch using `run_mlm.py`. #8590

Closed: GuillemGSubies closed this issue 3 years ago

GuillemGSubies commented 3 years ago

Looks like the trainer does not like getting a None: when we train from scratch, a None ends up in this if and the script crashes:

https://github.com/huggingface/transformers/blob/a6cf9ca00b74a8b2244421a6101b83d8cf43cd6b/examples/language-modeling/run_mlm.py#L357

I worked around it by deleting that line, but I guess that could affect other use cases.
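
For reference, a minimal sketch of the guard I would expect instead of deleting the line, assuming the failing check is an os.path.isdir call on model_args.model_name_or_path (the exact code at that line may differ):

import os

model_name_or_path = None  # what the script gets when training from scratch

# os.path.isdir(None) raises TypeError, so guard the None case explicitly:
model_path = (
    model_name_or_path
    if model_name_or_path is not None and os.path.isdir(model_name_or_path)
    else None
)
print(model_path)  # None -> no checkpoint directory, so train from scratch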

To reproduce, call run_mlm.py like this (there is probably a simpler way to reproduce it, but this should be enough):

python run_mlm.py \
    --model_type bert \
    --train_file ./data/oscar_1000.txt \
    --validation_file ./data/oscar_1000_valid.txt \
    --output_dir testing_model \
    --tokenizer_name bert-base-spanish-wwm-cased  \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --evaluation_strategy steps \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --max_steps 500 \
    --save_steps 2000 \
    --save_total_limit 15 \
    --overwrite_cache \
    --max_seq_length 512 \
    --eval_accumulation_steps 10 \
    --logging_steps 1000

I don't think the exact dataset I'm using is relevant; any plain-text corpus should do.
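
Since the contents don't matter, here is a throwaway snippet to create placeholder files at the paths used above (the filenames just match my command; this helper is not part of the repo):

import os

os.makedirs("data", exist_ok=True)
for name in ("oscar_1000.txt", "oscar_1000_valid.txt"):
    with open(os.path.join("data", name), "w", encoding="utf-8") as f:
        # Any plain text works; repeat a line so the tokenizer has input.
        f.write("Este es un texto de ejemplo para el corpus.\n" * 200)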

@sgugger

sgugger commented 3 years ago

Mmm, that is weird, as None is the default for that argument. Will investigate when I'm finished with the v4 work. Thanks for flagging!
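
For what it's worth, if the failing check is indeed an os.path.isdir call on a None, the default never comes into play: the TypeError is raised while evaluating the argument expression, before train is even called. Rough sketch:

import os

model_name_or_path = None  # training from scratch

try:
    # The crash happens here, while computing the argument, not inside
    # Trainer.train itself (whose model_path already defaults to None).
    model_path = model_name_or_path if os.path.isdir(model_name_or_path) else None
except TypeError as err:
    print(err)  # e.g. "stat: path should be string, bytes, os.PathLike or integer, not NoneType"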