stefan-it / turkish-bert

Turkish BERT/DistilBERT, ELECTRA and ConvBERT models

ValueError: Must specify max_steps > 0, given: 0 #21

Closed · etetteh closed this issue 3 years ago

etetteh commented 3 years ago
$ python3 electra_small/run_finetuning.py \
--data-dir $DATA_DIR \
--model-name "ELECTRA-small" \
--hparams '{"model_size": "small", "task_names": ["<task_name>"], "num_trials": 5, "learning_rate": 3e-4, "train_batch_size": 16, "use_tpu": "True", "num_tpu_cores": 8, "tpu_name": "<tpu_name>", "tpu_zone": "europe-west4-a", "gcp_project": "<gcp_name>", "vocab_size": 50000, "num_train_epochs": 10}'

I am getting the following error. Is there something I am missing?

Training for 0 steps
ERROR:tensorflow:Error recorded from training_loop: Must specify max_steps > 0, given: 0
Traceback (most recent call last):
  File "electra_small/run_finetuning.py", line 323, in <module>
    main()
  File "electra_small/run_finetuning.py", line 319, in main
    args.model_name, args.data_dir, **hparams))
  File "electra_small/run_finetuning.py", line 270, in run_finetuning
    model_runner.train()
  File "electra_small/run_finetuning.py", line 183, in train
    input_fn=self._train_input_fn, max_steps=self.train_steps)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    'Must specify max_steps > 0, given: {}'.format(max_steps))
ValueError: Must specify max_steps > 0, given: 0
stefan-it commented 3 years ago

Is there any task_name specified in the hparams section? :thinking:

Here's the supported list of task names:

https://github.com/google-research/electra/blob/81f7e5fc98b0ad8bfd20b641aa8bc9e6ac00c8eb/finetune/task_builder.py#L36-L70
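If you have written a custom task, it also needs its own branch in that dispatch, otherwise the name can't be resolved. Roughly, the linked code looks like this (a simplified sketch, not a verbatim copy of the file):

    # finetune/task_builder.py (simplified sketch): task names are resolved here
    def get_task(config, task_name, tokenizer):
      if task_name == "cola":
        return classification_tasks.CoLA(config, tokenizer)
      elif task_name == "mnli":
        return classification_tasks.MNLI(config, tokenizer)
      # ... one branch per supported task; a custom task needs one too ...
      else:
        raise ValueError("Unknown task " + task_name)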

Additionally, it is important that the model path is correct.

The final model path is built here:

https://github.com/google-research/electra/blob/f93f3f81cdc13435dd3e85766852d00ff3e00ab5/configure_finetuning.py#L104

It then would be $DATA_DIR/models/ELECTRA-small.
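That is, roughly (a sketch of the linked line, variable names as in the repo):

    # configure_finetuning.py (sketch): default location of the pretrained model
    pretrained_model_dir = os.path.join(data_dir, "models", model_name)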

Hope this helps :)

etetteh commented 3 years ago

Yes, I have created my own tasks and specified them in the hparams call.

I think the problem is the model dir. The tree of my storage looks like this:

    Bucket name
    ├── finetuning_dir
    └── pretraining_dir
        ├── models_dir
        ├── pretrain_tfrecords
        │   ├── pretrain_tfrecords-data.0
        │   ├── ...
        │   └── pretrain_tfrecords-data.n
        └── vocab.txt

I think the fine-tuning dir should move into the pretraining dir. Also, the configure_finetuning.py script creates a new fine-tuning dir, which I hope can be overridden by specifying an explicit location.
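For example, something like this (just a guess on my side, with placeholder bucket paths; "model_dir" and "vocab_file" are config keys from configure_finetuning.py):

    --hparams '{..., "model_dir": "gs://<bucket>/pretraining_dir/models_dir/ELECTRA-small/", "vocab_file": "gs://<bucket>/pretraining_dir/vocab.txt"}'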


etetteh commented 3 years ago

Still getting the same error:

$ python3 electra_small/run_finetuning.py \
  --data-dir "gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/" \
  --model-name "covidECTRA-small" \
  --hparams '{"model_size": "small", "task_names": ["bc5c"], "num_trials": 5, "learning_rate": 3e-4, "train_batch_size": 16, "use_tpu": "True", "num_tpu_cores": 8, "tpu_name": "my_tpu", "tpu_zone": "europe-west4-a", "gcp_project": "my_p", "vocab_size": 50000, "num_train_epochs": 60, "model_dir": "gs://my_bucket/pretraining_data/models/covidECTRA-Small/", "vocab_file": "gs://my_bucket/pretraining_data/vocab.txt"}'
================================Config: model=ELECTRA-small, trial 1/5==============================
answerable_classifier True
answerable_uses_start_logits True
answerable_weight 0.5
beam_size 20
data_dir gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/
debug False
do_eval True
do_lower_case True
do_train True
doc_stride 128
double_unordered True
embedding_size None
eval_batch_size 32
gcp_project covidectra
init_checkpoint gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/models/ELECTRA-small
iterations_per_loop 1000
joint_prediction True
keep_all_models True
layerwise_lr_decay 0.8
learning_rate 0.0003
log_examples False
max_answer_length 30
max_query_length 64
max_seq_length 128
model_dir gs://my_bucket/pretraining_data/models/ELECTRA-Small/
model_hparam_overrides {}
model_name ELECTRA-small
model_size small
n_best_size 20
n_writes_test 5
num_tpu_cores 8
num_train_epochs 60
num_trials 5
predict_batch_size 32
preprocessed_data_dir gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/models/covidECTRA-small/finetuning_tfrecords/bc5c_tfrecords
qa_eval_file <built-in method format of str object at 0x7fa57ed85030>
qa_na_file <built-in method format of str object at 0x7fa57edf03a0>
qa_na_threshold -2.75
qa_preds_file <built-in method format of str object at 0x7fa57ed850d8>
raw_data_dir <built-in method format of str object at 0x7fa57dba8e00>
results_pkl gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/models/ELECTRA-small/results/bc5c_results.pkl
results_txt gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/models/ELECTRA-small/results/bc5c_results.txt
save_checkpoints_steps 1000000
task_names ['bc5c']
test_predictions <built-in method format of str object at 0x7fa57dba41a0>
tpu_job_name None
tpu_name my_tpu
tpu_zone europe-west4-a
train_batch_size 16
use_tfrecords_if_existing True
use_tpu True
vocab_file gs://my_bucket/pretraining_data/vocab.txt
vocab_size 50000
warmup_proportion 0.1
weight_decay_rate 0.01
write_test_outputs True

Loading dataset bc5c_train
================================================================================
Start training: model=ELECTRA-small, trial 1/5
================================================================================
Training for 0 steps
ERROR:tensorflow:Error recorded from training_loop: Must specify max_steps > 0, given: 0
Traceback (most recent call last):
  File "electra_small/run_finetuning.py", line 323, in <module>
    main()
  File "electra_small/run_finetuning.py", line 319, in main
    args.model_name, args.data_dir, **hparams))
  File "electra_small/run_finetuning.py", line 270, in run_finetuning
    model_runner.train()
  File "electra_small/run_finetuning.py", line 183, in train
    input_fn=self._train_input_fn, max_steps=self.train_steps)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    'Must specify max_steps > 0, given: {}'.format(max_steps))
ValueError: Must specify max_steps > 0, given: 0

When I check GCS at gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/models/ELECTRA-small/finetuning_tfrecords/bc5c_tfrecords, I find that the tfrecord file has not been created properly; its file size is 0 B.
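(A minimal sketch of how I checked the sizes, assuming tf.io.gfile, which understands gs:// paths:)

    # sketch: list the serialized tfrecords on GCS and print their sizes
    import tensorflow as tf

    path = ("gs://my_bucket/pretraining_data/finetuning_data/BC5CDR-chem/"
            "models/ELECTRA-small/finetuning_tfrecords/bc5c_tfrecords")
    for name in tf.io.gfile.listdir(path):
        full = path + "/" + name
        print(name, tf.io.gfile.stat(full).length, "bytes")  # shows 0 bytes here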

stefan-it commented 3 years ago

Could you delete the finetuning_tfrecords folder and re-run the fine-tuning process?

I think max_steps is 0 because no training tfrecords were written; they should be created in this procedure:

https://github.com/google-research/electra/blob/81f7e5fc98b0ad8bfd20b641aa8bc9e6ac00c8eb/finetune/preprocessing.py#L73-L84
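The surrounding logic is roughly this (a simplified sketch of preprocessing.py, not verbatim). Note that use_tfrecords_if_existing is True in your config dump, so a stale metadata file recording n_examples = 0 would be reused on the next run; that is why deleting the folder should help:

    # preprocessing.py (simplified sketch): where the train step count comes from
    n_examples = None
    if config.use_tfrecords_if_existing and tf.io.gfile.exists(metadata_path):
        # reuse the example count from a previous (possibly failed) run
        n_examples = utils.load_json(metadata_path)["n_examples"]
    if n_examples is None:
        # write the tfrecords and record how many examples were serialized
        n_examples = serialize_examples(examples, is_training,
                                        tfrecords_path, batch_size)
        utils.write_json({"n_examples": n_examples}, metadata_path)
    # with n_examples == 0 this is 0 -> "Must specify max_steps > 0, given: 0"
    steps = int(n_examples // batch_size * config.num_train_epochs)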

stefan-it commented 3 years ago

If debugging takes too long, I highly recommend using the Transformers library ;)

You just need to convert the ELECTRA checkpoint and then you can fine-tune models on downstream tasks in a more convenient way!
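The conversion itself is short; here is a sketch (it mirrors the convert_electra_original_tf_checkpoint_to_pytorch.py script shipped with Transformers; all paths are placeholders you would fill in):

    # sketch: load the original TF checkpoint into a Transformers ELECTRA model
    from transformers import (ElectraConfig, ElectraForPreTraining,
                              load_tf_weights_in_electra)

    config = ElectraConfig.from_json_file("config.json")    # placeholder: small-model config
    model = ElectraForPreTraining(config)                    # discriminator, used for fine-tuning
    load_tf_weights_in_electra(model, config,
                               "electra_small/model.ckpt",   # placeholder checkpoint path
                               discriminator_or_generator="discriminator")
    model.save_pretrained("electra-small-pytorch")           # placeholder output dir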

I did that with the Turkish BERT and ELECTRA models :hugs:

etetteh commented 3 years ago

Thank you.