acmilannesta closed this issue 4 years ago
Hi @acmilannesta, do_predict has been added.
Great, thanks so much!
But don't you also need to add something like a prediction-data output step in "create_finetuning_data.py"?
@acmilannesta for running predict, add these flags:
--do_predict --predict_data_path=${OUTPUT_DIR}/${TASK_NAME}_predict.tf_record --input_data_dir=${GLUE_DIR}/
Full command:
python run_classifer.py \
  --train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
  --input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
  --albert_config_file=${ALBERT_DIR}/config.json \
  --task_name=${TASK_NAME} \
  --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
  --output_dir=${MODEL_DIR} \
  --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
  --do_train \
  --do_eval \
  --train_batch_size=128 \
  --learning_rate=1e-5 \
  --custom_training_loop \
  --do_predict \
  --predict_data_path=${OUTPUT_DIR}/${TASK_NAME}_predict.tf_record \
  --input_data_dir=${GLUE_DIR}/
Epoch 1/3
7698/7698 [==============================] - 9878s 1s/step - loss: 0.8664 - accuracy: 0.7629 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/3
2445/1925 [======================================] - 2886s 1s/step - loss: 0.5667 - accuracy: 0.8351
Got it. But now I've run into another issue: it seems the second training epoch is using my validation set. My training set has 7698 batches and my validation set has 1925 batches.
The second epoch won't use the validation dataset as training data, even though it looks that way. The Model.fit API has known step-calculation issues when a tf.data dataset is given as input; only after the first epoch does the model know the complete dataset size. You can ignore it.
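As a sketch of one workaround (the helper and example sizes below are illustrative, not from the repo): if you compute the step counts yourself and pass them to Model.fit, the progress bar is correct from epoch 1 because fit() no longer has to infer the dataset size.

```python
import math

def steps_for(num_examples: int, batch_size: int) -> int:
    """Ceiling division so the final partial batch is counted."""
    return math.ceil(num_examples / batch_size)

# Illustrative sizes only (giving 7698 and 1925 batches at batch size 16):
train_steps = steps_for(123_168, 16)  # 7698
val_steps = steps_for(30_800, 16)     # 1925

# With the counts known up front, fit() shows the right totals immediately:
# model.fit(train_ds, epochs=3,
#           steps_per_epoch=train_steps,
#           validation_data=eval_ds,
#           validation_steps=val_steps)
```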
You can also train using --custom_training_loop
Thank you! I've just run the model on an AWS g4dn.xlarge (Tesla T4 GPU) VM. However, it took almost 3 times longer than the bert-base model. Here is my code:
os.system(
'/home/ubuntu/py3env/bin/python3.6 ALBERT-TF2.0/run_classifer.py \
--train_data_path=input/COLA_train.tf_record \
--eval_data_path=input/COLA_eval.tf_record \
--predict_data_path=input/COLA_predict.tf_record \
--input_data_dir=input/CoLA \
--input_meta_data_path=input/COLA_meta_data \
--albert_config_file=base_2/config.json \
--task_name=CoLA \
--spm_model_file=base_2/vocab/30k-clean.model \
--output_dir=output \
--init_checkpoint=base_2/tf2_model.h5 \
--do_train \
--do_eval \
--do_predict \
--train_batch_size=16 \
--eval_batch_size=16 \
--learning_rate=5e-5 \
--max_seq_length=142 \
--num_train_epochs=3 \
--custom_training_loop'
)
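As a side note, a long os.system string is fragile (a stray space after an `=` silently breaks a flag). A subprocess argument-list form of the same call is one more robust alternative, sketched here; the paths and flags are copied from the command above:

```python
import subprocess

# Passing a list (no shell) makes quoting/whitespace bugs such as
# "--flag= value" much harder to introduce than a single command string.
cmd = [
    "/home/ubuntu/py3env/bin/python3.6",
    "ALBERT-TF2.0/run_classifer.py",
    "--train_data_path=input/COLA_train.tf_record",
    "--eval_data_path=input/COLA_eval.tf_record",
    "--predict_data_path=input/COLA_predict.tf_record",
    "--input_data_dir=input/CoLA",
    "--input_meta_data_path=input/COLA_meta_data",
    "--albert_config_file=base_2/config.json",
    "--task_name=CoLA",
    "--spm_model_file=base_2/vocab/30k-clean.model",
    "--output_dir=output",
    "--init_checkpoint=base_2/tf2_model.h5",
    "--do_train",
    "--do_eval",
    "--do_predict",
    "--train_batch_size=16",
    "--eval_batch_size=16",
    "--learning_rate=5e-5",
    "--max_seq_length=142",
    "--num_train_epochs=3",
    "--custom_training_loop",
]
# subprocess.run(cmd, check=True)  # commented out: needs the repo and data
```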
@acmilannesta How much time for BERT and how much for ALBERT?
About 170-175 ms/step for bert, 480-490 ms/step for albert.
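Taking the midpoints of those ranges, the reported figures work out to roughly a 2.8x slowdown (simple arithmetic on the numbers above):

```python
# Midpoints of the per-step times reported above (ms/step)
bert_ms = (170 + 175) / 2     # 172.5
albert_ms = (480 + 490) / 2   # 485.0
slowdown = albert_ms / bert_ms
print(f"{slowdown:.2f}x")     # prints 2.81x
```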
Which implementation of BERT are you using? Is the batch size the same in the BERT and ALBERT training?
I used a package called keras_bert (https://github.com/CyberZHG/keras-bert), which basically loads the ckpt into a Keras model.
Yes, the batch size is the same: 16.
Which TensorFlow version for keras_bert?
Yes.
Tensorflow 1.X or 2.0 ?
I think both work, but I'm running on 1.x.
Tensorflow 2 is slower than 1.x https://github.com/tensorflow/tensorflow/issues/33487
https://github.com/tensorflow/models/tree/master/official/nlp/bert TF 2.0 implementation of BERT
I see. Let me try this: https://github.com/google-research/google-research/tree/master/albert It seems to work on TensorFlow 1.x.
The above one runs on TensorFlow 1.15
I just tried. About 2.4 steps/second (416 ms/step).
Based on this comment (https://github.com/tensorflow/tensorflow/issues/33487#issuecomment-543652394), would you suggest turning off eager execution in TF 2.0 to run your code?
@acmilannesta Try and let me know the results
Unfortunately, it doesn't allow disabling eager execution in TF 2.0.
Hi,
Can the script do prediction? I may have missed it, but I didn't see a "do_pred" flag.