pohanchi / huggingface_albert

Hugging Face-style ALBERT model and its tokenizer
Apache License 2.0

Nice Work! Ideas to improve Performance #1

Open ahotrod opened 4 years ago

ahotrod commented 4 years ago

❓ Questions & Help

Good job, thanks for sharing your code.

My system has 2 x NVIDIA 1080Ti. Running data parallel doesn't work for me with the current transformers, and I'd prefer to run distributed processing on both 1080Tis, but transformers currently has a limitation: it can't do_eval with distributed processing. So I'm forced to use a two-step script that separates distributed 2-GPU training from single-GPU evaluation:

#!/bin/bash

MODEL_PATH=/media/dn/dssd/nlp/huggingface_albert/examples/runs/albert_base_sqd1_dist
SQUAD_DIR=/media/dn/dssd/nlp/transformers/examples/scripts/squad1.1

python -m torch.distributed.launch --nproc_per_node=2 run_squad_albert.py \
  --train_file ${SQUAD_DIR}/train-v1.1.json \
  --predict_file ${SQUAD_DIR}/dev-v1.1.json \
  --model_type albert \
  --output_dir ${MODEL_PATH} \
  --do_train \
  --overwrite_output_dir \
  --do_lower_case \
  --warmup_steps 384 \
  --save_steps 1000 \
  --logging_steps 1000 \
  --num_train_epochs 3

CUDA_VISIBLE_DEVICES=0 python run_squad_albert.py \
  --model_type albert \
  --model_name_or_path ${MODEL_PATH} \
  --do_eval \
  --do_lower_case \
  --train_file ${SQUAD_DIR}/train-v1.1.json \
  --predict_file ${SQUAD_DIR}/dev-v1.1.json \
  --max_seq_length 384 \
  --output_dir ${MODEL_PATH}
$@

However, your run_squad_albert.py doesn't include --model_name_or_path functionality. Is there another way to run --do_eval on a previously fine-tuned model in run_squad_albert.py?

Thanks again; I'm looking forward to eventually getting the ALBERT-xlarge model fine-tuned on SQuAD 2.0.

pohanchi commented 4 years ago

Hi, for the performance I report I didn't modify anything: just train with that command, wait 3 epochs, and you will get the same result.

Also, to evaluate a previously fine-tuned model, you just need to modify the part of the code that loads the model so it calls from_pretrained with the directory where you saved your ALBERT model, and that will work.
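
A minimal sketch of that change, assuming the repo's question-answering model and tokenizer classes follow the usual from_pretrained pattern (the module and class names below are placeholders, not necessarily what run_squad_albert.py actually imports):

import torch

# Placeholder imports -- substitute the model/tokenizer classes that
# run_squad_albert.py actually uses.
from modeling_albert import AlbertForQuestionAnswering
from tokenization_albert import AlbertTokenizer

# Directory previously written by --output_dir during training.
fine_tuned_dir = "examples/runs/albert_base_sqd1_dist"

# Load the fine-tuned weights and tokenizer from that directory instead of
# the original pretrained checkpoint, then switch to evaluation mode.
model = AlbertForQuestionAnswering.from_pretrained(fine_tuned_dir)
tokenizer = AlbertTokenizer.from_pretrained(fine_tuned_dir, do_lower_case=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()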

On Sun, Nov 3, 2019 at 03:55 ahotrod notifications@github.com wrote:

My system has 2 x NVIDIA 1080Ti. To replicate your results I ran your ALBERT-base script on 1 x 1080Ti with the results shown below:

#!/bin/bash

MODEL_PATH=/media/dn/dssd/nlp/huggingface_albert/examples/runs/albert_base_sqd1_def
SQUAD_DIR=/media/dn/dssd/nlp/transformers/examples/scripts/squad1.1

CUDA_VISIBLE_DEVICES=0 python run_squad_albert.py \
  --train_file ${SQUAD_DIR}/train-v1.1.json \
  --predict_file ${SQUAD_DIR}/dev-v1.1.json \
  --model_type albert \
  --output_dir ${MODEL_PATH} \
  --do_train \
  --do_eval \
  --overwrite_output_dir \
  --do_lower_case \
  --evaluate_during_training \
  --warmup_steps 384 \
  --save_steps 1000 \
  --logging_steps 1000 \
  --num_train_epochs 3
$@

***** Running training *****
Num examples = 95732
Num Epochs = 3
Instantaneous batch size per GPU = 6
Total train batch size (w. parallel, distributed & accumulation) = 48
Gradient Accumulation steps = 8
Total optimization steps = 5982

{
  "exact": 65.0236518448439,
  "f1": 76.43417713133167,
  "total": 10570,
  "HasAns_exact": 65.0236518448439,
  "HasAns_f1": 76.43417713133167,
  "HasAns_total": 10570
}

Any ideas to improve performance to match your results?


pohanchi commented 4 years ago

By the way, the environment I used to produce this performance is a single 1060 6GB.


pohanchi commented 4 years ago

The arguments I used for that performance are just the ones in that script, unchanged.


ahotrod commented 4 years ago

Really good, thanks! Your results running on an NVIDIA 1060 are impressive. I got a 2 x 1080Ti distributed-processing script working:

#!/bin/bash

MODEL_PATH=/media/dn/dssd/nlp/huggingface_albert/examples/runs/albert_base_sqd1_dist_def
SQUAD_DIR=/media/dn/dssd/nlp/transformers/examples/scripts/squad1.1

python -m torch.distributed.launch --nproc_per_node=2 run_squad_albert.py \
  --train_file ${SQUAD_DIR}/train-v1.1.json \
  --predict_file ${SQUAD_DIR}/dev-v1.1.json \
  --model_type albert \
  --output_dir ${MODEL_PATH} \
  --do_train \
  --overwrite_output_dir \
  --do_lower_case \
  --warmup_steps 384 \
  --save_steps 1000 \
  --logging_steps 1000 \
  --num_train_epochs 3

CUDA_VISIBLE_DEVICES=0 python run_squad_albert.py \
  --train_file ${SQUAD_DIR}/train-v1.1.json \
  --predict_file ${SQUAD_DIR}/dev-v1.1.json \
  --model_type albert \
  --output_dir ${MODEL_PATH} \
  --do_eval \
  --do_lower_case \
  --config_name ${MODEL_PATH}
$@

***** Running training *****
Num examples = 95732
Num Epochs = 3
Instantaneous batch size per GPU = 6
Total train batch size (w. parallel, distributed & accumulation) = 96
Gradient Accumulation steps = 8
Total optimization steps = 2991

{
  "exact": 78.43897824030275,
  "f1": 86.57458158471398,
  "total": 10570,
  "HasAns_exact": 78.43897824030275,
  "HasAns_f1": 86.57458158471398,
  "HasAns_total": 10570
}

~33 minutes an epoch, base model occupying ~5.8 GB of 11.2 GB on each GPU. Now to attempt ALBERT-xlarge fine-tuned on SQuAD 2.0.
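
(For reference, the numbers above are self-consistent: 6 examples per GPU x 2 GPUs x 8 gradient-accumulation steps gives the effective batch size of 96, so 95,732 examples / 96 ≈ 997 optimizer steps per epoch, i.e. 2,991 over 3 epochs, versus 5,982 steps for the single-GPU run at an effective batch size of 48.)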

pohanchi commented 4 years ago

You need to change the pretrained model.

On Mon, Nov 4, 2019 at 00:47 ahotrod notifications@github.com wrote:

The pytorch_albert_large 68MB & pytorch_albert_xlarge 225MB pytorch_state_dict downloads from google are not working. I tried with both Linux & Win10 OSs, both hang. The pytorch_albert (base) 45MB download works.


pohanchi commented 4 years ago

You also need to use the corresponding pretrained config.


pohanchi commented 4 years ago

Can you show your error message?

On Mon, Nov 4, 2019 at 00:54 ahotrod notifications@github.com wrote:

Understand I need to change the pretrained models, to either the large or xlarge pretrained models. However, downloading the large & xlarge pretrained models from google is not working. Only the base model downloads.


ahotrod commented 4 years ago

There is no error message. I have albert-large running now, ~53 minutes per epoch. Will post my results.

I was just wondering if the pre-trained models at your download link are Version 1 or Version 2 from: https://github.com/google-research/google-research/tree/master/albert

New, October 31, 2019: Version 2 of the ALBERT models is released. TF-Hub modules are available: https://tfhub.dev/google/albert_base/2 https://tfhub.dev/google/albert_large/2 https://tfhub.dev/google/albert_xlarge/2 https://tfhub.dev/google/albert_xxlarge/2

Thanks so much, so far everything is working great!

pohanchi commented 4 years ago

Oh, that's a very easy thing to do for Version 2.0, but I'll need to do it tomorrow morning.


pohanchi commented 4 years ago

Oh, the models on the drive are all Version 1, but I will add Version 2 in the future.


ahotrod commented 4 years ago

Thanks again for your excellent work on this.

Overnight, albert_xlarge_v1 model running ~5.5 hours per epoch, occupying ~11.1 GB of 11.2 GB on each GPU, 2 x 1080Ti:

{
  "exact": 82.22327341532639,
  "f1": 90.01828208164447,
  "total": 10570,
  "HasAns_exact": 82.22327341532639,
  "HasAns_f1": 90.01828208164447,
  "HasAns_total": 10570
}

I am running Version 2 models today. The new albert_config_l2.json file has "num_hidden_layers": 24, should it be "num_hidden_layers": 12?

pohanchi commented 4 years ago

For Google Research, the config they use for the large model has 24 layers, so for the original Version 1 large model you need to change its layer count to 24.
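
As a quick sanity check on the config JSON (the path below is just an example; the layer and hidden-size values are the ones reported in the ALBERT paper):

import json

# Hidden layers / hidden size per ALBERT variant, from the ALBERT paper.
EXPECTED = {
    "base":    {"num_hidden_layers": 12, "hidden_size": 768},
    "large":   {"num_hidden_layers": 24, "hidden_size": 1024},
    "xlarge":  {"num_hidden_layers": 24, "hidden_size": 2048},
    "xxlarge": {"num_hidden_layers": 12, "hidden_size": 4096},
}

# Example path -- point this at the config file you are actually using.
with open("config/albert_config_l2.json") as f:
    cfg = json.load(f)

# For the large config this should print 24 and 1024.
print(cfg["num_hidden_layers"], cfg["hidden_size"])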


ahotrod commented 4 years ago

For Google Research, the config they use for the large model has 24 layers, so for the original Version 1 large model you need to change its layer count to 24. …

OK, the reason I asked: my script hyperparameters for the V1 large model wouldn't run the V2 large model, spawning GPU OOM failures. Indeed, the original paper does use 24 layers for the large model. All is well!

pohanchi commented 4 years ago

The layer count also increases the computation and GPU memory, so 24 layers will cause OOM.


ahotrod commented 4 years ago

albert xlarge Version 1 SQuAD 1.1:

{
  "exact": 82.22327341532639,
  "f1": 90.01828208164447,
  "total": 10570,
  "HasAns_exact": 82.22327341532639,
  "HasAns_f1": 90.01828208164447,
  "HasAns_total": 10570
}

albert xlarge Version 2 SQuAD 1.1:

{
  "exact": 0.3500473036896878,
  "f1": 7.691640650211578,
  "total": 10570,
  "HasAns_exact": 0.3500473036896878,
  "HasAns_f1": 7.691640650211578,
  "HasAns_total": 10570
}

Changed the Version 1 script to run the Version 2 model adding:

--model_dict_pretrain ${HFA_PATH}/pytorch_model_state_dict/pytorch_albert_xl2 \
--config_pretrain ${HFA_PATH}/config/albert_config_xl2.json \

Everything else in the script stayed the same.

Ref: https://github.com/huggingface/transformers/pull/1683#issuecomment-550523247

pohanchi commented 4 years ago

I didn't try Version 2, but when the EM and F1 fail like that it is usually because the learning rate is too big. Maybe you can tune it to 1e-5 or 2e-5 and the result will be better.

ahotrod commented 4 years ago

Something appears to be amiss with the albert xlarge Version 2 model:

https://github.com/google-research/google-research/issues/119#issue-520113011

No one appears to be getting good results, so I have been holding off on further fine-tuning of Version 2 models until something is resolved.

pohanchi commented 4 years ago

I can double-check and test that after today's midterm exam. I hope you find a solution soon.


pohanchi commented 4 years ago

I can also check the module. I will do it first, in about three hours.


pohanchi commented 4 years ago

@ahotrod Hey, I tried training albert xlarge Version 2 on SQuAD 1.1 and it didn't have a problem: EM 85.87, F1 92.58, and it is still training!!

pohanchi commented 4 years ago

If you need the parameters or anything else, you can ask me and I can forward them to you.


kamalkraj commented 4 years ago

https://github.com/kamalkraj/ALBERT-TF2.0#runnig-squad-v20

ahotrod commented 4 years ago

If you need the parameters or anything else, you can ask me and I can forward them to you. …

I'll try another fine-tuning run this week. It would be helpful to see your shell script for albert xlarge Version 2 on SQuAD 1.1. Are you running this on the 6 GB 1060 as previously mentioned?

pohanchi commented 4 years ago

I ran it on one GPU (a 1080 Ti) with batch_size 2 and gradient accumulation 24; it eats about 7 GB of memory.
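
(That works out to an effective batch size of 2 x 24 = 48 examples per optimizer step, the same effective batch as the base-model runs earlier in this thread, while staying well within the ~11 GB of a 1080 Ti.)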


ahotrod commented 4 years ago

https://github.com/kamalkraj/ALBERT-TF2.0#runnig-squad-v20

@kamalkraj I looked at your work, excellent, thanks.

As the Huggingface Transformer paper describes on page 2, 5th paragraph, I'm a "low-resource user", with only a 2x 1080 Ti(s) system. For me to run the larger ALBERT-TF2.0 models, I would need gradient accumulation steps, and I didn't see that option in your run_squad.py code.

pohanchi commented 4 years ago

If you look at the argparser, you will find the parameter; it's named accumulated_step or some other parameter starting with accumulated....


kamalkraj commented 4 years ago

https://github.com/kamalkraj/ALBERT-TF2.0#runnig-squad-v20

I looked at your work, excellent, thanks.

As the Huggingface Transformer paper describes on page 2, 5th paragraph, I'm a "low-resource user", with only a 2x 1080 Ti(s) system. For me to run the larger ALBERT-TF2.0 models, I would need gradient accumulation steps, and I didn't see that option in your run_squad.py code.

I will try adding that option (gradient accumulation). Thanks for the feedback @ahotrod
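
For anyone following along, here is a generic PyTorch-style sketch of gradient accumulation (not code from either repository; it assumes, for illustration, that the model returns the loss as the first element of its output when labels are passed):

import torch

def train_with_accumulation(model, optimizer, dataloader, accumulation_steps=24, device="cuda"):
    # Only step the optimizer every `accumulation_steps` mini-batches, so the
    # effective batch size is per_gpu_batch_size * accumulation_steps.
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch)[0]                 # assumes loss is the first output
        (loss / accumulation_steps).backward()   # scale so gradients average over the large batch
        if (step + 1) % accumulation_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad()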

pohanchi commented 4 years ago

python3 run_squad_albert.py \
  --train_file ../data/train-v1.1.json \
  --predict_file ../data/dev-v1.1.json \
  --model_type albert \
  --output_dir pohan \
  --config_pretrain config/albert_config_xl2.json \
  --model_dict_pretrain ./pytorch_model_state_dict/pytorch_albert_xl2 \
  --do_train \
  --do_eval \
  --do_lower_case \
  --evaluate_during_training \
  --per_gpu_train_batch_size 2 \
  --per_gpu_eval_batch_size 2 \
  --learning_rate 1e-5 \
  --gradient_accumulation_steps 24 \
  --weight_decay 0.1 \
  --adam_epsilon 1e-12 \
  --num_train_epochs 3 \
  --warmup_steps 1000 \
  --logging_steps 1000 \
  --save_steps 1000 \
  --fp16


ahotrod commented 4 years ago


@pohanchi Thanks so much, most helpful. I'll give it a try this week.

ahotrod commented 4 years ago

@pohanchi Should have time to make some test runs over the next few days.

I would like to run pytorch_albert_xxlarge (Version 1) but there's no entry in pytorch_model_state_dict. Is that something I can generate? Will also probably need albert_config_xxlarge.json in config.

pohanchi commented 4 years ago

Oh, I think the xxlarge model is not efficient for me (low resource), so I didn't want to generate it! But if you need it, I can generate it a few days from now.

pohanchi commented 4 years ago

Another method: just go to the Version 1 TF-Hub web page for albert xxlarge, download the TF-Hub module, and use my convert_tf_xxx.py to generate the model_state_dict. For the config, you can just copy the config from the TF-Hub module webpage and generate a JSON file.

pohanchi commented 4 years ago

@ahotrod I will do it this week, but not in a hurry, so maybe Friday or Saturday. By the way, I added a LAMB optimizer and it performs normally. It can be used!

ahotrod commented 4 years ago

Another method: just go to the Version 1 TF-Hub web page for albert xxlarge, download the TF-Hub module, and use my convert_tf_xxx.py to generate the model_state_dict. For the config, you can just copy the config from the TF-Hub module webpage and generate a JSON file.

@pohanchi Perfect, I'll give this a try.

Per the original paper's GitHub, albert_xxlarge V1 outperforms the new V2 release. I'm currently running albert_xlarge with bs=2 per GPU, so I figure I can run xxlarge with bs=1 and 2x grad_accumulation (24). I'm looking for every bit of increased "accuracy" for fine-tuned SQuAD 2.0.

Thanks again.

ahotrod commented 4 years ago

albert xlarge Version 1 SQuAD 2.0:

{
  "exact": 81.58847805946264,
  "f1": 84.95025543030565,
  "total": 11873,
  "HasAns_exact": 78.12078272604589,
  "HasAns_f1": 84.85397819231112,
  "HasAns_total": 5928,
  "NoAns_exact": 85.04625735912532,
  "NoAns_f1": 85.04625735912532,
  "NoAns_total": 5945,
  "best_exact": 81.93379937673714,
  "best_exact_thresh": -1.7692835330963135,
  "best_f1": 85.16188049847581,
  "best_f1_thresh": -0.575573205947876
}

albert xlarge Version 2 SQuAD 2.0, significant improvement:

{
  "exact": 84.51107554956624,
  "f1": 87.57064747649119,
  "total": 11873,
  "HasAns_exact": 80.70175438596492,
  "HasAns_f1": 86.82967231585373,
  "HasAns_total": 5928,
  "NoAns_exact": 88.30950378469302,
  "NoAns_f1": 88.30950378469302,
  "NoAns_total": 5945,
  "best_exact": 84.61214520340268,
  "best_exact_thresh": -1.5449519157409668,
  "best_f1": 87.62058069832646,
  "best_f1_thresh": -0.16038274765014648
}

Unable to fit albert xxlarge Version 1 on a single GPU with bs=1. Will eventually use cloud V100s for fine-tuning that model on SQuAD 2.0.

pohanchi commented 4 years ago

So they all can train now, right? There are no problems.


ahotrod commented 4 years ago

So they all can train now, right? There are no problems.

@pohanchi albert xlarge Versions 1 & 2 train & evaluate on SQuAD 1.1 & 2.0, thanks! albert xlarge Version 2 SQuAD 2.0 will be my model of choice for now.