microsoft / ProphetNet

A research project for natural language generation, containing the official implementations by MSRA NLC team.

1. Report a typo. 2. Question about time cost. 3. Error in inference #48

Open PolarisRisingWar opened 2 years ago

PolarisRisingWar commented 2 years ago
  1. In https://github.com/microsoft/ProphetNet/tree/master/ProphetNet_Zh, in the example for preprocessing data, the first line reads import transformers import BertTokenizer. I think it should be from transformers import BertTokenizer.
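  For reference, a minimal sketch of the corrected line (the bert-base-chinese vocab below is my own placeholder, not necessarily the vocab the ProphetNet_Zh example loads):

    # Corrected: BertTokenizer is imported from the transformers package.
    from transformers import BertTokenizer

    # Placeholder vocab for illustration only; substitute whatever vocab
    # file the ProphetNet_Zh preprocessing example actually specifies.
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    print(" ".join(tokenizer.tokenize("这是一个例子")))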

  2. I'm running the fine-tuning code with this script:
    
    DATA_DIR=mypath/bl1/prophetnet/processed2
    USER_DIR=mypath/bl1/prophetnet/prophetnet
    ARCH=ngram_transformer_prophet_large
    CRITERION=ngram_language_loss
    SAVE_DIR=mypath/bl1/prophetnet/saves/save2
    TENSORBOARD_LOGDIR=mypath/bl1/prophetnet/logs/log2
    PRETRAINED_MODEL=mypath/data/bert_model/prophetnet_zh.pt

fairseq-train \
    --fp16 \
    --user-dir $USER_DIR --task translation_prophetnet --arch $ARCH \
    --optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
    --lr 0.00001 --min-lr 1e-09 \
    --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --criterion $CRITERION --label-smoothing 0.1 \
    --update-freq 1 --max-tokens 1400 --max-sentences 7 \
    --num-workers 4 \
    --load-from-pretrained-model $PRETRAINED_MODEL \
    --ddp-backend=no_c10d --max-epoch 10 \
    --max-source-positions 1024 --max-target-positions 512 \
    --skip-invalid-size-inputs-valid-test \
    --save-dir $SAVE_DIR \
    --keep-last-epochs 10 \
    --tensorboard-logdir $TENSORBOARD_LOGDIR \
    $DATA_DIR


Both the log file and the terminal output are stagnant; nothing is printed at all. I wonder whether it is simply too slow to produce output quickly, or whether I have run the code incorrectly.
So I want to ask: how long should this take? (I have about 7,000 samples in the training set, 2,000 in validation, and 2,000 in test.)
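A hedged way to tell a genuine hang from a slow start (my own suggestion, not from the original post): fairseq's --log-format and --log-interval flags control console logging, so rerunning the same command with a progress line forced after every update should produce output within the first few updates if training is actually running:

    # Same invocation as above, plus two logging flags (both exist in
    # fairseq) so a progress line is printed after every single update
    # instead of the console staying silent for a long stretch.
    fairseq-train \
        --fp16 \
        --user-dir $USER_DIR --task translation_prophetnet --arch $ARCH \
        --optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
        --lr 0.00001 --min-lr 1e-09 \
        --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
        --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
        --criterion $CRITERION --label-smoothing 0.1 \
        --update-freq 1 --max-tokens 1400 --max-sentences 7 \
        --num-workers 4 \
        --load-from-pretrained-model $PRETRAINED_MODEL \
        --ddp-backend=no_c10d --max-epoch 10 \
        --max-source-positions 1024 --max-target-positions 512 \
        --skip-invalid-size-inputs-valid-test \
        --save-dir $SAVE_DIR \
        --keep-last-epochs 10 \
        --tensorboard-logdir $TENSORBOARD_LOGDIR \
        --log-format simple --log-interval 1 \
        $DATA_DIR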
3. When I directly use the downloaded pretrained model for inference with this script:

    BEAM=5
    LENPEN=1.5
    CHECK_POINT=mypath/data/bert_model/prophetnet_zh.pt
    TEMP_FILE=mypath/bl1/prophetnet/infers/infer2/fairseq_outputs.txt
    OUTPUT_FILE=mypath/bl1/prophetnet/infers/infer2/sorted_outputs.txt

fairseq-generate mypath/bl1/prophetnet/processed2 \
    --path $CHECK_POINT \
    --user-dir mypath/bl1/prophetnet/prophetnet \
    --task translation_prophetnet \
    --batch-size 80 --gen-subset test \
    --beam $BEAM --num-workers 4 \
    --no-repeat-ngram-size 3 --lenpen $LENPEN \
    2>&1 > $TEMP_FILE

grep ^H $TEMP_FILE | cut -c 3- | sort -n | cut -f3- | sed "s/ ##//g" > $OUTPUT_FILE

I got this error message: 

Traceback (most recent call last):
  File "mypath/anaconda3/envs/envfastsum/bin/fairseq-generate", line 33, in <module>
    sys.exit(load_entry_point('fairseq==0.9.0', 'console_scripts', 'fairseq-generate')())
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq_cli/generate.py", line 47, in main
    task=task,
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 179, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 166, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 300, in _upgrade_state_dict
    {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
KeyError: 'best_loss'


I've found other issues referring to this problem, but I haven't found any direct way to solve it. So I wonder: how can this problem be solved?
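Not an official fix, but reading the traceback: fairseq 0.9.0's _upgrade_state_dict assumes a legacy training checkpoint and rebuilds optimizer_history from a best_loss key, which the released prophetnet_zh.pt apparently does not carry (it is normally consumed via --load-from-pretrained-model during fine-tuning, with fairseq-generate then pointed at the checkpoints fairseq-train saves). Below is a rough, untested sketch of a workaround that injects placeholder bookkeeping keys; the placeholder values and the _patched filename are my own:

    import torch

    # The released checkpoint appears to contain model weights without the
    # training bookkeeping that fairseq-generate expects to find.
    path = "mypath/data/bert_model/prophetnet_zh.pt"
    state = torch.load(path, map_location="cpu")

    # Placeholder history: once optimizer_history is present, fairseq's
    # _upgrade_state_dict no longer reaches for state["best_loss"].
    state.setdefault("optimizer_history", [
        {"criterion_name": "CrossEntropyCriterion", "best_loss": None}
    ])
    state.setdefault("last_optimizer_state", {})
    state.setdefault("extra_state", {"epoch": 1, "batch_offset": 0, "val_loss": None})

    torch.save(state, "mypath/data/bert_model/prophetnet_zh_patched.pt")

Fine-tuning first and generating from the checkpoint that fairseq-train saves avoids the problem entirely.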
PolarisRisingWar commented 2 years ago

Honestly... I ran it for 5 hours on just 90 samples and the log never changed... I figured I must have run the code wrongly, so I terminated it.

steve3p0 commented 2 years ago

Anyone find a solution to this?