sacmehta / delight

DeLighT: Very Deep and Light-Weight Transformers

train.py: error: unrecognized arguments: --t-mult 1 #13

Open fkjslee opened 2 years ago

fkjslee commented 2 years ago

Dear author: when I run the script `python nmt_wmt16_en2ro.py --d-m 384`, the following error is raised: `train.py: error: unrecognized arguments: --t-mult 1`

What's more, when I read the code in detail, I can't find the argument `--t-mult` anywhere. Below is my error log:

```
$ python nmt_wmt16_en2ro.py --d-m 384
2022-01-10 15:23:03 - LOGS - Training command: python train.py data-bin/wmt14_en_ro
    --arch delight_transformer_wmt16_en_ro --no-progress-bar --optimizer adam
    --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --min-lr 1e-09
    --update-freq 1 --keep-last-epochs 10 --ddp-backend=no_c10d --max-tokens 4096
    --max-update 100000 --warmup-updates 10000 --lr-scheduler linear
    --warmup-init-lr 1e-7 --lr 0.0009 --min-lr 1e-9 --t-mult 1
    --save-dir ./results_wmt16_en2ro/delight_out_384 --distributed-world-size 8
    --distributed-port 50786 --delight-emb-map-dim 128 --delight-emb-out-dim 384
    --delight-enc-min-depth 4 --delight-enc-max-depth 8 --delight-enc-width-mult 2
    --delight-dec-min-depth 4 --delight-dec-max-depth 8 --delight-dec-width-mult 2
    | tee -a ./results_wmt16_en2ro/delight_out_384/logs.txt
usage: train.py [-h] [--no-progress-bar] [--log-interval N]
                [--log-format {json,none,simple,tqdm}] [--tensorboard-logdir DIR]
                [--seed N] [--cpu] [--fp16] [--memory-efficient-fp16]
                [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE]
                [--fp16-scale-window FP16_SCALE_WINDOW]
                [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--min-loss-scale D]
                [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--user-dir USER_DIR]
                [--empty-cache-freq EMPTY_CACHE_FREQ]
                [--all-gather-list-size ALL_GATHER_LIST_SIZE]
                [--criterion {label_smoothed_cross_entropy,sentence_ranking,legacy_masked_lm_loss,composite_loss,label_smoothed_cross_entropy_with_alignment,adaptive_loss,adaptive_cross_entropy,nat_loss,sentence_prediction,masked_lm,cross_entropy,binary_cross_entropy}]
                [--tokenizer {moses,nltk,space}]
                [--bpe {fastbpe,subword_nmt,bert,sentencepiece,gpt2}]
                [--optimizer {adadelta,adamax,adagrad,adafactor,sgd,lamb,nag,adam}]
                [--lr-scheduler {cosine,inverse_sqrt,linear,triangular,fixed,reduce_lr_on_plateau,polynomial_decay,tri_stage}]
                [--task TASK] [--num-workers N]
                [--skip-invalid-size-inputs-valid-test] [--max-tokens N]
                [--max-sentences N] [--required-batch-size-multiple N]
                [--dataset-impl FORMAT] [--train-subset SPLIT]
                [--valid-subset SPLIT] [--validate-interval N]
                [--fixed-validation-seed N] [--disable-validation]
                [--max-tokens-valid N] [--max-sentences-valid N] [--curriculum N]
                [--distributed-world-size N] [--distributed-rank DISTRIBUTED_RANK]
                [--distributed-backend DISTRIBUTED_BACKEND]
                [--distributed-init-method DISTRIBUTED_INIT_METHOD]
                [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID]
                [--distributed-no-spawn] [--ddp-backend {c10d,no_c10d}]
                [--bucket-cap-mb MB] [--fix-batches-to-gpus]
                [--find-unused-parameters] [--fast-stat-sync] [--broadcast-buffers]
                [--arch ARCH] [--max-epoch N] [--max-update N] [--clip-norm NORM]
                [--sentence-avg] [--update-freq N1,N2,...,N_K]
                [--lr LR_1,LR_2,...,LR_N] [--min-lr LR] [--use-bmuf]
                [--save-dir DIR] [--restore-file RESTORE_FILE]
                [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters]
                [--reset-optimizer] [--optimizer-overrides DICT]
                [--save-interval N] [--save-interval-updates N]
                [--keep-interval-updates N] [--keep-last-epochs N]
                [--keep-best-checkpoints N] [--no-save] [--no-epoch-checkpoints]
                [--no-last-checkpoints] [--no-save-optimizer-state]
                [--best-checkpoint-metric BEST_CHECKPOINT_METRIC]
                [--maximize-best-checkpoint-metric] [--patience N]
                [--adaptive-input] [--adaptive-softmax-cutoff EXPR]
                [--adaptive-softmax-dropout D] [--adaptive-softmax-factor N]
                [--tie-adaptive-weights] [--tie-adaptive-proj]
                [--delight-emb-map-dim DELIGHT_EMB_MAP_DIM]
                [--delight-emb-out-dim DELIGHT_EMB_OUT_DIM]
                [--delight-emb-width-mult DELIGHT_EMB_WIDTH_MULT]
                [--delight-emb-max-groups DELIGHT_EMB_MAX_GROUPS]
                [--delight-emb-dropout DELIGHT_EMB_DROPOUT]
                [--delight-emb-depth DELIGHT_EMB_DEPTH]
                [--delight-enc-scaling {block,uniform}]
                [--delight-enc-layers DELIGHT_ENC_LAYERS]
                [--delight-enc-min-depth DELIGHT_ENC_MIN_DEPTH]
                [--delight-enc-max-depth DELIGHT_ENC_MAX_DEPTH]
                [--delight-enc-width-mult DELIGHT_ENC_WIDTH_MULT]
                [--delight-enc-ffn-red DELIGHT_ENC_FFN_RED]
                [--delight-enc-max-groups DELIGHT_ENC_MAX_GROUPS]
                [--delight-dec-scaling {block,uniform}]
                [--delight-dec-layers DELIGHT_DEC_LAYERS]
                [--delight-dec-min-depth DELIGHT_DEC_MIN_DEPTH]
                [--delight-dec-max-depth DELIGHT_DEC_MAX_DEPTH]
                [--delight-dec-width-mult DELIGHT_DEC_WIDTH_MULT]
                [--delight-dec-ffn-red DELIGHT_DEC_FFN_RED]
                [--delight-dec-max-groups DELIGHT_DEC_MAX_GROUPS]
                [--no-glt-shuffle] [--define-iclr] [--norm-type NORM_TYPE]
                [--act-type ACT_TYPE] [--delight-dropout DELIGHT_DROPOUT]
                [--ffn-dropout FFN_DROPOUT] [--print-stats]
                [--src-len-ps SRC_LEN_PS] [--tgt-len-ps TGT_LEN_PS] [--dropout D]
                [--attention-dropout D] [--pe-dropout D] [--activation-dropout D]
                [--encoder-normalize-before] [--decoder-normalize-before]
                [--share-decoder-input-output-embed] [--share-all-embeddings]
                [--decoder-learned-pos] [--encoder-learned-pos]
                [--no-token-positional-embeddings] [--no-scale-embedding]
                [--label-smoothing D] [--adam-betas B] [--adam-eps D]
                [--weight-decay WD] [--use-old-adam] [--warmup-updates N]
                [--warmup-init-lr LR] [-s SRC] [-t TARGET] [--load-alignments]
                [--left-pad-source BOOL] [--left-pad-target BOOL]
                [--max-source-positions N] [--max-target-positions N]
                [--upsample-primary UPSAMPLE_PRIMARY] [--truncate-source]
                [--eval-bleu] [--eval-bleu-detok EVAL_BLEU_DETOK]
                [--eval-bleu-detok-args JSON] [--eval-tokenized-bleu]
                [--eval-bleu-remove-bpe [EVAL_BLEU_REMOVE_BPE]]
                [--eval-bleu-args JSON] [--eval-bleu-print-samples]
                data
train.py: error: unrecognized arguments: --t-mult 1
```

Thank you.
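For context: fairseq assembles `train.py`'s argument parser from the components selected on the command line (task, optimizer, lr scheduler, ...), so argparse aborts with exactly this message whenever a flag is passed that no selected component registered. `--t-mult` appears to be an option of fairseq's `cosine` lr scheduler, so the `linear` scheduler chosen here presumably never adds it. A minimal sketch of that failure mode (hypothetical standalone code, not the repo's parser):

```python
import argparse

# A parser that, like the one built for '--lr-scheduler linear',
# never registers '--t-mult'.
parser = argparse.ArgumentParser(prog="train.py")
parser.add_argument("--lr-scheduler", default="linear")
parser.add_argument("--warmup-updates", type=int)

# argparse exits with:
#   train.py: error: unrecognized arguments: --t-mult 1
args = parser.parse_args(["--lr-scheduler", "linear", "--t-mult", "1"])
```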

It looks like the formatting was distorted by GitHub, so I also pasted the log here:

https://paste.ofcode.org/fkBdqtjQdEFr6QeymGY49F

Please take a look if you need it.
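Assuming `nmt_wmt16_en2ro.py` simply concatenates the flags into the training command shown above, one possible workaround (a hypothetical sketch, not the author's fix) is to strip `--t-mult` and its value before launching `train.py`:

```python
import shlex
import subprocess

# Hypothetical workaround: remove the unsupported '--t-mult 1' pair from
# the generated command string before running it.
cmd = ("python train.py data-bin/wmt14_en_ro --lr-scheduler linear "
       "--lr 0.0009 --t-mult 1")  # stand-in for the full command above

tokens = shlex.split(cmd)
filtered, skip = [], False
for tok in tokens:
    if skip:                 # drop the value that followed --t-mult
        skip = False
    elif tok == "--t-mult":  # drop the flag itself
        skip = True
    else:
        filtered.append(tok)

subprocess.run(filtered, check=True)
```

Equivalently, deleting the `--t-mult 1` substring at the point where the wrapper script builds the command string should have the same effect.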