ufal / evalatin2024-latinpipe

LatinPipe – the winning entry to the parsing task of EvaLatin 2024
Mozilla Public License 2.0

error: argument --dev: expected at least one argument #3

Closed: locusclassicus closed this issue 2 months ago

locusclassicus commented 2 months ago

Hi, thanks for sharing your code and data! After running the following command (in Colab):

!python /content/evalatin2024-latinpipe/latinpipe_evalatin24.py $(for split in dev test train; do echo --$split; for tb in $la_ud213_all; do [ $tb-$split = la_proiel-train ] && tb=la_proielh; echo data/$tb/$tb-ud-$split.conllu; done; done) $(for tb in $la_other; do echo data/$tb/$tb-train.conllu; done) --transformers $transformer --epochs=30 --exp=evalatin24_model --subword_combination=last --epochs_frozen=10 --batch_size=64 --save_checkpoint

I get the following error:

usage: latinpipe_evalatin24.py [-h] [--batch_size BATCH_SIZE] [--deprel {full,universal}]
                               [--dev DEV [DEV ...]] [--dropout DROPOUT] [--embed_tags EMBED_TAGS]
                               [--epochs EPOCHS] [--epochs_frozen EPOCHS_FROZEN] [--exp EXP]
                               [--label_smoothing LABEL_SMOOTHING] [--learning_rate LEARNING_RATE]
                               [--learning_rate_decay {none,cos}]
                               [--learning_rate_warmup LEARNING_RATE_WARMUP] [--load [LOAD ...]]
                               [--max_train_sentence_len MAX_TRAIN_SENTENCE_LEN]
                               [--optimizer {adam,adafactor}] [--parse PARSE]
                               [--parse_attention_dim PARSE_ATTENTION_DIM] [--rnn_dim RNN_DIM]
                               [--rnn_layers RNN_LAYERS]
                               [--rnn_type {LSTM,GRU,LSTMTorch,GRUTorch}] [--save_checkpoint]
                               [--seed SEED] [--steps_per_epoch STEPS_PER_EPOCH]
                               [--single_root SINGLE_ROOT]
                               [--subword_combination {first,last,sum,concat}] [--tags TAGS]
                               [--task_hidden_layer TASK_HIDDEN_LAYER] [--test TEST [TEST ...]]
                               [--train TRAIN [TRAIN ...]]
                               [--train_sampling_exponent TRAIN_SAMPLING_EXPONENT]
                               [--transformers TRANSFORMERS [TRANSFORMERS ...]] [--treebank_ids]
                               [--threads THREADS] [--verbose VERBOSE] [--wandb]
                               [--word_masking WORD_MASKING]
latinpipe_evalatin24.py: error: argument --dev: expected at least one argument

Any ideas on how to fix this --dev issue? Many thanks in advance.

locusclassicus commented 2 months ago

Well, I finally fixed it by spelling out the full paths to the files:

python latinpipe_evalatin24.py --dev data/la_ittb/la_ittb-ud-dev.conllu --test data/la_ittb/la_ittb-ud-test.conllu --train data/la_ittb/la_ittb-ud-train.conllu --train data/la_llct/la_llct-ud-train.conllu --train data/la_perseus/la_perseus-ud-train.conllu --train data/la_proiel/la_proiel-ud-train.conllu --train data/la_udante/la_udante-ud-train.conllu --train data/la_archimedes/la_archimedes-train.conllu --train data/la_sabellicus/la_sabellicus-train.conllu --transformers bowphs/PhilBerta --epochs 30 --exp evalatin24_model --subword_combination last --epochs_frozen 10 --batch_size 64 --save_checkpoint
foxik commented 2 months ago

Hi,

in the README.md, there are three lines before the training command itself:

la_ud213_all="la_ittb la_llct la_perseus la_proiel la_udante"
la_other="la_archimedes la_sabellicus"
transformer="bowphs/PhilBerta"  # or bowphs/LaBerta

latinpipe_evalatin24.py $(for split in dev test train; do echo --$split; for tb in $la_ud213_all; do [ $tb-$split = la_proiel-train ] && tb=la_proielh; echo data/$tb/$tb-ud-$split.conllu; done; done) $(for tb in $la_other; do echo data/$tb/$tb-train.conllu; done) --transformers $transformer --epochs=30 --exp=evalatin24_model --subword_combination=last --epochs_frozen=10 --batch_size=64 --save_checkpoint

I think you did not run them (i.e., the shell variables were not set), which resulted in no treebanks being passed to --dev, --test, and --train.

Note that the full path is used in the command, i.e., echo data/$tb/$tb-ud-$split.conllu, but I assume the iteration over $la_ud213_all was empty (the variable was unset), as the error message indicates that --dev got no argument.
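
For illustration, a minimal sketch of what the command substitution expands to in both cases (the la_proiel/la_proielh substitution from the README command is omitted here for brevity):

# With la_ud213_all unset, the inner loop runs zero times, so only the bare option
# names are emitted, and argparse complains that --dev expected at least one argument:
unset la_ud213_all
echo $(for split in dev test train; do echo --$split; for tb in $la_ud213_all; do echo data/$tb/$tb-ud-$split.conllu; done; done)
# prints: --dev --test --train

# With the variable set as in the README, every option is followed by its treebank files:
la_ud213_all="la_ittb la_llct la_perseus la_proiel la_udante"
echo $(for split in dev test train; do echo --$split; for tb in $la_ud213_all; do echo data/$tb/$tb-ud-$split.conllu; done; done)
# prints: --dev data/la_ittb/la_ittb-ud-dev.conllu data/la_llct/la_llct-ud-dev.conllu ... --test ... --train ...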

Your fixed command lists the treebanks explicitly (without the for loop), which is of course possible; but I wanted the set of treebanks to be easily configurable, hence the shell variables.
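
For example, to train only on ITTB and LLCT (a hypothetical subset, just to illustrate the point), one would only change the variables and rerun the same README command unchanged:

la_ud213_all="la_ittb la_llct"   # hypothetical subset, for illustration only
la_other=""                      # skip the non-UD treebanks
# ... then run the README training command as before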

locusclassicus commented 2 months ago

Thank you for your reply! I did run the three lines before the command, but apparently there was a problem with the tar archive when fetching the files; once I fixed that, the for loop worked well. In case someone else faces the same problem: I managed to fetch all the folders in Colab (and then saved them locally).
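
(A quick sanity check, just a sketch based on the paths the README command uses: listing the expected files before launching training makes missing or mis-extracted data obvious.)

for tb in la_ittb la_llct la_perseus la_proiel la_udante; do ls data/$tb/$tb-ud-*.conllu; done
ls data/la_proielh/la_proielh-ud-train.conllu   # used instead of la_proiel for the training split, per the README command
for tb in la_archimedes la_sabellicus; do ls data/$tb/$tb-train.conllu; done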