stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
https://stanfordnlp.github.io/stanza/

[QUESTION] #679

Closed sarves closed 3 years ago

sarves commented 3 years ago

1. Why is the result we get after training different from the result we get when executing scripts/run_ete.sh?

2. Where can I find the detailed architecture/more technical details of Stanza? I looked at the paper we are asked to cite and at the official website. Are there any other references?

Thank you

AngledLuffa commented 3 years ago

Please fill in more details for question #1.

For question 2, again, it's not clear what you want. You could always look at the implementation of the forward pass of the various models, which for the most part is in stanza/models/___/model.py.
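As a minimal sketch of where those forward passes live (the module paths and class names below are my reading of the repo layout, not a supported public API):

```python
# Illustrative pointers only, assuming the current stanza repo layout.
from stanza.models.pos.model import Tagger       # POS/UFeats tagger; forward() in stanza/models/pos/model.py
from stanza.models.depparse.model import Parser  # biaffine dependency parser; forward() in stanza/models/depparse/model.py
```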

qipeng commented 3 years ago

@sarves the Stanza paper also cites our previous work, Universal Dependency Parsing from Scratch, which contains more details about the model architecture (and which in turn cites some earlier work, etc.).

sarves commented 3 years ago


OK, here is the response. I trained a parser for Tamil using my own dataset. At the end of the dependency training, I got the following results:

2021-06-17 06:18:15 DEBUG: Found existing .pt file in saved_models/depparse/ta_thamizhi.pretrain.pt
2021-06-17 06:18:15 INFO: Loading model from: saved_models/depparse/ta_thamizhi_parser.pt
2021-06-17 06:18:15 DEBUG: Loaded pretrain from saved_models/depparse/ta_thamizhi.pretrain.pt
2021-06-17 06:18:15 INFO: Loading data with batch size 5000...
2021-06-17 06:18:16 DEBUG: 1 batches created.
2021-06-17 06:18:16 INFO: Start evaluation...
2021-06-17 06:18:16 INFO: LAS    MLAS   BLEX
2021-06-17 06:18:16 INFO: 70.12  65.20  68.72
2021-06-17 06:18:16 INFO: Parser score:
2021-06-17 06:18:16 INFO: ta_thamizhi 70.12
2021-06-17 06:18:16 INFO: Finished running dev set on UD_Tamil-Thamizhi 79.56 70.12 68.72 65.20 68.72

This shows that these results are based on an evaluation done on the dev set.

However, when I evaluate on the dev set using run_ete.py --score_dev, I get the following results, which are slightly different from what I got at the end of training:

python stanza/utils/training/run_ete.py UD_Tamil-Thamizhi --score_dev

2021-06-17 06:25:27 INFO: Training program called with: stanza/utils/training/run_ete.py UD_Tamil-Thamizhi --score_dev
2021-06-17 06:25:27 DEBUG: UD_Tamil-Thamizhi: ta_thamizhi
2021-06-17 06:25:27 INFO: ----- TOKENIZER ----------
2021-06-17 06:25:27 INFO: Running tokenizer step with args: ['--mode', 'predict', '--txt_file', 'data/tokenize/ta_thamizhi.dev.txt', '--lang', 'ta', '--conll_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.tokenizer.conllu', '--shorthand', 'ta_thamizhi']
2021-06-17 06:25:27 INFO: Running tokenizer in predict mode
2021-06-17 06:25:28 DEBUG: 1 sentences loaded.
2021-06-17 06:25:29 INFO: OOV rate: 0.030% ( 5/ 16654)
2021-06-17 06:25:29 INFO: ----- MWT ----------
2021-06-17 06:25:29 INFO: Running mwt step with args: ['--eval_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.tokenizer.conllu', '--output_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.mwt.conllu', '--lang', 'ta', '--shorthand', 'ta_thamizhi', '--mode', 'predict']
2021-06-17 06:25:29 INFO: Running MWT expander in predict mode
2021-06-17 06:25:29 DEBUG: Building an attentional Seq2Seq model...
2021-06-17 06:25:29 DEBUG: Using a Bi-LSTM encoder
2021-06-17 06:25:29 DEBUG: Using soft attention for LSTM.
2021-06-17 06:25:29 DEBUG: Finetune all embeddings.
2021-06-17 06:25:29 DEBUG: max_dec_len: 34
2021-06-17 06:25:29 DEBUG: Loading data with batch size 50...
2021-06-17 06:25:29 DEBUG: 1 batches created.
2021-06-17 06:25:29 INFO: Running the seq2seq model...
2021-06-17 06:25:29 INFO: ----- POS ----------
2021-06-17 06:25:29 INFO: Running pos step with args: ['--wordvec_dir', 'extern_data/wordvec', '--eval_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.mwt.conllu', '--output_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.pos.conllu', '--lang', 'ta_thamizhi', '--shorthand', 'ta_thamizhi', '--mode', 'predict']
2021-06-17 06:25:29 INFO: Running tagger in predict mode
2021-06-17 06:25:29 DEBUG: Found existing .pt file in saved_models/pos/ta_thamizhi.pretrain.pt
2021-06-17 06:25:29 INFO: Loading model from: saved_models/pos/ta_thamizhi_tagger.pt
2021-06-17 06:25:29 DEBUG: Loaded pretrain from saved_models/pos/ta_thamizhi.pretrain.pt
2021-06-17 06:25:29 INFO: Loading data with batch size 5000...
2021-06-17 06:25:29 DEBUG: 1 batches created.
2021-06-17 06:25:29 INFO: Start evaluation...
2021-06-17 06:25:30 INFO: ----- LEMMA ----------
2021-06-17 06:25:30 INFO: Running lemmatizer step with args: ['--eval_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.pos.conllu', '--output_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.lemma.conllu', '--lang', 'ta_thamizhi', '--mode', 'predict']
2021-06-17 06:25:30 INFO: Running lemmatizer in predict mode
2021-06-17 06:25:30 DEBUG: Building an attentional Seq2Seq model...
2021-06-17 06:25:30 DEBUG: Using a Bi-LSTM encoder
2021-06-17 06:25:30 DEBUG: Using soft attention for LSTM.
2021-06-17 06:25:30 DEBUG: Using POS in encoder
2021-06-17 06:25:30 DEBUG: Finetune all embeddings.
2021-06-17 06:25:30 DEBUG: Running seq2seq lemmatizer with edit classifier...
2021-06-17 06:25:30 INFO: Loading data with batch size 50...
2021-06-17 06:25:30 DEBUG: 47 batches created.
2021-06-17 06:25:30 INFO: Running the seq2seq model...
2021-06-17 06:25:30 INFO: [Ensembling dict with seq2seq lemmatizer...]
2021-06-17 06:25:31 INFO: ----- DEPPARSE ----------
2021-06-17 06:25:31 INFO: Running depparse step with args: ['--wordvec_dir', 'extern_data/wordvec', '--eval_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.lemma.conllu', '--output_file', '/tmp/tmpzpkf9wb1/ta_thamizhi.dev.depparse.conllu', '--lang', 'ta_thamizhi', '--shorthand', 'ta_thamizhi', '--mode', 'predict']
2021-06-17 06:25:31 INFO: Running parser in predict mode
2021-06-17 06:25:31 DEBUG: Found existing .pt file in saved_models/depparse/ta_thamizhi.pretrain.pt
2021-06-17 06:25:31 INFO: Loading model from: saved_models/depparse/ta_thamizhi_parser.pt
2021-06-17 06:25:31 DEBUG: Loaded pretrain from saved_models/depparse/ta_thamizhi.pretrain.pt
2021-06-17 06:25:31 INFO: Loading data with batch size 5000...
2021-06-17 06:25:31 DEBUG: 1 batches created.
2021-06-17 06:25:31 INFO: Start evaluation...
2021-06-17 06:25:31 INFO: ----- EVALUATION ----------
2021-06-17 06:25:31 INFO: End to end results for ta_thamizhi models on ta_thamizhi dev data:

Metric     | Precision | Recall    | F1 Score  | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |     99.96 |     99.91 |     99.93 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |     97.79 |     95.47 |     96.62 |
UPOS       |     86.43 |     84.38 |     85.40 |     88.39
XPOS       |     97.79 |     95.47 |     96.62 |    100.00
UFeats     |     60.47 |     59.04 |     59.74 |     61.84
AllTags    |     56.39 |     55.06 |     55.72 |     57.67
Lemmas     |     87.52 |     85.44 |     86.47 |     89.49
UAS        |     76.42 |     74.61 |     75.50 |     78.15
LAS        |     67.19 |     65.59 |     66.38 |     68.71
CLAS       |     64.32 |     65.87 |     65.09 |     67.90
MLAS       |     32.45 |     33.23 |     32.84 |     34.26
BLEX       |     56.71 |     58.07 |     57.38 |     59.86

Why do these two experiments differ, even though both were run on the same dev set?

Sarves

AngledLuffa commented 3 years ago

Are the depparse scores you report in the first step the scores from run_depparse? run_depparse starts from the gold information (with predicted tags if you use that option) and scores just the dependencies. run_ete lets errors propagate from one step to the next step.

Since MLAS takes tags and features into account, those scores are probably not great either - indeed, the tagger scores you list a few lines above MLAS are pretty bad.
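To make the error-propagation point concrete, here is a toy sketch (not Stanza's actual CoNLL evaluator, and the word alignment is deliberately simplified): a dependency can only be counted as correct if the word itself survived tokenization/segmentation, so every upstream mistake caps the downstream LAS.

```python
# Toy illustration of error propagation, not Stanza's evaluator.
# gold/pred are lists of (word, head_index, deprel) for one sentence;
# a naive positional alignment is used just to show the idea.
def toy_uas_las(gold, pred):
    aligned = [(g, p) for g, p in zip(gold, pred) if g[0] == p[0]]
    uas = sum(g[1] == p[1] for g, p in aligned) / len(gold)
    las = sum(g[1] == p[1] and g[2] == p[2] for g, p in aligned) / len(gold)
    return uas, las

gold = [("w1", 2, "nsubj"), ("w2", 0, "root"), ("w3", 2, "obj")]
# Perfect parse on gold words: UAS = LAS = 1.0
print(toy_uas_las(gold, gold))
# Same heads and labels, but the tokenizer recovered the third word wrongly:
# that word can never be scored correct, so LAS drops to 2/3.
pred = [("w1", 2, "nsubj"), ("w2", 0, "root"), ("w3x", 2, "obj")]
print(toy_uas_las(gold, pred))
```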

sarves commented 3 years ago

Are the depparse scores you report in the first step the scores from run_depparse? run_depparse starts from the gold information (with predicted tags if you use that option) and scores just the dependencies.

Yes, it's from run_depparse.
I haven't used --gold. I think I should use it (because I do not want the training set to be POS-tagged again).

run_ete lets errors propagate from one step to the next step. Since MLAS takes into account tags and features, probably those scores are not great - indeed, the tagger scores you list a few lines above MLAS are pretty bad.

Yes, it's poor. However, since I have a separate morphological analyser, I can use only the LAS. I hope that makes sense.

One more question: do we really need the lemma to train the dependency parser? Is that information used for any processing?

Now I am training the parser again with a training set containing duplicated sentences (I had 1000 sentences; by duplicating the set, I now have 2000 sentences). I am not sure it makes any sense to train with a duplicated set.

Thanks

AngledLuffa commented 3 years ago

I haven't used --gold. I think I should use it (because I do not want the training set to be POS-tagged again).

The standard argument is that when you use the depparse on raw text, you will first need to tag the raw text, so you want the parser to be trained on the tags it will see, not the tags you wish it would see.
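A hedged sketch of what "train on the tags the parser will actually see" can look like in practice: re-tag the parser's training file with the trained tagger in predict mode, mirroring the pos-step arguments visible in the run_ete log above. The input/output paths below are placeholders, and this may differ from what run_depparse does internally.

```python
# Sketch only: re-tag a CoNLL-U file with the trained tagger (predict mode),
# using the same style of arguments as the pos step in the run_ete log.
# The eval/output file paths are placeholders, not part of any official recipe.
import subprocess

subprocess.run([
    "python", "-m", "stanza.models.tagger",
    "--wordvec_dir", "extern_data/wordvec",
    "--eval_file", "data/depparse/ta_thamizhi.train.gold.conllu",    # placeholder input
    "--output_file", "data/depparse/ta_thamizhi.train.pred.conllu",  # placeholder output
    "--lang", "ta_thamizhi",
    "--shorthand", "ta_thamizhi",
    "--mode", "predict",
], check=True)
```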

Now I am training the parser again with a training set containing duplicated sentences (I had 1000 sentences; by duplicating the set, I now have 2000 sentences). I am not sure it makes any sense to train with a duplicated set.

It already cycles through the training data multiple times, so unless pass 2 has some subtle changes to the data, this will not be helpful.