DecomposedMetaNER evaluate problem

fbotp commented 1 year ago

I put the Few-NERD episode data into ./episode-data/inter and ran bash script/run.sh, but several of the model's metrics, such as precision, recall, and f1, all return 0. Am I doing something wrong?

iofu728 commented 1 year ago

Can you provide more information, such as logs and your test environment? The data location looks correct, but I'm not sure whether the data itself is correct.

fbotp commented 1 year ago

Sure, here is log-training.txt from models-5-1-inter/bert-base-uncased-innerSteps_2-innerSize_32-lrInner_0.0001-lrMeta_0.0001-maxSteps_5001-seed_171-name_10-k_100_type_2_32_3_10_10:

2023-04-12 14:10:47 INFO: - Using Device cuda
2023-04-12 14:10:47 INFO: - Load 67 entity types from data/entity_types.json.
2023-04-12 14:10:48 INFO: - Built the types embedding.
2023-04-12 14:10:48 INFO: - ********** Scheme: Meta Learning **********
2023-04-12 14:10:51 INFO: - Construct the transition matrix via [hard] scheme...
2023-04-12 14:10:52 INFO: - Construct the transition matrix via [none] scheme...
2023-04-12 14:11:08 INFO: - Construct the transition matrix via [hard] scheme...
2023-04-12 14:11:08 INFO: - Reading tasks from episode-data/inter/test_5_1.jsonl...
2023-04-12 14:11:08 INFO: -   update_transition_matrix = False
2023-04-12 14:11:08 INFO: -   concat_types = None
2023-04-12 14:11:08 INFO: - Reading tasks 0 of 5000
2023-04-12 14:11:26 INFO: - Reading tasks 1000 of 5000
2023-04-12 14:11:41 INFO: - Reading tasks 2000 of 5000
2023-04-12 14:11:57 INFO: - Reading tasks 3000 of 5000
2023-04-12 14:12:13 INFO: - Reading tasks 4000 of 5000
2023-04-13 08:33:39 INFO: - Using Device cuda
2023-04-13 08:33:39 INFO: - Load 67 entity types from data/entity_types.json.
2023-04-13 08:33:40 INFO: - Built the types embedding.
2023-04-13 08:33:40 INFO: - ********** Scheme: Meta Learning **********
2023-04-13 08:33:41 INFO: - Construct the transition matrix via [hard] scheme...
2023-04-13 08:33:42 INFO: - Construct the transition matrix via [none] scheme...
2023-04-13 08:33:54 INFO: - Construct the transition matrix via [hard] scheme...
2023-04-13 08:33:54 INFO: - Reading tasks from episode-data/inter/test_5_1.jsonl...
2023-04-13 08:33:54 INFO: -   update_transition_matrix = False
2023-04-13 08:33:54 INFO: -   concat_types = None
2023-04-13 08:33:54 INFO: - Reading tasks 0 of 5000
2023-04-13 08:34:10 INFO: - Reading tasks 1000 of 5000
2023-04-13 08:34:23 INFO: - Reading tasks 2000 of 5000
2023-04-13 08:34:37 INFO: - Reading tasks 3000 of 5000
2023-04-13 08:34:50 INFO: - Reading tasks 4000 of 5000
2023-04-13 08:35:04 INFO: - episode-data/inter/test_5_1.jsonl Max Entities Lengths: 4, Max batch Types Number: 5, Max sentence Length: 133
2023-04-13 08:35:07 INFO: - ********** Loading pre-trained model **********
2023-04-13 08:35:09 INFO: - Model Setting: {'use_classify': False, 'distance_mode': 'cos', 'similar_k': 10.0, 'shared_bert': True, 'train_mode': 'type'}
2023-04-13 08:35:09 INFO: - The frozen parameters are:
2023-04-13 08:35:09 INFO: -   bert.embeddings.word_embeddings.weight
2023-04-13 08:35:09 INFO: -   bert.embeddings.position_embeddings.weight
2023-04-13 08:35:09 INFO: -   bert.embeddings.token_type_embeddings.weight
2023-04-13 08:35:09 INFO: -   bert.embeddings.LayerNorm.weight
2023-04-13 08:35:09 INFO: -   bert.embeddings.LayerNorm.bias
2023-04-13 08:35:15 INFO: - Step: 0/5001, span loss = 0.000000, type loss = 1.143968, time = 5.66s.
2023-04-13 08:36:57 INFO: - Step: 20/5001, span loss = 0.000000, type loss = 1.164489, time = 108.04s.
2023-04-13 08:38:37 INFO: - Step: 40/5001, span loss = 0.000000, type loss = 1.188575, time = 207.39s.
2023-04-13 08:40:17 INFO: - Step: 60/5001, span loss = 0.000000, type loss = 1.161188, time = 307.16s.
2023-04-13 08:41:56 INFO: - Step: 80/5001, span loss = 0.000000, type loss = 1.266302, time = 406.74s.
2023-04-13 08:43:39 INFO: - Step: 100/5001, span loss = 0.000000, type loss = 1.239490, time = 509.21s.
2023-04-13 08:43:39 INFO: - ********** Scheme: evaluate - [valid] **********
2023-04-13 08:43:39 INFO: - Begin first Stage.
2023-04-13 08:43:39 INFO: -   To sentence 0/1000. Time: 0.18263554573059082sec
2023-04-13 08:44:16 INFO: -   To sentence 200/1000. Time: 36.85404706001282sec
2023-04-13 08:44:53 INFO: -   To sentence 400/1000. Time: 73.89288449287415sec
2023-04-13 08:45:31 INFO: -   To sentence 600/1000. Time: 112.48784637451172sec
2023-04-13 08:46:08 INFO: -   To sentence 800/1000. Time: 149.45032167434692sec
2023-04-13 08:46:46 INFO: - Begin second Stage.
2023-04-13 08:46:47 INFO: - ***** Eval results inter-valid *****
2023-04-13 08:46:47 INFO: -   f1 = 0.0
2023-04-13 08:46:47 INFO: -   f1_threshold = 0.0
2023-04-13 08:46:47 INFO: -   loss = 0.0
2023-04-13 08:46:47 INFO: -   precision = 0.0
2023-04-13 08:46:47 INFO: -   precision_threshold = 0.0
2023-04-13 08:46:47 INFO: -   recall = 0.0
2023-04-13 08:46:47 INFO: -   recall_threshold = 0.0
2023-04-13 08:46:47 INFO: -   span_f1 = 0.0
2023-04-13 08:46:47 INFO: -   span_p = 0.0
2023-04-13 08:46:47 INFO: -   span_r = 0.0
2023-04-13 08:46:47 INFO: -   type_f1 = 0.5703959773227126
2023-04-13 08:46:47 INFO: -   type_p = 0.5703959773727126
2023-04-13 08:46:47 INFO: -   type_r = 0.5703959773727126
2023-04-13 08:46:47 INFO: - 0.000,0.000,0.000,0.000,0.000,0.000,57.040,57.040,57.040,0.000,0.000,0.000
2023-04-13 08:46:47 INFO: - ===> Best Valid F1: 0.0
2023-04-13 08:46:47 INFO: -   Saving model...
2023-04-13 08:46:48 INFO: - Best Type Store 100
2023-04-13 08:46:48 INFO: - ********** Scheme: evaluate - [test] **********
2023-04-13 08:46:48 INFO: - Begin first Stage.
2023-04-13 08:46:48 INFO: -   To sentence 0/5000. Time: 0.19287586212158203sec
2023-04-13 08:47:26 INFO: -   To sentence 200/5000. Time: 38.21218752861023sec
2023-04-13 08:48:05 INFO: -   To sentence 400/5000. Time: 76.91798210144043sec
2023-04-13 08:48:42 INFO: -   To sentence 600/5000. Time: 114.71551704406738sec
2023-04-13 08:49:19 INFO: -   To sentence 800/5000. Time: 150.97929668426514sec
2023-04-13 08:49:55 INFO: -   To sentence 1000/5000. Time: 187.45405554771423sec
2023-04-13 08:50:31 INFO: -   To sentence 1200/5000. Time: 223.6195011138916sec
2023-04-13 08:51:07 INFO: -   To sentence 1400/5000. Time: 259.847599029541sec
2023-04-13 08:51:46 INFO: -   To sentence 1600/5000. Time: 298.5319046974182sec
2023-04-13 08:52:26 INFO: -   To sentence 1800/5000. Time: 338.24009013175964sec
2023-04-13 08:53:06 INFO: -   To sentence 2000/5000. Time: 378.27178621292114sec
2023-04-13 08:53:45 INFO: -   To sentence 2200/5000. Time: 417.2717366218567sec
2023-04-13 08:54:24 INFO: -   To sentence 2400/5000. Time: 456.7584867477417sec
2023-04-13 08:55:06 INFO: -   To sentence 2600/5000. Time: 498.7607443332672sec
2023-04-13 08:55:45 INFO: -   To sentence 2800/5000. Time: 537.6424901485443sec
2023-04-13 08:56:23 INFO: -   To sentence 3000/5000. Time: 575.6152880191803sec
2023-04-13 08:57:00 INFO: -   To sentence 3200/5000. Time: 612.8272559642792sec
2023-04-13 08:57:43 INFO: -   To sentence 3400/5000. Time: 655.0388989448547sec
2023-04-13 08:58:22 INFO: -   To sentence 3600/5000. Time: 694.2115421295166sec
2023-04-13 08:59:00 INFO: -   To sentence 3800/5000. Time: 732.7367377281189sec
2023-04-13 08:59:38 INFO: -   To sentence 4000/5000. Time: 770.1423497200012sec
2023-04-13 09:00:16 INFO: -   To sentence 4200/5000. Time: 808.3991119861603sec
2023-04-13 09:00:53 INFO: -   To sentence 4400/5000. Time: 845.640289068222sec
2023-04-13 09:01:31 INFO: -   To sentence 4600/5000. Time: 882.9089212417603sec
2023-04-13 09:02:08 INFO: -   To sentence 4800/5000. Time: 920.7823030948639sec
2023-04-13 09:02:45 INFO: - Begin second Stage.
2023-04-13 09:02:46 INFO: - ***** Eval results inter-test *****
2023-04-13 09:02:46 INFO: -   f1 = 0.0
2023-04-13 09:02:46 INFO: -   f1_threshold = 0.0
2023-04-13 09:02:46 INFO: -   loss = 0.0
2023-04-13 09:02:46 INFO: -   precision = 0.0
2023-04-13 09:02:46 INFO: -   precision_threshold = 0.0
2023-04-13 09:02:46 INFO: -   recall = 0.0
2023-04-13 09:02:46 INFO: -   recall_threshold = 0.0
2023-04-13 09:02:46 INFO: -   span_f1 = 0.0
2023-04-13 09:02:46 INFO: -   span_p = 0.0
2023-04-13 09:02:46 INFO: -   span_r = 0.0
2023-04-13 09:02:46 INFO: -   type_f1 = 0.6400373598503717
2023-04-13 09:02:46 INFO: -   type_p = 0.6400373599003717
2023-04-13 09:02:46 INFO: -   type_r = 0.6400373599003717
2023-04-13 09:02:46 INFO: - 0.000,0.000,0.000,0.000,0.000,0.000,64.004,64.004,64.004,0.000,0.000,0.000
2023-04-13 09:02:46 INFO: - Best Valid F1: {'all': 0.0, 'type': 0.5703959773227126, 'span': -1.0}, Step: 100
2023-04-13 09:02:46 INFO: - Test F1: 0.0
2023-04-13 09:04:31 INFO: - Step: 120/5001, span loss = 0.000000, type loss = 1.232064, time = 1761.81s.
2023-04-13 09:06:14 INFO: - Step: 140/5001, span loss = 0.000000, type loss = 1.144472, time = 1864.17s.
2023-04-13 09:07:56 INFO: - Step: 160/5001, span loss = 0.000000, type loss = 1.272466, time = 1966.50s.
2023-04-13 09:09:36 INFO: - Step: 180/5001, span loss = 0.000000, type loss = 1.275529, time = 2066.86s.

I found that the type metrics look correct, but the span metrics are all 0.0, so I stopped training. The episode data was downloaded from https://ningding97.github.io/fewnerd/ and I did not modify it.

iofu728 commented 1 year ago

This is normal. DecomposedMetaNER decomposes the NER process into two stages, span and type, and trains them separately. The log you provided is from the type stage, where span acc is always 0. Similarly, in the span stage, type acc is always 0. You need to wait until both parts are trained before you can run the final inference process.
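As a rough way to check which stage a given run is training, the step lines in the log can be parsed. The sketch below is only a heuristic and is not part of the repository: the function name guess_training_stage and the regular expression are hypothetical, and it assumes the "Step: ..., span loss = ..., type loss = ..." line format shown in the log above. In that log the span loss stays at 0.000000 while the type loss is non-zero, which matches the 'train_mode': 'type' entry in the Model Setting line.

```python
import re

# Matches training-progress lines in the format seen in the log above, e.g.
# "... INFO: - Step: 20/5001, span loss = 0.000000, type loss = 1.164489, time = 108.04s."
STEP_RE = re.compile(
    r"Step: (\d+)/(\d+), span loss = ([\d.]+), type loss = ([\d.]+)"
)


def guess_training_stage(log_text):
    """Heuristically guess the stage being trained from the logged losses.

    Returns "type", "span", or "unknown". This is an illustration only,
    not how the repository itself decides the stage.
    """
    span_losses, type_losses = [], []
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            span_losses.append(float(match.group(3)))
            type_losses.append(float(match.group(4)))

    if not span_losses:
        return "unknown"  # no step lines found

    span_active = any(value > 0 for value in span_losses)
    type_active = any(value > 0 for value in type_losses)
    if type_active and not span_active:
        return "type"  # only the type loss is being optimized
    if span_active and not type_active:
        return "span"  # only the span loss is being optimized
    return "unknown"


# Two step lines copied from the log in this thread.
sample = """\
2023-04-13 08:35:15 INFO: - Step: 0/5001, span loss = 0.000000, type loss = 1.143968, time = 5.66s.
2023-04-13 08:36:57 INFO: - Step: 20/5001, span loss = 0.000000, type loss = 1.164489, time = 108.04s.
"""
print(guess_training_stage(sample))  # prints "type"
```

If this reports "type", zero values for span_f1 and the overall f1 during that run are consistent with the explanation above and do not by themselves point to a data problem.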

fbotp commented 1 year ago

Thanks a lot!

microsoft / vert-papers

DecomposedMetaNER evaluate problem #66