melvelet / transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Apache License 2.0

Find problems in finetuning #11

Closed melvelet closed 2 years ago

melvelet commented 2 years ago

RoBERTa - BC5CDR:

16 Fam, B-Chemical 17 ot, I-Chemical 18 idine, I-Chemical 19 is, O 20 a, O 21 hist, B-Chemical (Gold label: O) 22 amine, I-Chemical (Gold label: O)

37 stress, B-Disease (Gold label: O) 38 ul, I-Disease (Gold label: B-Disease) 39 cers, I-Disease

(Gold labels confirmed by checking the unprocessed dataset.)
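The listings in this issue pair each token with its predicted tag and add the gold label only where the two disagree. A minimal sketch of how such a line could be produced, assuming the tokens, predictions, and gold labels are already aligned lists (the function name `format_mismatches` and the `start` parameter are hypothetical, not part of this repository):

```python
def format_mismatches(tokens, predicted, gold, start=0):
    """Render 'index token, PRED' per token; append the gold label only on disagreement."""
    parts = []
    for offset, (tok, pred, ref) in enumerate(zip(tokens, predicted, gold)):
        entry = f"{start + offset} {tok}, {pred}"
        if pred != ref:
            # Only mismatches carry the gold label, as in the listings above.
            entry += f" (Gold label: {ref})"
        parts.append(entry)
    return " ".join(parts)


print(format_mismatches(
    ["is", "a", "hist", "amine"],
    ["O", "O", "B-Chemical", "I-Chemical"],
    ["O", "O", "O", "O"],
    start=19,
))
```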

melvelet commented 2 years ago

Many of the FPs/FNs are repeated multiple times within the same document. Many instances should, in my opinion, be labeled in the dataset but aren't. Often the prediction includes additional related tokens that the gold labels don't mark as part of the entity but that could arguably be part of it. The model also regularly leaves dashes out of the entity, which splits it.

47 between, O 48 psychiatric, B-Disease (Gold label: O) 49 co, O 50 -, O 51 mor, O 52 bid, O 53 ity, O

189 major, B-Disease (Gold label: O) 190 depressive, I-Disease (Gold label: B-Disease) 191 disorder, I-Disease

275 incidence, O 276 of, O 277 bladder, B-Disease (Gold label: O) 278 instability, O 279 during, O

362 ed, B-Disease 363 ema, I-Disease 364 ,, O 365 vas, B-Disease (Gold label: O) 366 od, I-Disease (Gold label: O) 367 ilation, I-Disease (Gold label: O)

43 b, B-Chemical 44 ort, I-Chemical 45 )-, I-Chemical (Gold label: O) 46 d, I-Chemical (Gold label: B-Chemical) 47 ex, I-Chemical 48 am, I-Chemical 49 eth, I-Chemical

77 anxiety, B-Disease 78 -, O 79 like, I-Disease (Gold label: O) 80 behavior, I-Disease (Gold label: O)

5 Solid, B-Chemical (Gold label: O) 6 ago, I-Chemical (Gold label: O) 7 vir, I-Chemical (Gold label: O) 8 ga, I-Chemical (Gold label: O) 9 ure, I-Chemical (Gold label: O) 10 a, I-Chemical (Gold label: O) 11 extract, I-Chemical (Gold label: O)

50 Greek, B-Disease (Gold label: O) 51 My, I-Disease (Gold label: B-Disease) 52 el, I-Disease 53 oma, I-Disease

45 al, B-Chemical (Gold label: O) 46 ky, I-Chemical (Gold label: O) 47 l, I-Chemical (Gold label: O) 48 ating, I-Chemical (Gold label: O) 49 nitrogen, I-Chemical (Gold label: B-Chemical) 50 mustard, I-Chemical (Gold label: O)

76 inter, B-Disease (Gold label: O) 77 stitial, O 78 fib, B-Disease 79 rosis, I-Disease 80 and, O 81 tub, B-Disease (Gold label: O) 82 ular, I-Disease (Gold label: O) 83 atro, I-Disease (Gold label: B-Disease)

126 g, B-Disease (Gold label: O) 127 amm, I-Disease (Gold label: O) 128 ap, I-Disease (Gold label: O) 129 athy, I-Disease (Gold label: O)

10 HIV, B-Disease (Gold label: O) 11 -, I-Disease (Gold label: O) 12 positive, I-Disease (Gold label: O)

265 kidney, B-Disease (Gold label: O) 266 synd, I-Disease (Gold label: O) 267 rom, I-Disease (Gold label: O) 268 es, I-Disease (Gold label: O)
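The dash-splitting pattern described in this comment (for example `anxiety, B-Disease` / `-, O` / `like, I-Disease`) leaves an `I-` tag directly after an `O`. A rough sketch of how those positions might be found in a predicted tag sequence, assuming plain BIO tags; `orphan_inside_tags` is a hypothetical helper name:

```python
def orphan_inside_tags(labels):
    """Return indices of I- tags that do not continue a B-/I- tag of the same type."""
    orphans = []
    previous = "O"
    for i, label in enumerate(labels):
        if label.startswith("I-"):
            # An I- tag is well-formed only right after B-X or I-X with the same X.
            continues_entity = (
                previous[:2] in ("B-", "I-") and previous[2:] == label[2:]
            )
            if not continues_entity:
                orphans.append(i)
        previous = label
    return orphans


# Predicted tags for "anxiety - like behavior" from the listing above.
print(orphan_inside_tags(["B-Disease", "O", "I-Disease", "I-Disease"]))
```

The same check also flags a type switch in the middle of an entity, such as `I-DISEASE` followed by `I-ADVERSE`.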

melvelet commented 2 years ago

ELECTRA - SCAI Disease

18 infections, B-DISEASE (Gold label: O) BUT 281 infections, B-DISEASE (the same word is unlabeled in one instance and labeled in another within the same document)
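To look for this kind of inconsistency systematically, one option is to collect the gold entity types seen for each surface form in a document and report the forms that occur both labeled and unlabeled. This is only a sketch under the assumption of aligned token and gold-label lists; `inconsistent_gold_tokens` is a hypothetical name, and matching single tokens case-insensitively is a simplification:

```python
from collections import defaultdict


def inconsistent_gold_tokens(tokens, gold):
    """Map each token that is gold-labeled both 'O' and as an entity to its label types."""
    seen = defaultdict(set)
    for tok, label in zip(tokens, gold):
        # Strip the B-/I- prefix so only the entity type is compared.
        entity_type = label.split("-", 1)[1] if "-" in label else label
        seen[tok.lower()].add(entity_type)
    return {
        tok: sorted(types)
        for tok, types in seen.items()
        if "O" in types and len(types) > 1
    }


print(inconsistent_gold_tokens(
    ["infections", "of", "skin", "infections"],
    ["O", "O", "O", "B-DISEASE"],
))
```

Hits still need a manual check, since a word can legitimately be an entity in one context and not in another.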

206 skin, B-DISEASE (Gold label: O) 207 and, O 208 skin, B-DISEASE (Gold label: O) 209 structure, I-DISEASE (Gold label: O) 210 infections, I-DISEASE (Gold label: O)

258 healthcare, B-DISEASE (Gold label: O) 259 associated, I-DISEASE (Gold label: O) 260 pneumonia, I-DISEASE (Gold label: B-DISEASE)

40 age, B-DISEASE (Gold label: O) 41 -, I-DISEASE (Gold label: O) 42 related, I-DISEASE (Gold label: O) 43 dementia, I-DISEASE (Gold label: B-DISEASE)

Doc 2

101 es, B-DISEASE 102 ##op, I-DISEASE 103 ##ha, I-DISEASE 104 ##git, I-ADVERSE (Gold label: I-DISEASE) 105 ##is, I-ADVERSE (Gold label: I-DISEASE)

31 cop, B-DISEASE 32 ##d, I-DISEASE 33 ex, O (Gold label: I-DISEASE) 34 ##ace, O (Gold label: I-DISEASE) 35 ##rba, O (Gold label: I-DISEASE) 36 ##tions, O (Gold label: I-DISEASE)

422 pneumonia, B-DISEASE (Gold label: B-ADVERSE) (suddenly adverse, not disease?)

181 all, B-DISEASE (Gold label: O) 182 ##og, I-DISEASE (Gold label: O) 183 ##raf, I-DISEASE (Gold label: O) 184 ##t, I-DISEASE (Gold label: O) 185 tumors, I-DISEASE (Gold label: B-DISEASE)

283 men, B-ADVERSE 284 ##st, I-DISEASE (Gold label: I-ADVERSE) 285 ##ru, I-ADVERSE 286 ##al, I-ADVERSE 287 side, I-ADVERSE 288 effects, I-DISEASE (Gold label: I-ADVERSE)

92 ref, B-DISEASE (Gold label: O) 93 ##rac, I-DISEASE (Gold label: O) 94 ##tory, I-DISEASE (Gold label: O) 95 /, I-DISEASE (Gold label: O) 96 re, I-DISEASE (Gold label: O) 97 ##la, I-DISEASE (Gold label: O) 98 ##pse, I-DISEASE (Gold label: O) 99 ##d, I-DISEASE (Gold label: O) 100 cd, I-DISEASE (Gold label: O)

melvelet commented 2 years ago

Dataset EUADR: Empty entities:

{'id': '8316', 'type': 'Chemicals & Drugs', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid 4-fluoro-benzylamide'], 'offsets': [[62, 153]], 'normalized': []} {'id': '8317', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[155, 161]], 'normalized': []} {'id': '8318', 'type': 'Genes & Molecular Sequences', 'text': ['neuropeptide S receptor'], 'offsets': [[194, 217]], 'normalized': []} {'id': '8319', 'type': 'Genes & Molecular Sequences', 'text': ['Neuropeptide S'], 'offsets': [[219, 233]], 'normalized': []} {'id': '8320', 'type': 'Genes & Molecular Sequences', 'text': ['NPS'], 'offsets': [[235, 238]], 'normalized': []} {'id': '8321', 'type': 'Chemicals & Drugs', 'text': ['SHA 66'], 'offsets': [[477, 483]], 'normalized': []} {'id': '8322', 'type': 'Chemicals & Drugs', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid benzylamide'], 'offsets': [[485, 567]], 'normalized': []} {'id': '8323', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[573, 579]], 'normalized': []} {'id': '8324', 'type': 'Chemicals & Drugs', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid 4-fluoro-benzylamide'], 'offsets': [[581, 672]], 'normalized': []} {'id': '8325', 'type': 'Genes & Molecular Sequences', 'text': ['NPS receptor'], 'offsets': [[752, 764]], 'normalized': []} {'id': '8326', 'type': 'Genes & Molecular Sequences', 'text': ['NPSR'], 'offsets': [[766, 770]], 'normalized': []} {'id': '8327', 'type': 'Genes & Molecular Sequences', 'text': ['NPS'], 'offsets': [[793, 796]], 'normalized': []} {'id': '8328', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[828, 834]], 'normalized': []} {'id': '8329', 'type': 'Genes & Molecular Sequences', 'text': ['NPSR'], 'offsets': [[865, 869]], 'normalized': []} {'id': '8330', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[923, 929]], 'normalized': []} {'id': '8331', 'type': 'Genes & Molecular Sequences', 
'text': ['G protein-coupled receptors'], 'offsets': [[1004, 1031]], 'normalized': []} {'id': '8332', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[1075, 1081]], 'normalized': []} {'id': '8333', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[1244, 1250]], 'normalized': []} {'id': '8334', 'type': 'Genes & Molecular Sequences', 'text': ['NPS'], 'offsets': [[1297, 1300]], 'normalized': []} {'id': '8335', 'type': 'Chemicals & Drugs', 'text': ['SHA 68'], 'offsets': [[1386, 1392]], 'normalized': []} {'id': '8336', 'type': 'Genes & Molecular Sequences', 'text': ['NPS'], 'offsets': [[1494, 1497]], 'normalized': []} !!! {'id': '8337', 'type': '', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid 4-fluoro-benzylamide'], 'offsets': [[62, 153]], 'normalized': []} !!! {'id': '8338', 'type': '', 'text': ['neuropeptide S receptor'], 'offsets': [[194, 217]], 'normalized': []} !!! {'id': '8340', 'type': '', 'text': ['SHA 68'], 'offsets': [[155, 161]], 'normalized': []} !!! {'id': '8341', 'type': '', 'text': ['neuropeptide S receptor'], 'offsets': [[194, 217]], 'normalized': []} !!! {'id': '8343', 'type': '', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid 4-fluoro-benzylamide'], 'offsets': [[581, 672]], 'normalized': []} !!! {'id': '8344', 'type': '', 'text': ['NPSR'], 'offsets': [[766, 770]], 'normalized': []} !!! {'id': '8346', 'type': '', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid 4-fluoro-benzylamide'], 'offsets': [[581, 672]], 'normalized': []} !!! {'id': '8347', 'type': '', 'text': ['NPS receptor'], 'offsets': [[752, 764]], 'normalized': []} !!! {'id': '8349', 'type': '', 'text': ['SHA 66'], 'offsets': [[477, 483]], 'normalized': []} !!! {'id': '8350', 'type': '', 'text': ['NPSR'], 'offsets': [[766, 770]], 'normalized': []} !!! {'id': '8352', 'type': '', 'text': ['SHA 66'], 'offsets': [[477, 483]], 'normalized': []} !!! 
{'id': '8353', 'type': '', 'text': ['NPS receptor'], 'offsets': [[752, 764]], 'normalized': []} !!! {'id': '8355', 'type': '', 'text': ['3-oxo-1,1-diphenyl-tetrahydro-oxazolo[3,4-a]pyrazine-7-carboxylic acid benzylamide'], 'offsets': [[485, 567]], 'normalized': []}
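The entries marked `!!!` have an empty `type`. A small sketch of how they could be listed, together with the type of any typed entity that has identical offsets, assuming the entities are dicts with the `id`, `type`, `text`, and `offsets` keys shown above (`empty_type_entities` is a hypothetical helper, not a function of any dataset loader):

```python
def empty_type_entities(entities):
    """Return (id, text, type_of_typed_entity_with_same_offsets_or_None) for untyped entities."""
    typed_by_offsets = {}
    for entity in entities:
        if entity["type"]:
            key = tuple(tuple(span) for span in entity["offsets"])
            typed_by_offsets.setdefault(key, entity["type"])

    report = []
    for entity in entities:
        if not entity["type"]:
            key = tuple(tuple(span) for span in entity["offsets"])
            report.append((entity["id"], entity["text"][0], typed_by_offsets.get(key)))
    return report


# Two entries copied from the listing above.
sample = [
    {"id": "8317", "type": "Chemicals & Drugs", "text": ["SHA 68"],
     "offsets": [[155, 161]], "normalized": []},
    {"id": "8340", "type": "", "text": ["SHA 68"],
     "offsets": [[155, 161]], "normalized": []},
]
print(empty_type_entities(sample))
```

A `None` in the third position would mean that no typed entity shares those offsets, so the untyped entry cannot simply be treated as a duplicate.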