ivalexander13 opened this issue 3 years ago
Sina and I finally got our data reformatted after a couple of hours, in mar12_NER/20210326_set_up_NER_runs_with_dividers.ipynb -- the data was saved to data/ner/chemprot_sub_enzyme/clean/{dev, train, test}.txt.
We ran it yesterday but keep getting low F1s, so I'm going to start looking into whether we can reuse bits and pieces of the SciBERT model to include class_weights - more coming
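If we do go the class_weights route, one simple way to get the weights is inverse tag frequency over the training split. A minimal sketch, assuming a CoNLL-style token-per-line format with the BIO tag as the last whitespace-separated column (the path and format are assumptions, not necessarily what the notebook produces):

```python
# Rough sketch (not the final implementation): inverse-frequency class weights
# from the BIO tags in the reformatted training file. Assumes one token per line,
# tag in the last column, blank lines between sentences -- adjust if our format differs.
from collections import Counter

def tag_weights(path="data/ner/chemprot_sub_enzyme/clean/train.txt"):
    counts = Counter()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # sentence boundary
            counts[line.split()[-1]] += 1  # last column = BIO tag
    total = sum(counts.values())
    weights = {tag: total / n for tag, n in counts.items()}
    # normalize so the most common tag (usually "O") gets weight ~1
    base = weights.get("O", max(weights.values()))
    return {tag: w / base for tag, w in weights.items()}

print(tag_weights())
```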
how we ran it for testing (didn't want to use compute hours):
source activate /global/home/groups/fc_igemcomp/software/scibert_env_NER
cd fc_igemcomp/2020_nlp/scibert
rm -R scripts/NER_output_26mar/
./scripts/train_allennlp_local_v3_NER_trial.sh ./scripts/NER_output_26mar/
creating new kernel:
source activate ~/fc_igemcomp/software/scibert_env_NER
# very important! # can also use conda
(had to install ipykernel: conda install -p /global/home/groups/fc_igemcomp/software/scibert_env_NER ipykernel)
python -m ipykernel install --user --name python3.6.13_ner_scibert --display-name "Python 3.6.13 (scibert_env_NER)"
# display name is what will show
6:20pm: issue with the IProgress module (ipywidgets), so ran these:
conda install -c conda-forge ipywidgets
jupyter nbextension enable --py widgetsnbextension
(cool! now TQDM works in-notebook)
okay, here's my plan / what I want to do:
ugh, we need to modify the loss if we want the model to actually LEARN these weights, though
train: scripts/0403_train_allennlp_local_NER_few_epochs.sh scripts/NER_output_3apr/
oof, okay, switching to local to make changes to AllenNLP - will try to set up a similar file structure on Savio and sync via GitHub
ah sike - we realized it's not bert_text_classifier that's used for the NER task, but rather the bert_crf_tagger.py file - will see if we can modify that to use class weights instead!
https://github.com/kmkurn/pytorch-crf/issues/47 is helpful; files to modify include ner_finetune.json, the AllenNLP CRF class, and the bert_crf_tagger.py file.
did some more digging into how people have handled imbalanced-data issues in AllenNLP before. Seems like there is no generalized solution, according to this thread.
Mrunali's and my experiments with directly modifying the weights haven't made a big difference to performance so far; we might be missing something, though.
looking into modifying CRFs to be weighted: a mathy paper that basically says we should compute a double sum for the loss so we can weight the classes (https://perso.uclouvain.be/michel.verleysen/papers/ieeetbe12gdl.pdf). Seems to have kind of decent results? Hadn't thought about L1 regularization.
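For reference, the general shape of the class-weighted loss being discussed (my paraphrase of the idea, not necessarily the paper's exact formulation): each token's negative log-likelihood gets multiplied by a weight for its gold tag, summed over sentences and positions:

$$\mathcal{L}(\theta) = -\sum_{i=1}^{N}\sum_{t=1}^{T_i} w_{y_{i,t}} \log p_\theta\left(y_{i,t} \mid x_i\right)$$

where $N$ is the number of sentences, $T_i$ the length of sentence $i$, and $w_{y_{i,t}}$ the class weight of the gold tag at position $t$.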
from https://github.com/allenai/allennlp/issues/4619: someone said "I mean, I believe it can work in practice, but their theoretical motivation is not correct. If this is the case, we could do it with a much simpler approach (like weighted emission scores)." which is what we did...: https://github.com/tensorflow/addons/issues/817
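A minimal sketch of the "weighted emission scores" idea using the pytorch-crf package (illustrative only: the tag set, weights, and shapes are placeholders, and our actual change lives in AllenNLP's bert_crf_tagger.py rather than in this package):

```python
# Sketch of class-weighted emission scores feeding a CRF (pytorch-crf).
# Illustrative only -- our real edit is to AllenNLP's bert_crf_tagger.py.
import torch
from torchcrf import CRF

num_tags = 5                      # e.g. O, B-SUB, I-SUB, B-ENZ, I-ENZ (assumed tag set)
batch, seq_len = 2, 10

crf = CRF(num_tags, batch_first=True)
emissions = torch.randn(batch, seq_len, num_tags)     # stand-in for BERT logits
tags = torch.randint(num_tags, (batch, seq_len))
mask = torch.ones(batch, seq_len, dtype=torch.bool)

# Per-class weights (rare entity tags up-weighted relative to "O").
class_weights = torch.tensor([1.0, 5.0, 5.0, 5.0, 5.0])

# Scale each tag's emission score before the CRF computes its log-likelihood.
weighted_emissions = emissions * class_weights         # broadcast over (batch, seq, tags)

loss = -crf(weighted_emissions, tags, mask=mask, reduction="mean")
loss.backward()
```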
okay, I'm just going to keep a running list of updates in this comment on other comments/potential implementations
{in any case can you tell how much fun I'm having with GitHub issues lmao}
This textbook chapter from my NLP class actually goes over what we have concluded as being a good approach to solving this problem which I thought was validating (i.e. NER/Relation Extraction + semi-supervised approach) https://web.stanford.edu/~jurafsky/slp3/17.pdf
Is the semi-supervised approach the approach you're/they're thinking of? It does seem really cool and it seems to have a decent track record, though we'd probably need to rewrite a lot of code. Do you think this is something worth pursuing?
Yeah take a look at 17.2.4 in there (distant supervision for relation extraction). It sounds very similar to the pattern recognition technique we've been talking about, except it learns non-regex patterns for features (or aggregates data to be fed into NN directly without extracting features beforehand). Problem is that it generally has low precision, which is similar to the other paper we read using pattern matching, so not sure what the best solution is for us.
Trying to rebalance the data (with the 12apr/20210412 notebook + script) so as to remove any sentences without entities/labels of interest, but the F1 does not change considerably :(
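For reference, the kind of filter this boils down to (a sketch, not the actual 20210412 script; assumes the same token-per-line format with the BIO tag in the last column, and the output path is made up):

```python
# Sketch of the rebalancing filter: drop sentences whose tags are all "O".
# Not the actual 20210412 script; assumes token-per-line CoNLL-style input
# with blank lines between sentences and the BIO tag in the last column.
def filter_sentences(in_path, out_path):
    sentences, current = [], []
    with open(in_path) as f:
        for line in f:
            if line.strip():
                current.append(line)
            elif current:
                sentences.append(current)
                current = []
        if current:
            sentences.append(current)

    kept = dropped = 0
    with open(out_path, "w") as out:
        for sent in sentences:
            tags = [l.split()[-1] for l in sent]
            if any(t != "O" for t in tags):   # keep only sentences with an entity of interest
                out.writelines(sent + ["\n"])
                kept += 1
            else:
                dropped += 1
    print(f"kept {kept}, dropped {dropped}")

filter_sentences("data/ner/chemprot_sub_enzyme/clean/train.txt",
                 "data/ner/chemprot_sub_enzyme/rebalanced/train.txt")
```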
Praise Ivan, who modified a Hugging Face implementation (in his scratch folder, /global/scratch/ivalexander13/NLPChemExtractor/scibert-text-classification/main.ipynb, but also in /global/home/groups/fc_igemcomp/2020_nlp/scibert/apr16_huggingface_NER)
revised TODOs:
HuggingFace NER:
- try further regularization: dropout + early stopping - don't use loss for model selection; use F1 + AUC/ROC (https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5) to pick the desired threshold (Ivan) - maybe this is already supported in the Hugging Face library?
- look into playing with the loss: weights? https://huggingface.co/transformers/training.html, huggingface/transformers#7024 (rough sketch below)
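A minimal sketch of what the weighted-loss idea could look like on the Hugging Face side, combined with F1-based early stopping (the weight values, tag count, model name, and output dir are placeholders, not settled choices):

```python
# Sketch: class-weighted loss + F1-based early stopping with Hugging Face Trainer.
# Weights, model name, datasets, and output_dir are placeholders.
import torch
from transformers import (AutoModelForTokenClassification, Trainer,
                          TrainingArguments, EarlyStoppingCallback)

class_weights = torch.tensor([1.0, 5.0, 5.0, 5.0, 5.0])  # per-tag weights (assumed tag set)

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # CrossEntropyLoss ignores label -100 by default (HF's padding convention)
        loss_fct = torch.nn.CrossEntropyLoss(weight=class_weights.to(logits.device))
        loss = loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

model = AutoModelForTokenClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=len(class_weights))

args = TrainingArguments(
    output_dir="ner_weighted",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",   # requires a compute_metrics that reports "f1"
)

trainer = WeightedTrainer(
    model=model, args=args,
    # train_dataset=..., eval_dataset=..., compute_metrics=...,  # to be filled in
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```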
HuggingFace QA:
- there's no good RE implementation in Hugging Face, so maybe QA is better? https://huggingface.co/transformers/usage.html - a brief test suggests it's maybe not the best with a normal BERT model, but we could integrate SciBERT (quick sketch below)
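For context, the kind of quick probe this refers to (a sketch; the question phrasing and example sentence are made up, and swapping in SciBERT would need a QA fine-tuning step since the base checkpoint has no QA head):

```python
# Sketch of a quick QA-style probe with the default Hugging Face QA pipeline.
# The question/context are made up; using SciBERT here would require first
# fine-tuning it with a QA head (the base checkpoint doesn't have one).
from transformers import pipeline

qa = pipeline("question-answering")  # default SQuAD-fine-tuned model

context = ("Alcohol dehydrogenase catalyzes the oxidation of ethanol "
           "to acetaldehyde.")
print(qa(question="Which enzyme acts on ethanol?", context=context))
print(qa(question="What is the substrate of alcohol dehydrogenase?", context=context))
```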
I'm working on this at #26
Overview
We are doing this to compare SciBERT's performance on NER relative to text classification. SciBERT didn't provide a chemprot dataset for NER, so we are using the chemprot dataset straight from its source (link here?) and formatting it to fit the model's NER task.
Attempt (ongoing)
We are in the middle of converting the source chemprot dataset: doing part-of-speech tagging on each word and connecting the relevant entities (substrate, product, and enzyme). A rough sketch of the conversion step is below.
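A minimal sketch of the shape of that conversion (illustrative only; the notebook's actual extraction of substrate/product/enzyme spans from ChemProt is more involved, and the labels, spans, and output format here are assumptions):

```python
# Sketch of the conversion step: tokenize, POS-tag, and emit CoNLL-style
# "token<TAB>POS<TAB>BIO-tag" lines. Illustrative only.
import nltk
from nltk import pos_tag, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def to_conll(sentence, entity_spans):
    """entity_spans: list of (token_start, token_end, label), e.g. (0, 1, "ENZ") -- assumed shape."""
    tokens = word_tokenize(sentence)
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    lines = [f"{tok}\t{pos}\t{tag}"
             for (tok, pos), tag in zip(pos_tag(tokens), tags)]
    return "\n".join(lines) + "\n"   # blank line separates sentences

print(to_conll("Hexokinase phosphorylates glucose.", [(0, 1, "ENZ"), (2, 3, "SUB")]))
```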
Plans
We will do the full 75-epoch training on this dataset and see how it performs.