Closed by mnhcorp 3 years ago
The model you downloaded is a pretrained model, not an NER model. If you want to reproduce our results, tell me which dataset you are using so we can provide the corresponding model. If you want to develop a model on your own dataset, you first need to train an NER model on your own training data and then run prediction/evaluation. During training, we create label2idx.json based on your data.
We trained NER models on the 2010 i2b2, 2012 i2b2, and 2018 n2c2 datasets. If your dataset is not one of these, you have to train your own NER model using the training module.
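To illustrate why label2idx.json only appears after training, here is a minimal sketch of how a label-to-index mapping could be derived from IOB training data. It is not the package's actual implementation: the function name `build_label2idx`, the whitespace-separated column layout (label in the last column), and the resulting JSON layout are all assumptions made for illustration.

```python
import json


def build_label2idx(iob_lines):
    """Collect the distinct labels from IOB-formatted lines and index them.

    Hypothetical helper: assumes whitespace-separated columns with the
    label in the last column and blank lines between sentences.
    """
    labels = set()
    for line in iob_lines:
        parts = line.split()
        if not parts:  # a blank line separates sentences
            continue
        labels.add(parts[-1])
    # Sort for a deterministic mapping, placing "O" first.
    ordered = sorted(labels, key=lambda lab: (lab != "O", lab))
    return {lab: idx for idx, lab in enumerate(ordered)}


# Made-up sample data, not taken from any of the datasets named above.
sample = [
    "Patient O",
    "took O",
    "aspirin B-Drug",
    "81 B-Strength",
    "mg I-Strength",
    "",
    "No O",
    "rash B-ADE",
]
label2idx = build_label2idx(sample)
print(json.dumps(label2idx, indent=2))
```

Because the mapping depends on the labels present in the training data, a pretrained (non-NER) checkpoint has no such file to ship with.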
Understood, thank you.
Can you provide a link to the NER model trained with the 2018 n2c2 dataset?
For the 2018 n2c2, we trained three separate NER models, for drug + drug attributes, reason, and ADE respectively, and then combined their results into brat format for evaluation, because reason and ADE have overlapping annotated entities. If you want to use our models, you have to preprocess your data accordingly.
It will take some time for me to upload the models to Amazon S3 and create the download links. I will upload all the models in the next few days when I have time.
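As a rough sketch of the merging step described above, the snippet below combines entity lists from several models into brat standoff text-bound ("T") lines. The tuple layout `(type, start, end, text)` and the function name `to_brat_lines` are hypothetical and are not the format the package itself emits; only the `T<id>\t<type> <start> <end>\t<text>` line shape follows the brat standoff convention.

```python
def to_brat_lines(*model_outputs):
    """Merge entity lists from several models into brat 'T' lines.

    Each entity is assumed to be a (type, start, end, text) tuple with
    character offsets into the same source document.
    """
    merged = []
    for entities in model_outputs:
        merged.extend(entities)
    # Sort by span for readability. Overlapping spans are kept, since
    # brat standoff annotations are allowed to overlap.
    merged.sort(key=lambda e: (e[1], e[2], e[0]))
    lines = []
    for i, (etype, start, end, text) in enumerate(merged, start=1):
        lines.append(f"T{i}\t{etype} {start} {end}\t{text}")
    return lines


# Made-up predictions for the text "Patient took aspirin for headache".
drug_model = [("Drug", 13, 20, "aspirin")]
reason_model = [("Reason", 25, 33, "headache")]
ade_model = [("ADE", 25, 33, "headache")]

lines = to_brat_lines(drug_model, reason_model, ade_model)
print("\n".join(lines))
```

Keeping the models separate and merging afterwards lets the same span carry more than one entity type, which a single IOB tag per token cannot express.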
Hi,
Thanks - don't want to take up a lot of your time. Perhaps I can try to generate my own training data and run the scripts.
Quick question: Is there an easy way to generate training data (from free-form text) in the IOB format, as specified in the test_data folder?
Thanks Again.
To generate IOB from BRAT data, you can use https://github.com/nlplab/brat/blob/master/tools/anntoconll.py. Note that this script may not generate position information as we did. I think the IOB it generates should still work for training and testing with our package, but you may not be able to convert the model output back to BRAT. I have not carefully investigated anntoconll.py, but the point is that there should be open-source solutions available online.
We are currently working on a tutorial to demonstrate data generation and training. It should come out in the next few weeks.
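In the meantime, the idea behind a BRAT-to-IOB conversion can be sketched as follows. This is neither anntoconll.py nor the package's own preprocessing: the function names, the naive whitespace tokenizer, and the output columns (token, start, end, tag) are assumptions for illustration. Keeping the character offsets next to each token is what would make it possible to map predictions back to BRAT later.

```python
import re


def parse_ann(ann_text):
    """Read text-bound ('T') annotations from brat standoff text."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 2 or ";" in parts[1]:
            continue  # skip malformed lines and discontinuous spans
        etype, start, end = parts[1].split()
        entities.append((etype, int(start), int(end)))
    return sorted(entities, key=lambda e: e[1])


def brat_to_iob(text, ann_text):
    """Tag whitespace tokens with IOB labels, keeping character offsets."""
    entities = parse_ann(ann_text)
    rows = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for etype, start, end in entities:
            if match.start() < end and match.end() > start:  # spans overlap
                # The first token touching the entity gets B-, the rest I-.
                prefix = "B" if match.start() <= start else "I"
                tag = f"{prefix}-{etype}"
                break  # one tag per token; overlapping entities are dropped
        rows.append((match.group(), match.start(), match.end(), tag))
    return rows


# Made-up example document and annotations.
text = "Patient took aspirin 81 mg for headache."
ann = "T1\tDrug 13 20\taspirin\nT2\tStrength 21 26\t81 mg\nT3\tReason 31 39\theadache\n"

rows = brat_to_iob(text, ann)
for token, start, end, tag in rows:
    print(token, start, end, tag)
```

A real pipeline would need a proper tokenizer and sentence splitter; here the trailing period stays attached to "headache." because tokens are split on whitespace only.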
Thank you.
Will look forward to the tutorials.
- For the 2018 n2c2, we trained three separate NER models, for drug + drug attributes, reason, and ADE respectively, and then combined their results into brat format for evaluation, because reason and ADE have overlapping annotated entities. If you want to use our models, you have to preprocess your data accordingly.
- It will take some time for me to upload the models to Amazon S3 and create the download links. I will upload all the models in the next few days when I have time.
Hello! Have you uploaded the NER model for the n2c2 dataset? Where can I find it? Thanks.
Hi,
I'm trying to run a batch prediction as follows:
Running into this error:
I've downloaded the pre-trained BERT base + MIMIC model from here:
https://transformer-models.s3.amazonaws.com/mimiciii_bert_10e_128b.zip
I don't see label2idx.json present after extracting the archive:
Any help would be much appreciated. Thanks for your project!