ncbi-nlp / bluebert

BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
https://arxiv.org/abs/1906.05474

Input data format for Named Entity Recognition #25

Closed. CCYChongyanChen closed this issue 4 years ago

CCYChongyanChen commented 4 years ago

Hi, thank you for sharing the code! I am trying to run the Named Entity Recognition task, but I can't find "train.tsv" or "devel.tsv" in the BC5CDR dataset. Instead, the train/devel/test data are in ".txt" format. If I simply rename the ".txt" files to ".tsv" and run, I get KeyError: 'clonidine.'

Could you tell me exactly what input format the NER task needs? It would be great if you could share the preprocessing code that takes the title and abstract as input. Thank you in advance.

yfpeng commented 4 years ago

You need to use the BERT version of the data, available at https://github.com/ncbi-nlp/BLUE_Benchmark/releases/download/0.1/bert_data.zip
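
For anyone who wants to check the expected input format directly, here is a minimal sketch that downloads the archive linked above and previews the first few lines of every ".tsv" file in it. This thread does not describe the archive's folder layout or column format, so the sketch assumes nothing about either: it only lists what it finds. The helper names `download_archive` and `preview_tsv_files`, and the `keyword` filter, are hypothetical and not part of the bluebert code.

```python
import io
import urllib.request
import zipfile

# URL copied from the reply above.
BERT_DATA_URL = (
    "https://github.com/ncbi-nlp/BLUE_Benchmark/releases/download/0.1/bert_data.zip"
)


def download_archive(url=BERT_DATA_URL):
    """Download the archive and return its raw bytes (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def preview_tsv_files(archive_bytes, keyword="", max_lines=5):
    """Return {member name: first lines} for each .tsv file in the archive.

    `keyword` is a hypothetical, case-insensitive filter on the member
    path (for example "BC5CDR"). The folder names inside the archive are
    an assumption, so look at the unfiltered listing first.
    """
    previews = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # Skip anything that is not a .tsv file or does not match the filter.
            if not name.endswith(".tsv") or keyword.lower() not in name.lower():
                continue
            lines = []
            with archive.open(name) as handle:
                for raw in handle:
                    if len(lines) >= max_lines:
                        break
                    lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            previews[name] = lines
    return previews
```

Calling `preview_tsv_files(download_archive())` returns a dictionary that can be printed to see which files exist and how their rows are laid out, which is the format the NER script should then be given.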