uf-hobi-informatics-lab / 2019_N2C2_Track1_ClinicalSTS

Source code for the UFL team's participation in the 2019 N2C2/OHNLP challenge, Track 1: Clinical Semantic Textual Similarity
MIT License

Code snippet to use the best RoBERTa model to evaluate a sentence pair #2

Open griff4692 opened 2 years ago

griff4692 commented 2 years ago

Hi -

Could you provide a code snippet showing how to load the model weights from

https://transformer-models.s3.amazonaws.com/2019n2c2_tack1_roberta_pt_stsc_6b_16b_3c_8c.zip

into the RoBERTa model and then use it for inference? Thanks

bugface commented 2 years ago
  1. Download the model and unzip it.
  2. To load the model and run prediction, see https://github.com/uf-hobi-informatics-lab/2019_N2C2_Track1_ClinicalSTS/blob/master/src/single_task.py (lines 738 - 748); a rough sketch of the same flow is below.
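
A minimal sketch of that flow using the Hugging Face `transformers` API, assuming the unzipped checkpoint is a standard `from_pretrained`-style directory (config, weights, tokenizer files) and that the fine-tuned head is a single-output regression head, as is typical for STS. The directory path and example sentences are placeholders; the code in `single_task.py` remains the authoritative reference.

```python
# Rough sketch, not the repo's exact code. Assumes the unzipped checkpoint is a
# standard Hugging Face-style directory and the fine-tuned head is a 1-output
# regression head producing an STS similarity score.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

model_dir = "2019n2c2_tack1_roberta_pt_stsc_6b_16b_3c_8c"  # unzipped folder (placeholder path)

tokenizer = RobertaTokenizer.from_pretrained(model_dir)
model = RobertaForSequenceClassification.from_pretrained(model_dir)
model.eval()

sent1 = "The patient denies chest pain."
sent2 = "Patient reports no chest pain."

# Encode the sentence pair; the tokenizer inserts RoBERTa's special tokens itself.
inputs = tokenizer(sent1, sent2, return_tensors="pt", truncation=True)

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # single regression output

print(score)
```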
bugface commented 2 years ago

Or just refer to https://github.com/uf-hobi-informatics-lab/2019_N2C2_Track1_ClinicalSTS/blob/master/single.sh

griff4692 commented 2 years ago

Thanks very much.

I assume that the code in model.py

https://github.com/uf-hobi-informatics-lab/2019_N2C2_Track1_ClinicalSTS/blob/master/src/model.py#L68

isn't used. One last question (if you don't mind): what is the input format to the tokenizer? Is it something like `sent1 </s> sent2`? I was hoping to get a sense without downloading the STS datasets and running the preprocessing scripts.
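
For context, here is a quick way to see what the stock Hugging Face RoBERTa tokenizer produces for a sentence pair. Whether the repo's preprocessing follows this default pair encoding is an assumption, so `single_task.py` should be checked to confirm.

```python
# Inspect the default Hugging Face RoBERTa pair encoding (the repo's own
# preprocessing may differ and should be treated as authoritative).
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoded = tokenizer("sentence one", "sentence two")
print(tokenizer.decode(encoded["input_ids"]))
# Expected layout: <s> sentence one </s></s> sentence two </s>
```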