microsoft / BioGPT

MIT License

Clarification on model output #46

Open ShilpaSangappa opened 1 year ago

ShilpaSangappa commented 1 year ago

We have tried running the QA (non-large), document classification, and the three RE models. All of them emit learned1, learned2, ..., learned9 in the output. What do these tokens stand for, or is there a step that we are missing? We don't see this issue with text generation (non-large), though.

For example, the re_bc5cdr output: "It is critical to understand factors associated with nasopharyngeal carcinoma (NPC) metastasis. To track the evolutionary route of metastasis, here we perform an integrative genomic analysis of 163 matched blood and primary, regional lymph node metastasis and distant metastasis tumour samples, combined with single-cell RNA-seq on 11 samples from two patients. The mutation burden, gene mutation frequency, mutation signature, and copy number frequency are similar between metastatic tumours and primary and regional lymph node tumours. There are two distinct evolutionary routes of metastasis, including metastases evolved from regional lymph nodes (lymphatic route, 61.5%, 8 / 13) and from primary tumours (hematogenous route, 38.5%, 5 / 13). learned1 learned2 learned3 learned4 learned5 learned6 learned7 learned8 learned9 the relation between cisplatin and NPC exists; the relation between cisplatin and metastasis exists;."

sockthem commented 1 year ago

What parameters were you using for the QA model? Any SS?

ShilpaSangappa commented 1 year ago

import torch
from BioGPT.src.transformer_lm_prompt import TransformerLanguageModelPrompt

data="question: Does histologic chorioamnionitis correspond to clinical chorioamnionitis? context: To evaluate the degree to which histologic chorioamnionitis, a frequent finding in placentas submitted for histopathologic evaluation, correlates with clinical indicators of infection in the mother. A retrospective review was performed on 52 cases with a histologic diagnosis of acute chorioamnionitis from 2,051 deliveries at University Hospital, Newark, from January 2003 to July 2003. Third-trimester placentas without histologic chorioamnionitis (n = 52) served as controls. Cases and controls were selected sequentially. Maternal medical records were reviewed for indicators of maternal infection. Histologic chorioamnionitis was significantly associated with the usage of antibiotics (p = 0.0095) and a higher mean white blood cell count (p = 0.018). The presence of 1 or more clinical indicators was significantly associated with the presence of histologic chorioamnionitis (p = 0.019)."

# Load the fine-tuned PubMedQA checkpoint with its binarized data directory and BPE codes.
m = TransformerLanguageModelPrompt.from_pretrained(
        "/home/ubuntu/checkpoints/QA-PubMedQA-BioGPT",
        "checkpoint_avg.pt",
        "/home/ubuntu/BioGPT/data/PubMedQA/ansis-bin",
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="/home/ubuntu/BioGPT/data/bpecodes",
        min_len=100,
        max_len_b=1024)

# Encode the prompt, run beam search, and decode the top hypothesis.
src_tokens = m.encode(data)
#print(f"SRC_TOKENS : {src_tokens}")
generate = m.generate([src_tokens], beam=5)[0]
#print(f"GENERATE[0] : {generate[0]['tokens']}")
output = m.decode(generate[0]["tokens"])

rajkumar-surana commented 1 year ago

Any update on this issue? And how do you generate "..data/PubMedQA/ansis-bin"? I got the following on your data:

question: Does histologic chorioamnionitis correspond to clinical chorioamnionitis? context: To evaluate the degree to which histologic chorioamnionitis, a frequent finding in placentas submitted for histopathologic evaluation, correlates with clinical indicators of infection in the mother. A retrospective review was performed on 52 cases with a histologic diagnosisof acute chorioamnionitis from 2,051 deliveries at University Hospital, Newark, from January 2003 to July 2003. Third-trimester placentas without histologic chorioamnionitis (n = 52) served as controls. Cases and controls were selected sequentially. Maternal medical records were reviewed for indicators of maternal infection. Histologic chorioamnionitis was significantly associated with the usage of antibiotics (p = 0.0095) and a higher mean white blood cell count (p = 0.018). The presence of 1 or more clinical indicators was significantly associated with the presence of histologic chorioamnionitis (p = 0.019). learned1 learned2 learned3 learned4 learned5 learned6 learned7 learned8 learned9 the answer to the question given the context is no.
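In the output above, everything before the learnedN run is the echoed input and everything after it is the generated answer. This thread does not confirm what the learnedN tokens are; the sketch below simply assumes they mark the boundary between input and generation and cuts the string there. The helper names `split_generated` and `extract_answer` are hypothetical, and the repository's own post-processing may differ.

```python
import re

# Assumption: a run of "learnedN" tokens separates the echoed input from the generated text.
MARKER_RUN = re.compile(r"(?:learned\d+\s*)+")


def split_generated(decoded: str) -> str:
    """Return the text after the last learnedN run, or the whole string if there is none."""
    return MARKER_RUN.split(decoded)[-1].strip()


def extract_answer(decoded: str):
    """Return the label from the generated answer sentence, or None if it is absent."""
    generated = split_generated(decoded)
    match = re.search(
        r"the answer to the question given the context is (\w+)", generated
    )
    return match.group(1).lower() if match else None


# Shortened stand-in for the decoded string quoted above (input text elided).
decoded = (
    "question: ... context: ... (p = 0.019). "
    "learned1 learned2 learned3 learned4 learned5 learned6 learned7 learned8 learned9 "
    "the answer to the question given the context is no."
)

print(split_generated(decoded))
print(extract_answer(decoded))
```

This only tidies the decoded string; it does not change what the model generates.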

lir0ni commented 11 months ago

@ShilpaSangappa If any of you have findings on this, please share :) I'm getting the same response from BC5CDR for relation extraction.
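For the BC5CDR case, the relation sentences after the learnedN run follow the pattern "the relation between X and Y exists;", as in the re_bc5cdr output quoted at the top of the thread. A minimal sketch of turning that tail into pairs, assuming the pattern holds for every relation and that the learnedN run marks where generation starts (neither is confirmed in this thread); `parse_relations` is a hypothetical helper, not part of the repository:

```python
import re

# Assumption: the generated text starts after the last run of "learnedN" tokens.
MARKER_RUN = re.compile(r"(?:learned\d+\s*)+")
# Assumption: each relation is phrased as "the relation between X and Y exists".
RELATION = re.compile(r"the relation between (.+?) and (.+?) exists")


def parse_relations(decoded: str):
    """Return (entity, entity) pairs found in the generated part of the output."""
    generated = MARKER_RUN.split(decoded)[-1]
    # Note: an entity name that itself contains " and " would be split incorrectly.
    return [(a.strip(), b.strip()) for a, b in RELATION.findall(generated)]


# Shortened stand-in for the re_bc5cdr output (abstract text elided).
decoded = (
    "... (hematogenous route, 38.5%, 5 / 13). "
    "learned1 learned2 learned3 learned4 learned5 learned6 learned7 learned8 learned9 "
    "the relation between cisplatin and NPC exists; "
    "the relation between cisplatin and metastasis exists;."
)

for pair in parse_relations(decoded):
    print(pair)
```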