soummyaah / KGMedNLI

A repository containing the code for the paper "Incorporating Domain Knowledge into Medical NLI using Knowledge Graphs" EMNLP 2019
Apache License 2.0

Triples extraction/relations identification #1

Open sandeepsingh opened 4 years ago

sandeepsingh commented 4 years ago

I wanted to know how you extracted triples from the MedNLI dataset. I don't see the code in the repository or a description of the process in the paper.

Especially this part from the paper: "The relations in our knowledge graph come from two sources: The Metathesaurus and the Semantic Network of UMLS. Using relations extracted from these two sources, we connect the filtered medical concepts from UMLS to build a smaller Knowledge Graph (subgraph of UMLS)."

Also, can you share the triples data mentioned in the paper: "We get 117,467 triples from the Metathesaurus and 23,824,105 triples from the Semantic Network"? How did you get these triples?

Thanks,

deekshaVarshney commented 3 years ago

Yes, I would also like to know how you got these triples.

sandeepsingh commented 3 years ago

Yes, I would also like to know how did you guys get this triples?

Have a look at these:
https://www.ncbi.nlm.nih.gov/books/NBK9679/
https://skr3.nlm.nih.gov/SemMedDB/dbinfo.html

soummyaah commented 3 years ago

Hi all,

Thank you for your interest in the paper. I hope this answer gives you the details you need. We generate a sub-graph of UMLS that is specific to the dataset: we take the premise and hypothesis sentences in the MedNLI dataset and run MetaMap, a Named Entity Recognition tool built for UMLS, over them. MetaMap identifies the UMLS concepts referred to in the text. The input to MetaMap is thus the premise and hypothesis sentences in MedNLI, and the output is the set of UMLS entities present in those sentences.
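The concept-collection step described above can be sketched roughly as follows. This is only an illustration, not the authors' code: `toy_recognizer` and `TOY_LEXICON` are hypothetical stand-ins for MetaMap, the CUIs are placeholders rather than real UMLS identifiers, and the sentence pair is made up.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical stand-in for MetaMap: a tiny phrase -> CUI lookup table.
# The CUIs are placeholders, not real UMLS identifiers.
TOY_LEXICON = {
    "chest pain": "C_TOY_0001",
    "aspirin": "C_TOY_0002",
}


def toy_recognizer(sentence: str) -> Set[str]:
    """Return the placeholder CUIs whose phrase occurs in the sentence."""
    lowered = sentence.lower()
    return {cui for phrase, cui in TOY_LEXICON.items() if phrase in lowered}


def concepts_for_dataset(
    pairs: Iterable[Tuple[str, str]],
    recognizer: Callable[[str], Set[str]],
) -> Set[str]:
    """Union of the concepts found in every premise and hypothesis."""
    concepts: Set[str] = set()
    for premise, hypothesis in pairs:
        concepts |= recognizer(premise)
        concepts |= recognizer(hypothesis)
    return concepts


# Made-up premise/hypothesis pair, for illustration only.
pairs = [("The patient reports chest pain.", "The patient was given aspirin.")]
dataset_concepts = concepts_for_dataset(pairs, toy_recognizer)
```

In a real pipeline the recognizer would wrap MetaMap, and the resulting set of concepts is what the sub-graph gets restricted to.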

UMLS has two primary sources: the Metathesaurus and the Semantic Network. The Semantic Network assigns semantic categories to the concepts in the UMLS Metathesaurus and defines relationships between those categories. We use the Metathesaurus to extract each entity and the Semantic Network to get the entity's category; the relations defined between categories then give us the relations between the entities. This creates a relevant sub-KG of the entire UMLS KG.
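A minimal sketch of how category-level relations might be lifted to entity-level triples, assuming each concept already has its semantic types and the type-to-type relations are available as tuples. The function name `build_triples` is hypothetical, and the types and relations below are illustrative values that have not been checked against the actual Semantic Network.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_triples(
    concept_types: Dict[str, Set[str]],
    type_relations: Set[Triple],
) -> List[Triple]:
    """Link two concepts whenever their semantic types are linked."""
    triples: Set[Triple] = set()
    # Consider every ordered pair of distinct concepts.
    for head, tail in permutations(concept_types, 2):
        for head_type, relation, tail_type in type_relations:
            if head_type in concept_types[head] and tail_type in concept_types[tail]:
                triples.add((head, relation, tail))
    return sorted(triples)


# Illustrative inputs only; not verified against UMLS.
concept_types = {
    "throat infection": {"Disease or Syndrome"},
    "coughing": {"Sign or Symptom"},
}
type_relations = {
    ("Disease or Syndrome", "associated_with", "Sign or Symptom"),
    ("Sign or Symptom", "manifestation_of", "Disease or Syndrome"),
}

triples = build_triples(concept_types, type_relations)
```

Under this scheme every pair of concepts inherits every relation defined between their categories, so the number of triples grows quickly with the number of concepts and the same pair can be connected by several relations.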

While the triples data is not available right now, I hope this process helps. Please feel free to ask any further questions!

deekshaVarshney commented 3 years ago

For the text: "Hello doctor, I am suffering from coughing, throat infection from last week."

I am getting these KB triples for the entity "throat infection":

[('throat infection', 'associated_with', 'suffering'),
 ('throat infection', 'result_of', 'suffering'),
 ('throat infection', 'associated_with', 'Coughing'),
 ('throat infection', 'co-occurs_with', 'suffering'),
 ('throat infection', 'manifestation_of', 'suffering'),
 ('throat infection', 'complicates', 'suffering'),
 ('throat infection', 'associated_with', 'gargle'),
 ('throat infection', 'occurs_in', 'suffering'),
 ('throat infection', 'process_of', 'suffering'),
 ('throat infection', 'degree_of', 'suffering'),
 ('throat infection', 'affects', 'suffering')]

Did you also get the same kind of triples?

saptarshi059 commented 3 years ago

Hi Deeksha,

I'm facing the same issue, but I think I've made some headway. As Soumyah mentioned, UMLS provides two main knowledge sources: the Metathesaurus and the Semantic Network. If you look at this diagram (https://www.nlm.nih.gov/research/umls/implementation_resources/query_diagrams/er1.html), the Metathesaurus provides relations between concepts (CUIs). The issue is with the relationship labels (such as RO, RB, etc.): they aren't descriptive, so the more specific RELA attribute is what you are left with. Hence you can use RELA to connect concepts through the Metathesaurus and follow Soumyah's advice for connecting them via the Semantic Network. This is the best I could come up with.

I'm working on writing a tutorial for extracting triples in the way mentioned.
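Until then, here is a rough sketch of the RELA idea, assuming the pipe-delimited MRREL.RRF layout in which CUI1, REL, CUI2 and RELA are the 1st, 4th, 5th and 8th fields, and in which REL/RELA describe the relationship of the second concept to the first. The function name `mrrel_triples` is hypothetical and the sample rows are made up with placeholder identifiers; falling back to REL when RELA is empty is a choice made for this sketch, not something stated in the paper.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Zero-based positions of the fields used here (assumed MRREL.RRF layout).
CUI1, REL, CUI2, RELA = 0, 3, 4, 7


def mrrel_triples(lines: Iterable[str], keep_cuis: Set[str]) -> List[Triple]:
    """Build (head, relation, tail) triples among the kept concepts."""
    triples: List[Triple] = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= RELA:
            continue  # skip malformed rows
        cui1, cui2 = fields[CUI1], fields[CUI2]
        if cui1 == cui2 or cui1 not in keep_cuis or cui2 not in keep_cuis:
            continue  # keep only links between two different kept concepts
        # Prefer the descriptive RELA label; fall back to the coarse REL.
        label = fields[RELA] or fields[REL]
        # The label is read from the second concept to the first.
        triples.append((cui2, label, cui1))
    return triples


# Made-up rows with placeholder identifiers, for illustration only.
sample_rows = [
    "C0000001|A0000001|AUI|RO|C0000002|A0000002|AUI|manifestation_of|R0000001||MSH|MSH|||N||",
    "C0000002|A0000002|AUI|RB|C0000003|A0000003|AUI||R0000002||MSH|MSH|||N||",
    "C0000001|A0000001|AUI|RO|C0000009|A0000009|AUI|treats|R0000003||MSH|MSH|||N||",
]
kept = {"C0000001", "C0000002", "C0000003"}
metathesaurus_triples = mrrel_triples(sample_rows, kept)
```

Here `kept` would be the set of CUIs found in the dataset, so rows touching any other concept (the third sample row) are dropped.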