vsm / nlp-to-vsm

From NLP dependency structures to semantics in VSM
https://vsm.github.io/nlp-to-vsm
GNU Affero General Public License v3.0

nlp-to-vsm

Converting NLP dependency trees to semantic structures in VSM.

This is a project for the 7th Biomedical Linked Annotation Hackathon (BLAH7), with a focus on Covid-19 literature.

Demo of work-in-progress: vsm.github.io/nlp-to-vsm.

Overview:

Presentation video:

Goal

We want to map the dependency trees produced by NLP parsers onto the intuitive semantic structures of VSM (Visual Syntax Method).

"Covid-19" is the theme of the hackathon, so we would like to:
• focus on Covid-19 literature,
• design how Covid-19-related syntax trees can be converted to VSM semantics,
• discuss how this can facilitate semantic queries over multidisciplinary Covid-19 knowledge, in a form that biomedical end-users can grasp.

About VSM

VSM is an intuitive way to represent any contextualized knowledge on any topic, in a form that is semantically precise and also easy to understand. Biologists and others often say they find VSM intuitive because it shows computable knowledge in a form close to "how they think".

VSM is both a knowledge representation and an interface for entering, editing, or simply displaying knowledge in this form – the latter is the use-case for this project.
All info on VSM is on vsm.github.io; see especially the vsm-box demo and the causalBuilder use case.

When context-rich knowledge is represented in VSM,
• it can be placed in a vsm-box user interface, where biologists can easily read, edit, or correct it, and their corrections may serve as feedback for NLP;
• the knowledge is then formalized and queryable – via vsm-to-rdf.


Concrete project

Project option 1:  Enju –> VSM

We could use the output of the Enju deep parser (Covid-19 corpus coming soon), and design rules (or apply ML) to convert this to VSM-structures.
This means that:

Covid-19 literature spans multiple biological scales and covers multiple research areas. Nevertheless, knowledge from any of these areas can be represented in the same, quick-to-understand semantic form of VSM (second image below).

For example, this Enju output (from PubAnnotation.org, bottom image), on a subject within molecular biology:

could be transformed into a VSM-sentence like this:


This VSM-sentence corresponds to the JSON below (or go here to see it as an interactive, editable VSM-sentence):

{ terms: [
    { str: 'production of', classID: 'http://purl.obolibrary.org/obo/GO_1903409', style: 'i11-13', instID: null },
    { str: 'ROI', classID: 'http://purl.obolibrary.org/obo/CHEBI_26523', instID: null },
    { str: 'and', classID: null, instID: null },
    { str: 'activation of', classID: 'http://purl.obolibrary.org/obo/MI_2235', style: 'i10-13', instID: null },
    { str: 'NF-kappaB', classID: 'https://www.alliancegenome.org/gene/HGNC:7794', instID: null },
    { str: 'could', classID: null, instID: null },
    { str: 'be blocked by', classID: null, instID: null, descr: '=inverse of \'blocks\' or \'blocking activity\'' },
    { str: 'antioxidant', classID: null, instID: null },
    { str: 'or', classID: null, instID: null },
    { str: 'FLAP Inhibitor', classID: null, instID: null }
  ],
  conns: [
    { type: 'T', pos: [ -1, 0, 1 ] },
    { type: 'T', pos: [ -1, 3, 4 ] },
    { type: 'L', pos: [ 2, 0, 3 ] },
    { type: 'T', pos: [ 6, -1, 5 ] },
    { type: 'L', pos: [ 8, 7, 9 ] },
    { type: 'T', pos: [ 2, 6, 8 ] }
  ]
}
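
To make the terms/conns structure above easier to read, here is a minimal sketch that prints each connector as text. It assumes that a 'T' connector's pos lists [subject, relation, object] with -1 marking an empty slot, and that an 'L' connector's pos lists the list-relation first and then its items. The describeConns helper and the small re-indexed fragment are hypothetical illustrations, not part of any vsm package.

```javascript
// Hypothetical helper (not part of any vsm package): describe each connector.
// Assumed reading of `pos`:
//   'T' = [subject, relation, object], with -1 for an empty slot;
//   'L' = [list-relation, item, item, ...].
function describeConns(sentence) {
  // Turn a term index into its display string; -1 means "no term here".
  const label = i => (i < 0 ? '(none)' : sentence.terms[i].str);
  return sentence.conns.map(conn => {
    const parts = conn.pos.map(label);
    if (conn.type === 'T') {
      const [subject, relation, object] = parts;
      return `T: ${subject} | ${relation} | ${object}`;
    }
    if (conn.type === 'L') {
      const [relation, ...items] = parts;
      return `L: ${relation} ( ${items.join(', ')} )`;
    }
    // Any other connector type is printed generically.
    return `${conn.type}: ${parts.join(', ')}`;
  });
}

// A fragment of the sentence above, re-indexed so that it stands alone.
const fragment = {
  terms: [
    { str: 'activation of' }, { str: 'NF-kappaB' },
    { str: 'antioxidant' }, { str: 'or' }, { str: 'FLAP Inhibitor' }
  ],
  conns: [
    { type: 'T', pos: [ -1, 0, 1 ] },
    { type: 'L', pos: [ 3, 2, 4 ] }
  ]
};

describeConns(fragment).forEach(line => console.log(line));
```

Running the same helper on the full sentence would list all six connectors, which can be a quick sanity check when such structures are written by hand.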


Project option 2:  UD –> VSM

We could also start from dependency structures in the UD form (Universal Dependencies), and design rules to convert this to VSM.
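
As a rough idea of what one such rule could look like, the sketch below turns every token that has an nsubj and/or obj dependent into a VSM trident (or a bident, when one of the two is missing). It is only an illustration under stated assumptions: the udToVsm function and the three-token input are hypothetical, the token fields mimic CoNLL-U (id, text, head, deprel), and a real converter would need many more rules, for example for coordination, passives, and modifiers.

```javascript
// Hypothetical rule-based sketch, not this project's actual converter.
// Input tokens mimic CoNLL-U fields: id (1-based), text, head (0 = root), deprel.
function udToVsm(tokens) {
  // Map each token id to its position in the VSM terms array.
  const indexOfId = new Map(tokens.map((t, i) => [t.id, i]));
  // One VSM term per token; class and instance IDs are left unresolved.
  const terms = tokens.map(t => ({ str: t.text, classID: null, instID: null }));
  const conns = [];
  for (const head of tokens) {
    const deps = tokens.filter(t => t.head === head.id);
    const subj = deps.find(t => t.deprel === 'nsubj');
    const obj = deps.find(t => t.deprel === 'obj');
    if (!subj && !obj) continue; // the rule does not apply to this token
    // Assumed VSM convention: pos = [subject, relation, object], -1 if absent.
    const pos = [subj, head, obj].map(t => (t ? indexOfId.get(t.id) : -1));
    conns.push({ type: 'T', pos });
  }
  return { terms, conns };
}

// Hand-written toy parse of "Antioxidants block activation".
const tokens = [
  { id: 1, text: 'Antioxidants', head: 2, deprel: 'nsubj' },
  { id: 2, text: 'block', head: 0, deprel: 'root' },
  { id: 3, text: 'activation', head: 2, deprel: 'obj' }
];

console.log(JSON.stringify(udToVsm(tokens).conns));
// prints [{"type":"T","pos":[0,1,2]}]
```

Keeping one term per token is the simplest choice here; merging tokens into multi-word terms such as 'activation of' would be a separate step.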

For example, the Stanza parser can produce UD output, including for biomedical and clinical text.
This can be tested live on stanza.run/bio: press Submit, and see the UD output and other useful information visualized at the bottom of the page.
The Stanza UD output for the earlier example is:


This project is particularly looking for:

Further ideas

Once we can translate NLP output to VSM, we can automatically convert the VSM to RDF (among other formats), and store it in a triplestore.
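
To give a feel for that last step, here is a deliberately naive sketch that emits one N-Triples line for every complete trident whose three terms all carry a classID. It does not reproduce the actual mapping of vsm-to-rdf, which is not described here; the vsmToNTriples function, the example.org identifiers, and the idea of using class identifiers directly as subject, predicate, and object are all assumptions made for illustration only.

```javascript
// Naive, hypothetical sketch; NOT the mapping used by vsm-to-rdf.
// Emits "<s> <p> <o> ." for each 'T' connector whose subject, relation,
// and object terms all have a classID. Everything else is skipped.
function vsmToNTriples(sentence) {
  // Resolve a term index to its classID, or null if absent/unclassified.
  const iri = i => (i >= 0 && sentence.terms[i].classID) || null;
  const lines = [];
  for (const conn of sentence.conns) {
    if (conn.type !== 'T') continue;
    const [s, p, o] = conn.pos.map(iri);
    if (s && p && o) lines.push(`<${s}> <${p}> <${o}> .`);
  }
  return lines;
}

// Toy input; the example.org identifiers are placeholders, not real ontology terms.
const sentence = {
  terms: [
    { str: 'antioxidant', classID: 'http://example.org/antioxidant' },
    { str: 'blocks', classID: 'http://example.org/blocks' },
    { str: 'NF-kappaB', classID: 'http://example.org/NF-kappaB' },
    { str: 'could', classID: null }
  ],
  conns: [
    { type: 'T', pos: [ 0, 1, 2 ] },
    { type: 'T', pos: [ 1, 3, -1 ] } // skipped: no object, unclassified relation
  ]
};

vsmToNTriples(sentence).forEach(line => console.log(line));
```

A faithful conversion would also have to handle bidents, lists, unclassified terms, and the distinction between classes and instances, which this sketch ignores.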

A necessary ambition

This project is quite ambitious and may eventually become the topic of an X-person-year project. We therefore want to emphasize that our main goal, apart from some preliminary coding, is to initiate discussions and intellectual cross-pollination towards achieving these goals.

Q&A

Questions & answers and further thoughts are collected in GitHub Discussions.