vsm / nlp-to-vsm

From NLP dependency structures to semantics in VSM
https://vsm.github.io/nlp-to-vsm
GNU Affero General Public License v3.0

How to combine Structure vs. Identifier mapping? #2

Closed stcruy closed 3 years ago

stcruy commented 3 years ago

Moved here from an email:

–> Question:

Enju outputs sentence structure information as a tree data structure, but I don't think its output maps to ontology terms.
So I guess you need an additional mapping step, or you need to combine this step with the translation one.
How do you see the hackathon running?

–> Reply:

I think that initially, structure-mapping and identifier-mapping can be separate subprojects.

A structure mapper could simply use term strings as placeholder IDs (or string+position, to prevent duplicates). Using URIs is not obligatory for VSM-JSON.
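
As a rough illustration of the string+position idea, here is a minimal sketch. The function name, the `str`/`id` field names, and the `placeholder:` prefix are all made up for this example and are not taken from the VSM-JSON spec.

```python
def placeholder_terms(tokens):
    """Give every token a placeholder ID built from its string and position."""
    terms = []
    for position, token in enumerate(tokens):
        terms.append({
            "str": token,
            # Hypothetical ID scheme: the position suffix keeps
            # repeated term strings from sharing one ID.
            "id": f"placeholder:{token}@{position}",
        })
    return terms


if __name__ == "__main__":
    for term in placeholder_terms(["protein", "binds", "protein"]):
        print(term)
```

The two occurrences of "protein" end up with different IDs, which is the whole point of adding the position.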

Indeed, multiple NLP tools will need to be brought together. (Todo: add to Readme).
◦ For example on www.pubannotation.org, we could use both the "Gene name grounding (PubTator)" and "Semantic annotation (MetaMap and SemRep)" annotations, which are shown aligned with the "Dependency parsing (Enju)".
◦ On stanza.run/bio, NER annotations are aligned with the UD tree. Though I only see term categories, no IDs, so PubAnnotation's site wins there.

Of course, we may combine NLP tools from any ecosystem.

I like Enju because I see arg1Of, arg2Of relations. Perhaps these can be mapped to VSM subject/object relations (=tridents and bidents)? I like Universal Dependencies because it says 'Universal'. :) So here is where I'd prefer NLP experts to chime in. Hence my hackathon participation.
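
To make the arg1Of/arg2Of question a bit more concrete, here is a speculative sketch of what such a mapping might look like. It assumes the parser relations have already been flattened into `(argument, label, predicate)` tuples. That tuple layout, the function name, and the output dictionaries are assumptions for illustration, not Enju's actual output format or VSM-JSON.

```python
def relations_to_connectors(relations):
    """Group arg1Of/arg2Of relations by predicate into trident/bident records.

    `relations` is an iterable of (argument, label, predicate) tuples.
    This layout is an assumption, not Enju's native output.
    """
    slots_by_predicate = {}
    for argument, label, predicate in relations:
        slots = slots_by_predicate.setdefault(predicate, {})
        if label == "arg1Of":
            slots["subject"] = argument  # assumed: arg1 plays the subject role
        elif label == "arg2Of":
            slots["object"] = argument   # assumed: arg2 plays the object role

    connectors = []
    for predicate, slots in slots_by_predicate.items():
        subject = slots.get("subject")
        obj = slots.get("object")
        if subject is None and obj is None:
            continue  # no usable arguments for this predicate
        # Both ends present: trident; only one end present: bident.
        kind = "trident" if subject is not None and obj is not None else "bident"
        connectors.append(
            {"kind": kind, "subject": subject, "relation": predicate, "object": obj}
        )
    return connectors


if __name__ == "__main__":
    example = [
        ("insulin", "arg1Of", "lowers"),
        ("glucose", "arg2Of", "lowers"),
        ("it", "arg1Of", "works"),
    ]
    for connector in relations_to_connectors(example):
        print(connector)
```

Whether arg1/arg2 really line up with subject/object in all cases (passives, for instance) is exactly the kind of thing NLP experts would need to confirm.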

Back to the IDs: Realistically, I think our best target will be to link as many terms as possible to an ontology/PubDictionaries/etc. ID, and to link the remaining terms to a placeholder string-ID.
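
A minimal sketch of that mixed strategy, assuming some upstream tool has already produced a lookup table from term strings to IDs. The table contents, the `EX:0001` ID, and all names here are invented for the example.

```python
def link_terms(tokens, known_ids):
    """Use a real ID where the lookup has one, else a placeholder string-ID."""
    linked = []
    for position, token in enumerate(tokens):
        term_id = known_ids.get(token.lower())
        if term_id is None:
            # Fallback: hypothetical string+position placeholder.
            term_id = f"placeholder:{token}@{position}"
        linked.append({"str": token, "id": term_id})
    return linked


if __name__ == "__main__":
    # Made-up lookup table; a real one might come from an NER/grounding tool.
    known_ids = {"insulin": "EX:0001"}
    for term in link_terms(["Insulin", "lowers", "glucose"], known_ids):
        print(term)
```

Terms with a placeholder ID could be revisited later, once a better identifier-mapping step is available.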

stcruy commented 3 years ago

–> Moved to Discussions.