Enju outputs sentence structure information as a tree, but I don't think it maps terms to ontology identifiers.
So I guess you need an additional mapping step, or you need to combine this step with the translation one.
So, how do you see the hackathon running?
–> Reply:
I think that initially, structure-mapping and identifier-mapping can be separate subprojects.
A structure mapper could simply use term strings as placeholder IDs (or string+position, to prevent duplicates). Using URIs is not obligatory for VSM-JSON.
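A minimal sketch of the placeholder idea, assuming terms arrive as (string, character offset) pairs. The function name and the `string@offset` format are made up for illustration; they are not part of VSM-JSON or of any tool mentioned here.

```python
def placeholder_id(term: str, start: int) -> str:
    """Build a placeholder ID from a term string and its start offset."""
    # Lower-casing and joining whitespace is an arbitrary choice for this sketch.
    normalized = "_".join(term.lower().split())
    return f"{normalized}@{start}"


# "the" occurs twice in "the cat sees the dog"; the offset keeps the IDs distinct.
tokens = [("the", 0), ("cat", 4), ("sees", 8), ("the", 13), ("dog", 17)]
ids = [placeholder_id(text, start) for text, start in tokens]
print(ids)
```

Using the string alone would collapse both occurrences of "the" into one ID, which is why the position is appended.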
Indeed, multiple NLP tools will need to be brought together. (Todo: add to Readme).
◦ For example, on www.pubannotation.org we could use both the "Gene name grounding (PubTator)" and the "Semantic annotation (MetaMap and SemRep)" annotations, which are shown aligned with the "Dependency parsing (Enju)" annotations.
◦ On stanza.run/bio, NER annotations are aligned with the UD tree. I only see term categories there, though, not IDs, so PubAnnotation's site wins on that point.
Of course, we may combine NLP tools from any ecosystem. I like Enju because I see arg1Of and arg2Of relations. Perhaps these can be mapped to VSM subject/object relations (= tridents and bidents)? I like Universal Dependencies because it says 'Universal'. :) So this is where I'd prefer NLP experts to chime in, hence my hackathon participation.
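To make the arg1Of/arg2Of question concrete, here is a rough sketch of such a mapping. It assumes the relations have already been reduced to (argument, label, predicate) triples, read as "argument is label of predicate". That input shape, the function name, and the output dictionaries are hypothetical; they are neither Enju's output format nor VSM-JSON. As far as I understand, Enju also assigns arg1/arg2 to modifiers and prepositions, so a real mapper would have to filter on predicate type, which this sketch skips.

```python
from collections import defaultdict


def args_to_connectors(relations):
    """Group arg1Of/arg2Of triples by predicate into trident/bident records."""
    by_predicate = defaultdict(dict)
    for argument, label, predicate in relations:
        if label == "arg1Of":
            by_predicate[predicate]["subject"] = argument
        elif label == "arg2Of":
            by_predicate[predicate]["object"] = argument
        # Any other label is ignored in this sketch.

    connectors = []
    for predicate, args in by_predicate.items():
        subject = args.get("subject")
        obj = args.get("object")
        # Both arguments present: trident. Only one present: bident.
        kind = "trident" if subject is not None and obj is not None else "bident"
        connectors.append(
            {"kind": kind, "subject": subject, "relation": predicate, "object": obj}
        )
    return connectors


# Invented example triples for "John eats chicken" and "John sleeps".
relations = [
    ("John", "arg1Of", "eats"),
    ("chicken", "arg2Of", "eats"),
    ("John", "arg1Of", "sleeps"),
]
connectors = args_to_connectors(relations)
for connector in connectors:
    print(connector)
```

Whether this grouping is linguistically sound for all of Enju's predicate types is exactly the kind of question for the NLP experts.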
Back to the IDs: Realistically, I think our best target will be to link as many terms as possible to an ontology/PubDictionaries/etc. ID, and to link the remaining terms to a placeholder string ID.
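That mixed target could look roughly like the sketch below. The lookup table stands in for whatever grounding service is used, and its single entry is a dummy value, not a real ontology ID. The field names and the `string@offset` fallback are likewise assumptions made for this example.

```python
def assign_ids(terms, lookup):
    """Attach a grounded ID where the lookup has one, else a placeholder."""
    result = []
    for text, start in terms:
        grounded = lookup.get(text.lower())
        result.append(
            {
                "str": text,
                # Fall back to string+position when no grounded ID is known.
                "id": grounded if grounded is not None else f"{text}@{start}",
                "grounded": grounded is not None,
            }
        )
    return result


# Dummy lookup table; "EX:0001" is an invented identifier.
lookup = {"chicken": "EX:0001"}
terms = [("John", 0), ("eats", 5), ("chicken", 10)]

annotated = assign_ids(terms, lookup)
coverage = sum(t["grounded"] for t in annotated) / len(annotated)
print(annotated)
print(f"grounded fraction: {coverage:.2f}")
```

Tracking the grounded fraction gives a simple way to compare how much different tool combinations leave as placeholders.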
(Moved here from an email.)