Open cmungall opened 3 years ago
As a first pass, just hardcode ENVO for all 3 fields regardless of package
Then for next pass, we will have a curated configuration file like this:
- field: env_broad_scale
  packages:
    - soil
  termsets:
    - ontology: envo
      branches:
        - ENVO:01000254  ## environment system
      exclude_descendants_of:
        - ENVO:01001788  ## marine ecosystem
- field: env_local_scale
  package: host-associated
  termsets:
    - ontology: UBERON
...
that will customize which ontologies are used where
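A rough sketch of how such a configuration could be consulted, assuming it has already been parsed into plain Python data (for example with a YAML loader). The names `CONFIG`, `DEFAULT_TERMSETS`, `entry_packages` and `termsets_for` are made up for illustration and are not part of sample-annotator. The fallback mirrors the first pass: ENVO for every field regardless of package.

```python
from typing import Any, Dict, List

# Already-parsed form of the example configuration above (illustrative only).
CONFIG: List[Dict[str, Any]] = [
    {
        "field": "env_broad_scale",
        "packages": ["soil"],
        "termsets": [
            {
                "ontology": "envo",
                "branches": ["ENVO:01000254"],
                "exclude_descendants_of": ["ENVO:01001788"],
            }
        ],
    },
    {
        "field": "env_local_scale",
        "package": "host-associated",
        "termsets": [{"ontology": "UBERON"}],
    },
]

# First-pass behaviour: ENVO everywhere when no curated entry applies.
DEFAULT_TERMSETS: List[Dict[str, Any]] = [{"ontology": "envo"}]


def entry_packages(entry: Dict[str, Any]) -> List[str]:
    """Accept both spellings used in the example: 'packages' (list) and 'package' (single value)."""
    if "packages" in entry:
        return list(entry["packages"])
    if "package" in entry:
        return [entry["package"]]
    return []


def termsets_for(field: str, package: str,
                 config: List[Dict[str, Any]] = CONFIG) -> List[Dict[str, Any]]:
    """Return the termsets configured for (field, package), or the ENVO default."""
    for entry in config:
        if entry["field"] == field and package in entry_packages(entry):
            return entry["termsets"]
    return DEFAULT_TERMSETS
```

With this, `termsets_for("env_local_scale", "host-associated")` picks up the UBERON entry, while any combination not listed in the config falls back to ENVO.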
Just an FYI, OGER does not have a PyPI release either.
@cmungall, how do you envision the input file for NER: a TSV file within the project (locally, i.e. ./text_mining/data/input) or remotely located (URL)?
I'm guessing the input tsv (or db) will be generated by @turbomam through his parsing work from the large XML?
I answered @hrshdhgd's questions in our 1-on-1. It's clear now that he doesn't have to worry about formats; the goal is to implement the functionality within the Python framework, where all you care about is the datamodel.
https://github.com/cmungall/sample-annotator/tree/main/sample_annotator/text_mining
To start with, parse sample['description'] to populate sample['env_{broad_scale,local_scale,medium}'] if they are not already populated.
I think this should be done by calling runner, but that will need a PyPI release: https://github.com/monarch-initiative/runner/issues/9
Or is it easier to just wrap OGER directly for now?
also for now we could just check in the nodes.tsv directly. See how we include mixs.json within the package
for now, be conservative and only use labels or exact synonyms
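To make the intended behaviour concrete, here is a minimal sketch that does not use OGER or runner at all. It substitutes a plain case-insensitive, whole-phrase match against labels and exact synonyms, and only fills fields that are still empty. The lexicon, its `EXAMPLE:` identifiers, the synonyms, and the names `Term`, `find_term` and `annotate` are all hypothetical; a real lexicon would presumably come from the checked-in nodes.tsv. The `label [ID]` output format is also an assumption.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class Term(NamedTuple):
    id: str
    label: str
    exact_synonyms: Tuple[str, ...] = ()


ENV_FIELDS = ("env_broad_scale", "env_local_scale", "env_medium")

# Hypothetical per-field lexicon; IDs are placeholders, not real ontology terms.
LEXICON: Dict[str, List[Term]] = {
    "env_broad_scale": [Term("EXAMPLE:0000001", "forest biome")],
    "env_local_scale": [Term("EXAMPLE:0000002", "agricultural field", ("farm field",))],
    "env_medium": [Term("EXAMPLE:0000003", "soil")],
}


def find_term(text: str, terms: List[Term]) -> Optional[Term]:
    """Return the term whose label or exact synonym occurs in text (longest name wins)."""
    best: Optional[Term] = None
    best_len = 0
    for term in terms:
        for name in (term.label, *term.exact_synonyms):
            pattern = r"\b" + re.escape(name) + r"\b"
            if len(name) > best_len and re.search(pattern, text, re.IGNORECASE):
                best, best_len = term, len(name)
    return best


def annotate(sample: dict, lexicon: Dict[str, List[Term]] = LEXICON) -> dict:
    """Copy the sample and fill empty env_* fields from sample['description']."""
    out = dict(sample)
    description = out.get("description") or ""
    for field in ENV_FIELDS:
        if out.get(field):
            continue  # already populated: leave untouched
        term = find_term(description, lexicon.get(field, []))
        if term is not None:
            out[field] = f"{term.label} [{term.id}]"
    return out
```

Restricting matches to labels and exact synonyms keeps false positives down at the cost of recall, which fits the "be conservative" goal; related or broad synonyms could be added later once precision has been checked.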