monarch-initiative / helpdesk

The Monarch Initiative Helpdesk
BSD 3-Clause "New" or "Revised" License

Help on deploying Monarch KG in our servers #58

Closed stimon closed 11 months ago

stimon commented 2 years ago

Dear team,

I'm not entirely sure if this is the right place to ask. Please point me to the right place/persons if it isn't.

My group and I want to exploit/integrate many features and data from the Monarch stack in the context of dementia diseases, but our production server is locked in a secure environment, leaving us without any possibility to reach external resources. This limitation forces us to try and deploy at least the KG part of Monarch.

Our approach consists of building our project ontology, which is basically Monarch's with a few other imports, integrating some of the ingest artifacts (we will keep animal models out for now), and loading the final KG into Neo4j.

My main doubt is about the transformation and loading. I thought at first that SciGraph was the way to go, but after looking at the different repos, it does not seem to be actively developed, and maybe I would have to learn about KGX and Koza instead. Am I right? How do you think we should approach this deployment?

It would be great to have specific documentation about how to reproduce the different parts of the Monarch stack. Still, I'm well aware of how limited resources are and the time it takes to make documentation :) (excuse me if it exists and I missed it!)

I really appreciate any help you can provide. Kind regards, Santi

putmantime commented 2 years ago

Hi Santi, We are indeed focusing all of our development efforts on the next-generation Monarch graph. Koza is our new ETL code: it takes source files, transforms them to the Biolink Model, and outputs KGX-formatted files. Unless you want to use Koza for your own ETL to transform and ingest new sources (which we highly recommend!), you will likely just need the KGX files for our graph. We host these on our Google bucket.
The merged graph is here. You can use KGX tools to generate a dump for a Neo4j endpoint.
Here is an example. https://github.com/biolink/kgx/blob/master/examples/scripts/load_csv_to_neo4j.py
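
To make the loading step more concrete, here is a minimal sketch of what turning KGX-style TSV files into Neo4j statements can look like. It is not the KGX loader and does not use the KGX API. It only assumes the basic column layout (`id` and `category` for nodes; `subject`, `predicate`, `object` for edges, with `|` separating multiple categories). The function names and the `EX:` identifiers in the usage note are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

Statement = Tuple[str, Dict[str, object]]


def _label(name: str) -> str:
    """Backtick-quote a label or relationship type (they cannot be query parameters)."""
    if "`" in name:
        raise ValueError(f"unsupported character in {name!r}")
    return f"`{name}`"


def node_statements(nodes_tsv: str) -> Iterator[Statement]:
    """Yield one parameterized MERGE statement per row of a KGX-style nodes TSV."""
    for row in csv.DictReader(io.StringIO(nodes_tsv), delimiter="\t"):
        labels = ":".join(_label(c) for c in row["category"].split("|"))
        # Every other non-empty column becomes a node property.
        props = {k: v for k, v in row.items() if k not in ("id", "category") and v}
        query = f"MERGE (n:{labels} {{id: $id}}) SET n += $props"
        yield query, {"id": row["id"], "props": props}


def edge_statements(edges_tsv: str) -> Iterator[Statement]:
    """Yield one parameterized MERGE statement per row of a KGX-style edges TSV."""
    core = ("subject", "predicate", "object")
    for row in csv.DictReader(io.StringIO(edges_tsv), delimiter="\t"):
        props = {k: v for k, v in row.items() if k not in core and v}
        query = (
            "MATCH (s {id: $subject}), (o {id: $object}) "
            f"MERGE (s)-[r:{_label(row['predicate'])}]->(o) SET r += $props"
        )
        yield query, {"subject": row["subject"], "object": row["object"], "props": props}
```

Each `(query, params)` pair could then be handed to a Neo4j driver session. Keeping the values in parameters rather than in the query string avoids quoting problems with CURIEs and free-text names.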

The documentation that describes the data model and sources for the new Monarch KG is here

Does this address your use case? Happy to provide any more help necessary. Thanks so much for your interest in Monarch. Tim

stimon commented 2 years ago

Hi Tim,

Thank you so much for your help. I think you are pointing me in the right direction.

Our main objective is to include our clinical and research data in the graph to link clinical findings, biochemistry measurements, and genetic scores to disease/phenotypes or the corresponding entities. For this, I think we need to build the application ontology first instead of directly loading the KG artifact (we already have set up the imports and patterns using the Ontology Development Kit).
Maybe we could load our axioms and instance data separately, but we could potentially introduce inconsistencies by omitting the reasoning steps from the build. Unlikely, but still possible.

So I guess we can use KGX to generate the dump for our ontology (which imports monarch) and grab the ingest artifacts from other sources you already provided in the bucket. Am I correct? To transform our datasets, I think I will go first for a programmatic ad-hoc solution, maybe using Dipper.

If I have specific questions about KGX, Biolink, etc., should I ask in the issues at their repos? Or come back here?

Another question, out of curiosity, is regarding Scigraph. Is it still in the picture? Is it just launched over the built graph?

Thank you so much again for the help, Santi

putmantime commented 2 years ago

Happy to help, and apologies for the delay. I thought my settings for this repo would notify me of comment replies, but they have not. Please continue your questions on this thread and tag me specifically with @putmantime, which will hopefully help with that issue.

So I guess we can use KGX to generate the dump for our ontology (which imports monarch) and grab the ingest artifacts from other sources you already provided in the bucket. Am I correct?

That is correct, and please let us know if we can help point you in the right direction with the tooling.

SciGraph will no longer be part of the Monarch stack with the new KGX-based pipeline. We are handling many of SciGraph's operations upstream: closures for indexing, inferred association joins, rewiring, etc. will all happen by joining in those data from authoritative sources or by pre-computing them in standalone processes.
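
As an illustration of what "pre-computing closures" can mean, here is a minimal sketch that computes, for every term, the full set of its ancestors from a list of (child, parent) subclass pairs. This is a generic transitive-closure routine written for this thread, not the Monarch pipeline's code. The function name and the `EX:` identifiers in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ancestor_closure(subclass_edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each child term to all of its direct and indirect parents."""
    parents = defaultdict(set)
    for child, parent in subclass_edges:
        parents[child].add(parent)

    closure = {}
    for term in list(parents):
        seen = set()
        stack = list(parents[term])
        # Iterative walk up the hierarchy; `seen` also guards against cycles.
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        closure[term] = seen
    return closure
```

With the pairs `("EX:early_onset_ad", "EX:ad")` and `("EX:ad", "EX:dementia")`, the entry for `EX:early_onset_ad` contains both `EX:ad` and `EX:dementia`. An index built from such a table can answer "everything under dementia" without traversing the graph at query time.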

stimon commented 2 years ago

Thank you @putmantime. I have been doing some reading/exploring, and I'm starting to get a clearer picture :)

I managed to get KGX working through the docker container (containerization is great for our use case, as we don't have many permissions to mess around in our servers) and tested a load into Neo4j.

So now, I will focus on the transformation step for our CSV source data, and I may try doing it with Koza. I realized I was doing many similar things (like YAML-based column-term mappings), and it is better not to reinvent the wheel. But before writing the code, I have to get a grasp of the Biolink model and fully understand how to align the parts of our ontology that aren't already part of Monarch, in order to make a Biolink-compliant ingest. I'm currently at it.
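
For readers unfamiliar with the idea, a column-term mapping of the kind mentioned above can be sketched with the standard library alone. This is not Koza and does not use its configuration format: the plain dictionaries stand in for what would normally live in a YAML file, and the function name, the `sex` slot name, and the `DDI:` prefix usage are assumptions for illustration. The column names and PATO terms are the ones that appear in the transform script later in this thread.

```python
import csv
import io
from typing import Dict, Iterator

# Hypothetical mapping tables, standing in for a YAML configuration file.
COLUMN_MAP = {"gender": "sex"}  # source column -> output slot
VALUE_MAP = {"sex": {"female": "PATO:0000383", "male": "PATO:0000384"}}


def map_rows(
    csv_text: str,
    id_column: str,
    column_map: Dict[str, str],
    value_map: Dict[str, Dict[str, str]],
    id_prefix: str,
) -> Iterator[Dict[str, str]]:
    """Turn each CSV row into a flat record with a CURIE id and mapped values."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {"id": id_prefix + row[id_column]}
        for column, slot in column_map.items():
            raw = row[column]
            # Replace known raw values with ontology terms; keep unknown ones as-is.
            record[slot] = value_map.get(slot, {}).get(raw, raw)
        yield record
```

Calling `map_rows("subject_label,gender\nS001,female\n", "subject_label", COLUMN_MAP, VALUE_MAP, "DDI:")` yields a single record whose `sex` value is the PATO term rather than the raw string.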

I will try to not bother you too much in here and come up with specific questions :)

Thanks again, Santi

stimon commented 2 years ago

Hi @putmantime!

I'm slowly making progress :) What is the best way to grab the sources? I tried setting up monarch-ingest on my Mac OSX following the instructions. Everything seems to install, but I get a zsh: command not found: ingest.

poetry install    
Creating virtualenv monarch-ingest-xVtWBxQY-py3.8 in /Users/myuser/Library/Caches/pypoetry/virtualenvs
Installing dependencies from lock file

Package operations: 157 installs, 0 updates, 0 removals

  • Installing six (1.16.0)
  • Installing hbreader (0.9.1)
  • Installing isodate (0.6.1)
  • Installing pyparsing (3.0.9)
  • Installing antlr4-python3-runtime (4.9.3)
  • Installing certifi (2022.5.18.1)
  • Installing charset-normalizer (2.0.12)
  • Installing idna (3.3)
  • Installing jsonasobj (2.0.1)
  • Installing pyasn1 (0.4.8)
  • Installing rdflib (6.1.1)
  • Installing urllib3 (1.26.9)
  • Installing zipp (3.8.0)
  • Installing attrs (21.4.0)
  • Installing cachetools (5.2.0)
  • Installing click (8.0.4)
  • Installing importlib-resources (5.7.1)
  • Installing markupsafe (2.1.1)
  • Installing mdurl (0.1.1)
  • Installing protobuf (4.21.1)
  • Installing pyasn1-modules (0.2.8)
  • Installing pyjsg (0.11.10)
  • Installing pyrsistent (0.18.1)
  • Installing pytz (2022.1)
  • Installing pyyaml (5.4.1)
  • Installing rdflib-jsonld (0.6.1)
  • Installing requests (2.27.1)
  • Installing rsa (4.8)
  • Installing wrapt (1.14.1)
  • Installing alabaster (0.7.12)
  • Installing babel (2.10.1)
  • Installing chardet (4.0.0)
  • Installing decorator (5.1.1)
  • Installing deprecated (1.2.13)
  • Installing docutils (0.16)
  • Installing google-auth (2.6.6)
  • Installing googleapis-common-protos (1.56.1)
  • Installing imagesize (1.3.0)
  • Installing importlib-metadata (4.11.4)
  • Installing jinja2 (3.1.2)
  • Installing json-flattener (0.1.9)
  • Installing jsonasobj2 (1.0.4)
  • Installing jsonpointer (2.3)
  • Installing jsonschema (4.6.0)
  • Installing markdown-it-py (2.1.0)
  • Installing packaging (21.3)
  • Installing ply (3.11)
  • Installing prefixcommons (0.1.9)
  • Installing pygments (2.12.0)
  • Installing rdflib-shim (1.0.3)
  • Installing ruamel.yaml.clib (0.2.6)
  • Installing shexjsg (0.8.2)
  • Installing snowballstemmer (2.2.0)
  • Installing sparqlwrapper (2.0.0)
  • Installing sphinxcontrib-devhelp (1.0.2)
  • Installing sphinxcontrib-applehelp (1.0.2)
  • Installing sphinxcontrib-htmlhelp (2.0.0)
  • Installing sphinxcontrib-jsmath (1.0.1)
  • Installing sphinxcontrib-qthelp (1.0.3)
  • Installing sphinxcontrib-serializinghtml (1.1.5)
  • Installing cfgraph (0.2.1)
  • Installing distlib (0.3.4)
  • Installing et-xmlfile (1.1.0)
  • Installing filelock (3.7.1)
  • Installing google-api-core (2.8.0)
  • Installing google-crc32c (1.3.0)
  • Installing greenlet (1.1.2)
  • Installing jsonpatch (1.32)
  • Installing jsonpath-ng (1.5.3)
  • Installing linkml-runtime (1.2.16)
  • Installing mdit-py-plugins (0.3.0)
  • Installing platformdirs (2.5.2)
  • Installing pyshexc (0.9.1)
  • Installing ruamel.yaml (0.17.21)
  • Installing sparqlslurper (0.5.1)
  • Installing sphinx (4.5.0)
  • Installing typing-extensions (4.2.0)
  • Installing appdirs (1.4.4)
  • Installing argparse (1.4.0)
  • Installing commonmark (0.9.1)
  • Installing elastic-transport (8.1.2)
  • Installing google-cloud-core (2.3.0)
  • Installing google-resumable-media (2.3.3)
  • Installing graphviz (0.20)
  • Installing iniconfig (1.1.1)
  • Installing linkml-dataops (0.1.0)
  • Installing mccabe (0.6.1)
  • Installing mypy-extensions (0.4.3)
  • Installing myst-parser (0.17.2)
  • Installing openpyxl (3.0.10)
  • Installing parse (1.19.0)
  • Installing pathspec (0.9.0)
  • Installing pluggy (1.0.0)
  • Installing pockets (0.9.1)
  • Installing py (1.11.0)
  • Installing pycodestyle (2.7.0)
  • Installing pydantic (1.9.1)
  • Installing pyflakes (2.3.1)
  • Installing pyshex (0.8.1)
  • Installing python-dateutil (2.8.2)
  • Installing regex (2022.6.2)
  • Installing sphinx-click (4.1.0)
  • Installing sqlalchemy (1.4.37)
  • Installing toml (0.10.2)
  • Installing tomli (2.0.1)
  • Installing virtualenv (20.14.1)
  • Installing watchdog (2.1.8)
  • Installing websocket-client (1.3.2)
  • Installing black (21.6b0)
  • Installing compress-json (1.0.7)
  • Installing deprecation (2.1.0)
  • Installing docker (5.0.3)
  • Installing elasticsearch (8.2.2)
  • Installing flake8 (3.9.2)
  • Installing ghp-import (2.1.0)
  • Installing google-cloud-storage (2.3.0)
  • Installing linkml (1.2.13)
  • Installing markdown (3.3.7)
  • Installing mergedeep (1.3.4)
  • Installing numpy (1.22.4)
  • Installing numpydoc (1.3.1)
  • Installing pbr (5.9.0)
  • Installing pytest (7.1.2)
  • Installing pyyaml-env-tag (0.1)
  • Installing recommonmark (0.7.1)
  • Installing sphinx-rtd-theme (0.4.3)
  • Installing sphinxcontrib-napoleon (0.7)
  • Installing stringcase (1.2.0)
  • Installing tox (3.25.0)
  • Installing tqdm (4.64.0)
  • Installing typer (0.4.0)
  • Installing bmt (0.8.4)
  • Installing ijson (3.1.4)
  • Installing jsonlines (3.0.0)
  • Installing jsonstreams (0.6.0)
  • Installing kghub-downloader (0.1.14)
  • Installing linkml-validator (0.3.0)
  • Installing mkdocs (1.3.0)
  • Installing mkdocs-material-extensions (1.0.3)
  • Installing mypy (0.960)
  • Installing neo4j (4.3.0)
  • Installing networkx (2.8.3)
  • Installing ordered-set (4.1.0)
  • Installing pandas (1.4.2)
  • Installing prologterms (0.0.6)
  • Installing pymdown-extensions (9.4)
  • Installing terminaltables (3.1.10)
  • Installing tox-docker (3.1.0)
  • Installing validators (0.19.0)
  • Installing autoflake (1.4)
  • Installing biolink-model-pydantic (0.1.11)
  • Installing cat-merge (0.1.15)
  • Installing isort (5.10.1)
  • Installing kgx (1.5.8)
  • Installing koza (0.1.14)
  • Installing mkdocs-material (8.3.1)
  • Installing monarch-gene-mapping (0.1.1)

Installing the current project: monarch-ingest (0.3.0)

I also tried make:

poetry run autoflake \
    --recursive \
    --remove-all-unused-imports \
    --remove-unused-variables \
    --ignore-init-module-imports \
    --in-place monarch_ingest tests
poetry run isort monarch_ingest tests
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/cli_utils.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/helper.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/main.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/omim/gene_to_disease.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/hpoa/disease_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/ncbi/gene.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/xenbase/gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/alliance/gene_to_expression.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/alliance/publication.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/alliance/gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/dictybase/utils.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/dictybase/gene.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/dictybase/gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/hgnc/gene.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/string/protein_links.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/panther/ref_genome_orthologs.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/pombase/gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/zfin/gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/reactome/pathway.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/ingests/reactome/chemical_to_pathway.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/util/dicty_phenotype_obo_to_map.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/monarch_ingest/model/biolink.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/hpoa/test_hpoa_disease_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/xenbase/test_xenbase_gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/alliance/test_alliance_publication.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/alliance/test_alliance_gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/alliance/test_alliance_gene_to_expression.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/dictybase/test_dictybase_gene.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/dictybase/test_dictybase_gene_to_phenotype.py
Fixing /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest/tests/unit/pombase/test_pombase_gene_to_phenotype.py
poetry run black monarch_ingest tests
reformatted monarch_ingest/helper.py
reformatted monarch_ingest/ingests/dictybase/gene.py
reformatted monarch_ingest/ingests/dictybase/gene_to_phenotype.py
reformatted monarch_ingest/ingests/hpoa/genes_to_phenotype.py
reformatted monarch_ingest/ingests/alliance/gene_to_phenotype.py
reformatted monarch_ingest/ingests/alliance/gene_to_expression.py
reformatted monarch_ingest/ingests/reactome/chemical_to_pathway.py
reformatted monarch_ingest/ingests/hgnc/gene.py
reformatted monarch_ingest/ingests/mgi/publication_to_gene.py
reformatted monarch_ingest/ingests/pombase/gene_to_phenotype.py
reformatted monarch_ingest/ingests/pombase/gene.py
reformatted monarch_ingest/ingests/panther/ref_genome_orthologs.py
reformatted monarch_ingest/ingests/dictybase/utils.py
reformatted monarch_ingest/ingests/hpoa/disease_phenotype.py
reformatted monarch_ingest/ingests/reactome/gene_to_pathway.py
reformatted monarch_ingest/ingests/panther/orthology_utils.py
reformatted monarch_ingest/ingests/string/protein_links.py
reformatted tests/unit/hpoa/test_hpoa_disease_phenotype.py
reformatted tests/unit/hpoa/test_hpoa_genes_to_phenotype.py
reformatted monarch_ingest/cli_utils.py
reformatted tests/unit/reactome/test_reactome_pathway.py
reformatted tests/unit/dictybase/test_dictybase_gene_to_phenotype.py
reformatted tests/unit/dictybase/test_dictybase_gene.py
reformatted monarch_ingest/main.py
reformatted tests/unit/omim/test_omim_gene_disease.py
reformatted tests/unit/alliance/test_alliance_publication.py
reformatted tests/unit/xenbase/test_xenbase_gene_to_phenotype.py
reformatted tests/unit/panther/test_ref_genome_orthologs.py
reformatted monarch_ingest/model/biolink.py
All done! ✨ 🍰 ✨
29 files reformatted, 43 files left unchanged.
poetry run python -m pytest --ignore=ingest_template
================================================================================================= test session starts ==================================================================================================
platform darwin -- Python 3.8.2, pytest-7.1.2, pluggy-1.0.0
rootdir: /Volumes/Work/development/ddi-kg/ontologies/monarch/monarch-kg/ingests/monarch-ingest
collected 107 items                                                                                                                                                                                                    

tests/unit/alliance/test_alliance_gene.py ......                                                                                                                                                                 [  5%]
tests/unit/alliance/test_alliance_gene_to_expression.py ......                                                                                                                                                   [ 11%]
tests/unit/alliance/test_alliance_gene_to_phenotype.py ..                                                                                                                                                        [ 13%]
tests/unit/alliance/test_alliance_publication.py ....................                                                                                                                                            [ 31%]
tests/unit/ctd/test_ctd_chemical_to_disease.py ...                                                                                                                                                               [ 34%]
tests/unit/dictybase/test_dictybase_gene.py .......                                                                                                                                                              [ 41%]
tests/unit/dictybase/test_dictybase_gene_to_phenotype.py .....                                                                                                                                                   [ 45%]
tests/unit/flybase/test_flybase_publication_to_gene.py .                                                                                                                                                         [ 46%]
tests/unit/goa/test_go_annotation.py ..                                                                                                                                                                          [ 48%]
tests/unit/hgnc/test_hgnc_gene.py ...                                                                                                                                                                            [ 51%]
tests/unit/hpoa/test_hpoa_disease_phenotype.py .                                                                                                                                                                 [ 52%]
tests/unit/hpoa/test_hpoa_genes_to_phenotype.py ..                                                                                                                                                               [ 54%]
tests/unit/mgi/test_mgi_publication_to_gene.py ..                                                                                                                                                                [ 56%]
tests/unit/ncbi/test_ncbi_gene.py ....                                                                                                                                                                           [ 59%]
tests/unit/omim/test_omim_gene_disease.py .....                                                                                                                                                                  [ 64%]
tests/unit/panther/test_ref_genome_orthologs.py ...............                                                                                                                                                  [ 78%]
tests/unit/pombase/test_pombase_gene.py ....                                                                                                                                                                     [ 82%]
tests/unit/pombase/test_pombase_gene_to_phenotype.py .                                                                                                                                                           [ 83%]
tests/unit/reactome/test_reactome_chemical_to_pathway.py .                                                                                                                                                       [ 84%]
tests/unit/reactome/test_reactome_gene_to_pathway.py .                                                                                                                                                           [ 85%]
tests/unit/reactome/test_reactome_pathway.py .                                                                                                                                                                   [ 85%]
tests/unit/rgd/test_rgd_publication_to_gene.py ..                                                                                                                                                                [ 87%]
tests/unit/sgd/test_sgd_publication_to_gene.py .                                                                                                                                                                 [ 88%]
tests/unit/string/test_string_protein_links.py ..                                                                                                                                                                [ 90%]
tests/unit/xenbase/test_xenbase_gene_to_phenotype.py ..                                                                                                                                                          [ 92%]
tests/unit/xenbase/test_xenbase_publication_to_gene.py ..                                                                                                                                                        [ 94%]
tests/unit/zfin/test_zfin_gene_to_phenotype.py .....                                                                                                                                                             [ 99%]
tests/unit/zfin/test_zfin_publication_to_gene.py .                                                                                                                                                               [100%]

================================================================================================= 107 passed in 9.98s ==================================================================================================
rm -rf `find . -name __pycache__`
rm -f `find . -type f -name '*.py[co]' `
rm -rf .pytest_cache
rm -rf dist

Am I overlooking something, or is there a missing step in the documentation?

Thanks!

putmantime commented 2 years ago

Hi Santi, Sorry but I may be missing the issue. What was the exact command that resulted in the command not found exception?

stimon commented 2 years ago

@putmantime Any of the commands. I'm guessing ingest should be on the PATH after installing, but it isn't.

(ingest-env) (base) myuser@x86_64-apple-darwin13 monarch-ingest % ingest download 
zsh: command not found: ingest
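
One possible explanation, which is not confirmed anywhere in this thread: the install log shows Poetry creating its own virtualenv under `Caches/pypoetry/virtualenvs`, while the prompt shows a different active environment (`ingest-env`). If the project declares an `ingest` console script, it would then live in the Poetry virtualenv's `bin` directory, not on the shell's PATH, and `poetry run ingest download` would be the way to reach it. The sketch below is a small, hypothetical diagnostic helper (the function name is made up) that checks both places.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Optional


def locate_script(name: str, venv_dir: str) -> Dict[str, Optional[str]]:
    """Report where a console script can be found, if anywhere."""
    # Virtualenvs keep scripts in "Scripts" on Windows and "bin" elsewhere.
    bin_dir = Path(venv_dir) / ("Scripts" if os.name == "nt" else "bin")
    candidate = bin_dir / name
    return {
        "on_path": shutil.which(name),  # None when the shell cannot find it
        "in_venv": str(candidate) if candidate.exists() else None,
    }
```

If `on_path` is `None` but `in_venv` is set, the script was installed correctly and only the active environment differs.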

stimon commented 1 year ago

Hi @putmantime

I haven't had much time to spend on this project, but I will try to give it a push now. I did manage to get the ingests, merge them, and load them into Neo4j using the tools. The next step is transforming our tabular source data so that it complies with the Biolink model, and I guess the best option would be to use Koza and the Python dataclasses directly.

I have been looking at the examples and the actual monarch ingests. So I started with a simple subject and sex transformation:

import uuid
from biolink.model import Case, CaseToPhenotypicFeatureAssociation, PhenotypicSex
from koza.cli_runner import get_koza_app

source_name = 'ddi-subject'
koza_app = get_koza_app(source_name)
#columns = koza_app.source.config.columns
has_phenotype = 'biolink:has_phenotype'

row = koza_app.get_row()

# Subject case instance
subid = 'DDI:' + row['subject_label']
# just 'human' for now
subject = Case(id=subid, provided_by='DDI', category='NCBITaxon:9606')
koza_app.write(subject)

# Biological sex phenotype
female = 'PATO:0000383'
male = 'PATO:0000384'
sex = None
if row['gender'] == 'male':
    sex = PhenotypicSex(id=str(uuid.uuid1()), category=male, has_attribute_type='PATO:0001894') 
elif row['gender'] == 'female':
    sex = PhenotypicSex(id=str(uuid.uuid1()), category=female, has_attribute_type='PATO:0001894')

if sex is not None:
    case_phenotype = CaseToPhenotypicFeatureAssociation(
        id="uuid:" + str(uuid.uuid1()),
        subject=subject.id,
        object=sex,
        predicate=has_phenotype,
    )

    koza_app.write(sex, case_phenotype)

The examples are clear for high-level entities and associations (genes, proteins...), but I'm failing to find code for more low-level or fine-grained triples that also bring in entities from external ontologies, such as relating a subject node, through clinical entities (like a health encounter), all the way to a date literal that states a year of birth.

Do you know if such examples exist, or could you give some starting hints? Is it maybe better to follow another approach like generating source RDF and then transforming and merging with KGX?

Thanks! Santi

putmantime commented 1 year ago

Hi Santi, Apologies for the delay. It sounds like you want to model concepts and properties that are not part of the Biolink Model.

You could approach this in a few ways:

You could submit issues to the Biolink repo, and ask questions and request model additions there: https://github.com/biolink/biolink-model

Or you could use the generic biolink:Association class to associate concepts that are not modeled specifically in Biolink.

You could write your own data classes that follow the basic kgx schema.

You could extend biolink classes like Association to have the attributes you need.
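
The "write your own data classes" option can be sketched as follows. This is a minimal illustration, not the Biolink or KGX Python classes: it only assumes the basic KGX layout of nodes with `id` and `category`, and edges with `subject`, `predicate`, and `object`. The class names, the `to_tsv` helper, and the `EX:` identifier in the usage note are hypothetical.

```python
import csv
import io
import uuid
from dataclasses import asdict, dataclass, field
from typing import List, Union


@dataclass
class Node:
    id: str
    category: str
    name: str = ""


@dataclass
class Edge:
    subject: str
    predicate: str
    object: str
    # Fall back to the generic association class when nothing more specific fits.
    category: str = "biolink:Association"
    id: str = field(default_factory=lambda: "uuid:" + str(uuid.uuid4()))


def to_tsv(records: List[Union[Node, Edge]]) -> str:
    """Serialize records of one type to a tab-separated table with a header row."""
    rows = [asdict(record) for record in records]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `Edge(subject="DDI:S001", predicate="biolink:has_phenotype", object="EX:phenotype1")` gets the generic category and a generated id, and a list of such edges can be written out with `to_tsv`. Extra attributes can be added as further dataclass fields, which become extra columns.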

Let me know if I can explain or help further.

stimon commented 1 year ago

Hi @putmantime

Thanks for the answer. I have been stuck again with day-to-day work and had no chance to make much progress, but I'm trying to give this a push again.

I was under the impression that I was somewhat forcing things with this approach, as our data is raw study data. So I'm now leaning toward an "explicit" RDF transformation with Dipper, using Biolink for categorization and manually instantiating biolink:Association (as you say, most of them don't have an appropriate concept in the model).

I think that with this two-tiered approach, I can still store low-level statements while deriving higher-level, simpler facts from those statements, and exploit them when I merge KGs from KG-Hub.

For example, I'm now instantiating classes from OBO ontologies like OGMS/OBI/HPO/MONDO to express protocol_visit -> clinical history -> finding -> disease/phenotype. Then I use biolink:Association to reify and make direct patient-phenotype assertions, connecting them with the previous context.
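
A rough sketch of this two-tiered idea, under stated assumptions: the low-level statements are plain (subject, predicate, object) triples, and a direct association is derived for every phenotype reachable from a patient. The function name, the `supporting_path` field, and all `EX:` predicates and identifiers are hypothetical. Only `biolink:Association` and `biolink:has_phenotype` come from this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def shortcut_associations(
    triples: Iterable[Triple], patients: Set[str], phenotypes: Set[str]
) -> List[Dict[str, object]]:
    """Derive direct patient-phenotype associations from chains of low-level triples."""
    targets = defaultdict(list)
    for subject, _predicate, obj in triples:
        targets[subject].append(obj)

    associations = []
    for patient in sorted(patients):
        stack = [[patient]]  # each entry is the path walked so far
        seen = {patient}
        while stack:
            path = stack.pop()
            for nxt in targets.get(path[-1], ()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in phenotypes:
                    associations.append({
                        "category": "biolink:Association",
                        "subject": patient,
                        "predicate": "biolink:has_phenotype",
                        "object": nxt,
                        # Keeps the link back to the low-level context.
                        "supporting_path": path + [nxt],
                    })
                else:
                    stack.append(path + [nxt])
    return associations
```

Given a chain patient -> visit -> history -> finding -> phenotype, this returns one association whose `supporting_path` lists all five nodes, so the simple fact stays traceable to the detailed statements.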

I'm finding it harder to do this for assay results, like ELISA measurements and mass spec results. At the Biolink level, I aim to connect the entity of interest (protein, cell count) from the patient with a specific value, like 570 pg/ml. It may not make sense for this case... It will be the same for neuroimaging-derived analyses, like FreeSurfer segmentation.
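
One way to picture the assay case, purely as a sketch and not as a recommendation from the Biolink maintainers: treat each measurement as its own node that carries the value and unit, and connect it to the patient and to the measured entity. The function name, the `value` and `unit` property names, and the `EX:` identifiers are hypothetical. `biolink:NamedThing` and `biolink:related_to` are used only as the most generic placeholders.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def measurement_records(
    measurement_id: str, patient_id: str, analyte_id: str, value: float, unit: str
) -> Tuple[Record, List[Record]]:
    """Build one measurement node plus the two edges that give it context."""
    node = {
        "id": measurement_id,
        "category": "biolink:NamedThing",  # placeholder category
        "value": value,
        "unit": unit,
    }
    edges = [
        # patient -> measurement
        {"subject": patient_id, "predicate": "biolink:related_to", "object": measurement_id},
        # measurement -> measured entity (protein, cell count, ...)
        {"subject": measurement_id, "predicate": "biolink:related_to", "object": analyte_id},
    ]
    return node, edges
```

For the 570 pg/ml example, `measurement_records("EX:m1", "DDI:S001", "EX:protein1", 570.0, "pg/ml")` keeps the number and unit on the measurement node, so the same patient can have several measurements of the same entity without overwriting each other.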

Let me know if you find something wrong in my reasoning so that I can change paths again if that is the case :)

Best, Santi