obophenotype / brain_data_standards_ontologies

A repository for co-ordinating work on ontologies for the Brain Data Standards Project
Apache License 2.0

Switch to unified DOSDP template #97

Open dosumis opened 3 years ago

dosumis commented 3 years ago

The pipeline has grown overly complex. It could be simplified to run mostly through a unified DOSDP template, allowing a range of variables to feed into automated definitions, labels, synonyms, etc.

The major dependency for this is an update to DOSDP and DOSDP_tools to allow list variables with 0-many cardinality, plus a templating system that can work with them. For discussion of possible extensions see:

https://github.com/INCATools/dead_simple_owl_design_patterns/issues/71
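To make the requirement concrete, here is a minimal Python sketch of what templating over a 0-many list variable could look like. It is not DOSDP or DOSDP_tools code: the function names, the pipe separator for list cells, the marker names and the wording of the generated definition are all assumptions for illustration.

```python
def split_cell(cell, sep="|"):
    """Split one TSV cell into a list; an empty cell gives an empty list.

    The pipe separator is an assumption about how list values are encoded.
    """
    return [part.strip() for part in cell.split(sep) if part.strip()]


def join_list(items):
    """Join 0-many labels as '', 'A', 'A and B' or 'A, B and C'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def render_definition(cell_type, markers):
    """Build a definition; the marker clause is dropped for an empty list."""
    text = f"A {cell_type}"
    if markers:
        text += f" that selectively expresses {join_list(markers)}"
    return text + "."


if __name__ == "__main__":
    # Hypothetical marker labels, for illustration only.
    print(render_definition("neuron", split_cell("")))
    print(render_definition("neuron", split_cell("Gad1|Lamp5|Sst")))
```

The point of the sketch is the empty-list case: with 0-many cardinality the template has to drop the whole clause, not emit a dangling "that selectively expresses".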

hkir-dev commented 3 years ago

The first iteration of the migration to DOSDP templates is complete: https://github.com/obophenotype/brain_data_standards_ontologies/tree/dosdp_based_pipeline/src/patterns/dosdp-patterns

To keep changes minimal, I kept the TSV structures as they are. These can be refactored later to build a unified template.

Migration of the ROBOT-based creation of BDS individuals failed, since DOSDP does not seem to support named individuals yet: https://github.com/INCATools/dead_simple_owl_design_patterns/issues/64

hkir-dev commented 3 years ago

When a list is provided to a logical axiom, DOSDP constructs an intersectionOf over its members. For example:

<rdfs:subClassOf>
    <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002292"/>
                <owl:someValuesFrom rdf:resource="http://identifiers.org/ensembl/ENSMUSG00000022206"/>
            </owl:Restriction>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002292"/>
                <owl:someValuesFrom rdf:resource="http://identifiers.org/ensembl/ENSMUSG00000030905"/>
            </owl:Restriction>
        </owl:intersectionOf>
    </owl:Class>
</rdfs:subClassOf>

ROBOT generated multiple direct subClassOf axioms for those cases (without intersectionOf). While the two forms are logically equivalent, the intersectionOf form causes a problem in neo4j2owl, which does not seem to support intersectionOf/unionOf constructs and would need significant refactoring to do so.

Ideally this would be solved by the ROBOT Expression Materializing Reasoner in the vfb_pipeline_dumps step, but with our 15 MB ontology we get an out-of-memory error.

As a workaround, I handled subClassOf axioms that are intersections of a set of classes through SPARQL in the dumps phase.

The same should be applied to equivalent classes that are intersections of a set of expressions (classes, existential restrictions, etc.). They should be unpacked into a set of subClassOf (not equivalentClass) axioms, although some logical expressiveness will be lost.
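For illustration, the subClassOf unpacking can be sketched in Python with the standard library XML parser, working directly on RDF/XML shaped like the snippet above. This is not the SPARQL used in the dumps phase: the function names and the example class IRI (http://example.org/CellType_1) are hypothetical, and the sketch only handles the exact nesting shown (subClassOf > Class > intersectionOf).

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes on output

# Hypothetical class carrying the intersection from the snippet above.
EXAMPLE = """<owl:Class
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    rdf:about="http://example.org/CellType_1">
  <rdfs:subClassOf>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002292"/>
          <owl:someValuesFrom rdf:resource="http://identifiers.org/ensembl/ENSMUSG00000022206"/>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002292"/>
          <owl:someValuesFrom rdf:resource="http://identifiers.org/ensembl/ENSMUSG00000030905"/>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </rdfs:subClassOf>
</owl:Class>"""


def unpack_intersections(owl_class):
    """Replace each subClassOf-of-intersection with one subClassOf per operand.

    Modifies the element in place and returns it.
    """
    sub_tag = f"{{{NS['rdfs']}}}subClassOf"
    for sub in list(owl_class.findall("rdfs:subClassOf", NS)):
        operands = sub.findall("owl:Class/owl:intersectionOf/*", NS)
        if not operands:
            continue  # already a plain subClassOf axiom
        position = list(owl_class).index(sub)
        owl_class.remove(sub)
        for offset, operand in enumerate(operands):
            new_sub = ET.Element(sub_tag)
            new_sub.append(operand)
            owl_class.insert(position + offset, new_sub)
    return owl_class


def fillers(owl_class):
    """List the someValuesFrom IRIs of the direct subClassOf restrictions."""
    resource = f"{{{NS['rdf']}}}resource"
    found = owl_class.findall(
        "rdfs:subClassOf/owl:Restriction/owl:someValuesFrom", NS)
    return [node.get(resource) for node in found]


if __name__ == "__main__":
    unpacked = unpack_intersections(ET.fromstring(EXAMPLE))
    print(ET.tostring(unpacked, encoding="unicode"))
```

Applying the same rewrite to equivalentClass axioms would turn a definition into necessary conditions only, which is the loss of expressiveness mentioned above.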

hkir-dev commented 3 years ago

Now we have 6 dosdp templates (https://github.com/obophenotype/brain_data_standards_ontologies/tree/dosdp_based_pipeline/src/patterns/dosdp-patterns):

  1. brainCellRegionMinimalMarkers.yaml
  2. taxonomy_class.yaml
  3. taxonomy_equivalent_class.yaml
  4. taxonomy_minimal_markers.yaml
  5. taxonomy_non_taxonomy_classification.yaml
  6. ensmusg.yaml

Templates 1, 2, 3 and 4 can be unified into a single large class template, which would give us a table with 27 columns. Should we merge all of them or only a subset (for example, only 1 and 2)?

dosumis commented 3 years ago

ensmusg.yaml will remain a separate (ROBOT) build. It is used to support imports.

We should be able to manage with (many?) fewer than 27 columns for the rest. For example, the minimal markers variable needed to generate definitions and synonyms is the same one needed for the logical axioms. This needs a comprehensive review in the context of the pipeline scripts, configs and templates.

hkir-dev commented 3 years ago

This branch contains the unification updates: https://github.com/obophenotype/brain_data_standards_ontologies/tree/single_dosdp_template

  1. ensmusg.yaml is a ROBOT template.
  2. taxonomy_class.yaml + brainCellRegionMinimalMarkers.yaml + taxonomy_minimal_markers.yaml are merged into a single template -> taxonomy_class.yaml
  3. The same template will be used for different species. For example, taxonomy_class.yaml will be used with several data files: CCN202002013_class.tsv, CCN201912131_class.tsv and CCN201912132_class.tsv. To support this:
     a. Automatic DOSDP pattern rolling is disabled (it requires the template and the data file to have the same name); bdscratch.Makefile manages this instead.
     b. Pattern term extraction is re-implemented in bdscratch.Makefile (it previously required the template and the data file to have the same name). The current solution is not elegant and needs to be solved with subst.
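The one-template-to-many-data-files mapping in item 3 can be sketched in Python as follows. This is not what bdscratch.Makefile does. It assumes a naming convention inferred from the file names above, where a data file named <taxonomy id>_<kind>.tsv pairs with a template named taxonomy_<kind>.yaml. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def template_for(data_file, prefix="taxonomy_"):
    """Map e.g. CCN202002013_class.tsv to taxonomy_class.yaml.

    Assumes the part before the first underscore is the taxonomy id and
    the rest of the stem names the template kind.
    """
    stem = Path(data_file).stem
    _taxonomy_id, sep, kind = stem.partition("_")
    if not sep or not kind:
        raise ValueError(f"cannot derive a template name from {data_file!r}")
    return f"{prefix}{kind}.yaml"


def group_by_template(data_files):
    """Group data files under the template each one should be run with."""
    groups = defaultdict(list)
    for data_file in data_files:
        groups[template_for(data_file)].append(data_file)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "CCN202002013_class.tsv",
        "CCN201912131_class.tsv",
        "CCN201912132_class.tsv",
    ]
    for template, data in group_by_template(files).items():
        print(template, "<-", ", ".join(data))
```

In Make, the same prefix stripping is the kind of thing a pattern rule or text function could express, which is presumably what the subst remark refers to.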