Closed joeflack4 closed 5 months ago
Context: this comment
@twhetzel I'm making this one high priority just because it is such a quick change to make.
And also one advantage of splitting them up is that we can drop --allow-missing-entities true
, and if the source changes and a mapping is no longer there, we'll know about it because it will error.
There's also the possible problem that if we merge them into 1 file and leave the other one blank, then when ROBOT fixes this problem, it might get annoyed by the blank file? Maybe not.
We've decided this issue is a bit of a distraction. We can always re-open it if needed, especially if ROBOT fixes the issue where 2 property maps / more than one are arbitrarily required.
there cant be a single one due to some problems with ROBOT rename
@matentzn BTW, I couldn't find an issue for this in the robot GitHub.If there's not one there, should I make one?
If you like, the issue would go something like this:
Add
--allow-duplicate-entities
as option in ROBOT rename.
In many pipelines, especially ETL, we want to unify a few different properties like their:label
, skos:prefLabel
, wikidata:name
to a single one, eg rdfs:label
. This is currently not possible as the rename command fails hard multiple properties are mapped to a single one. The reverse is obviously bogus (renaming 1 property to many), but it would be nice if we could bypass this precaution with an optional flag, --allow-duplicate-entities
, similar as to how we treat missing entities --allow-missing-entities
.
I may be following you, but let me check.
Cases:
Are you saying that --allow-duplicate-entities
is something that would enable (1) or (2)? It seems like it'd be for (2). But our problem with mondo-ingest
involves (1). So wouldn't I need to create 2 GitHub issues, one for each of these cases?
Example files from mondo-ingest
It looks like for mondo-ingest
, we only have case (1). We're trying to map multiple fields to oboInOwl:source
and oboInOwl:hasExactSynonym
. And that's why we split into two files.
`property-map-1.sssom.tsv`: ```tsv subject_id object_id http://purl.org/dc/elements/1.1/source oboInOwl:source http://www.w3.org/2004/02/skos/core#prefLabel http://www.w3.org/2000/01/rdf-schema#label http://www.w3.org/2004/02/skos/core#altLabel oboInOwl:hasExactSynonym ``` `property-map-2.sssom.tsv`: ```tsv subject_id object_id http://www.ebi.ac.uk/efo/definition http://purl.obolibrary.org/obo/IAO_0000115 http://www.ebi.ac.uk/efo/reason_for_obsolescence rdfs:comment http://www.ebi.ac.uk/efo/alternative_term oboInOwl:hasExactSynonym http://purl.obolibrary.org/obo/ECO_0000218 oboInOwl:source ```
So for mondo-ingest
, if (1) were to be addressed, then the following could be changed to a single line:
rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \
--->
rename --mappings config/property-map.sssom.tsv --allow-missing-entities true \
Are you saying that --allow-duplicate-entities is something that would enable (1) or (2)? It seems like it'd be for (2). But our problem with mondo-ingest involves (1). So wouldn't I need to create 2 GitHub issues, one for each of these cases?
No it would only enable (1). Many different labels properties unify to a single rdfs:label property. The reverse makes no sense, so no need for a specific issue on this (Imagine you have rdfs:label and rename it to 3 different label properties - that would be duplicating information).
Yeah that makes sense, I was just confused by what you'd written.
OK, so I can make an issue to enable (1) Many -> 1 situations, via --allow-duplicate-entities
.
Edit: Created!:
I wonder why should Many -> 1 even be an issue? Why should we have to add such a flag? I guess I figured out why.
Like for example, in mondo-ingest
, we need to convert http://purl.org/dc/elements/1.1/source
and http://purl.obolibrary.org/obo/ECO_0000218
to oboInOwl:source
. I wonder: Why would robot care about that?
I suppose it is because it expects that the source being modified/queried by the robot command would be a single, coherent ontology. If we're asking it to do a many -> 1 conversion, then that would indicate that there were 2 different properties in the ontology that were duplicative.
But for us, we're re-using these property map files across multiple use cases; something that robot isn't really considering by design.
Imagine you have an ontology that imports another ontology. The first one uses dcterms:contributor the other one oboInOwl:creator. Now before you create a merged release you want to unify both to dcelements:contributor. That's a use case for ROBoT! Thanks for making the issue!
@matentzn hanks. That's not what I was pondering. I wondered why robot doesn't allow these duplicates by default but I think I pondered the reason.
Overview
There are multiple
config/*property-map*.sssom.tsv
files. De-proliferate these.Sub-task list
mondo-ingest.Makefile
Sub-task details
2. Implement solution
a. Single file (if allowable, though there may be some weird ROBOT issue prohibiting it) b. Merge all mappings into 1 file, and have a second, empty file (to get around ROBOT issue) c. Keep
property-map-1
andproperty-map-2
as-is, and then add other ones, e.g.icd11foundation-property-map
file into one of these. One disadvantage of that is this.Additional details about solutions: https://github.com/monarch-initiative/mondo-ingest/pull/434#discussion_r1544921463