tetherless-world / hhear-ontology

Human Health Environmental Analysis Resource Ontology
Apache License 2.0

sdd2owl Generates the Same URI for Different Difference Classes #3

Closed chipmasters closed 4 years ago

chipmasters commented 4 years ago

@jimmccusker It seems that the sdd2owl process is generating the same URI for missing classes that are unrelated. Here is what was generated for the SDD-2016-1449-5-Outcome.xlsx file. In the Dictionary Mapping we have this row:

Column | Label | Attribute | attributeOf | Unit | Time
Dx_alg | COI Diagnosis - algorithm, at 36 month visit | sio:SIO_010056 | ??child |   | ??visit1

and in the Codebook we have these mappings for Dx_alg (after running sdd2owl):

Column | Code | Label | Class
Dx_alg | 0 |   | http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6
  | 1 |   | doid:0060041
  | 2 |   | autism-core:AUTISMC1000015

Now in SDD-2016-1449-2-Covars.xlsx we have these rows in the Dictionary Mapping:

Column | Label | Attribute | attributeOf | Unit | Time
HTN_EEQ | Chronic hypertension (EEQ) | sio:SIO_010056 | ??mother |   | ??pre_pregnancy
HTN_PE | Preeclampsia (EEQ) | sio:SIO_010056 | ??mother |   | ??pregnancy
DM1_EEQ | Diabetes type 1 | sio:SIO_010056 | ??mother |   | ??pregnancy
DM2_EEQ | Diabetes type 2 | sio:SIO_010056 | ??mother |   | ??pregnancy
GDM | Gestational diabetes | sio:SIO_010056 | ??mother |   | ??pregnancy

and these mappings in the Codebook (after running sdd2owl):

Column | Code | Label | Class
Pvit_2to2 | 0 | No Prenatal Vitamin Use 2 months pior or post conception | http://purl.org/twc/ctxid/cbf51018f7a179d9fb8c212a26ce75be7e0ce98d4bc35c5f223479c20d07c001b
  | 1 |   | hhear:00525
HTN_EEQ | 0 | No Chronic Hypertension | http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6
  | 1 |   | doid:10763
HTN_PE | 0 | No PreEclampsia | http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6
  | 1 |   | doid:10591
DM1_EEQ | 0 | No Type-1 Diabetes | http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6
  | 1 |   | doid:9744
DM2_EEQ | 0 | No Type-2 Diabetes | http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6
  | 1 |   | doid:9352
GDM | 0 | No Gestational Diabetes | http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6
  | 1 |   | doid:11714

Note that not only is the same URL (http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6) generated for each of the distinct cases in the Covars file above, but it is also the one generated in the Outcome file.
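A collision like this can be spotted mechanically before the output reaches HADatAc. The following is only a rough sketch, not part of sdd2owl: the function name `shared_generated_uris`, the `(file, column, code, class)` tuple layout, and the short file labels are all assumptions made for illustration. It groups the generated `ctxid` URIs by the columns that use them and reports any URI that appears under more than one column.

```python
from collections import defaultdict

# Prefix of the URIs that sdd2owl generated in the Codebooks quoted above.
CTXID_PREFIX = "http://purl.org/twc/ctxid/"


def shared_generated_uris(rows):
    """Return {uri: [(file, column), ...]} for generated URIs used by 2+ columns.

    `rows` is an iterable of (file, column, code, class_uri) tuples.
    This layout is hypothetical; adapt it to however the Codebook is loaded.
    """
    users = defaultdict(set)
    for source, column, _code, class_uri in rows:
        if class_uri.startswith(CTXID_PREFIX):
            users[class_uri].add((source, column))
    return {uri: sorted(cols) for uri, cols in users.items() if len(cols) > 1}


shared = CTXID_PREFIX + "cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6"
unique = CTXID_PREFIX + "cbf51018f7a179d9fb8c212a26ce75be7e0ce98d4bc35c5f223479c20d07c001b"

# The code-0 rows from the two Codebooks quoted above.
rows = [
    ("Outcome", "Dx_alg", "0", shared),
    ("Covars", "Pvit_2to2", "0", unique),
    ("Covars", "HTN_EEQ", "0", shared),
    ("Covars", "HTN_PE", "0", shared),
    ("Covars", "DM1_EEQ", "0", shared),
    ("Covars", "DM2_EEQ", "0", shared),
    ("Covars", "GDM", "0", shared),
]

collisions = shared_generated_uris(rows)
for uri, columns in collisions.items():
    print(uri)
    for source, column in columns:
        print(f"  {source}: {column}")
```

With the rows above, the only URI reported is the `cbbaaa66...` one, listed under the single Outcome column and the five Covars columns.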

It seems that this is always the URL generated when the Attribute in the Dictionary Mapping is sio:SIO_010056 (Phenotype). However, this is clearly wrong: for the case where GDM=0, the URL is supposed to represent all the instances of sio:SIO_010056 that are not instances of doid:11714, right? So how can the same URL also be used for the absence of Type-2 diabetes, Type-1 diabetes, etc.? There should be distinct URLs generated for each of these cases, right?

I can send you the relevant SDDs for testing via email if you agree this is a bug. If you don't think this is a bug, we still need to discuss how to address the resulting behavior in HADatAc, because now the presence of this URL in multiple mappings is causing problems in the facet search.

chipmasters commented 4 years ago

Never mind. After all that, I see the issue is in the first column of the SDD Codebook: there are missing entries, which caused the problem. I will close this tomorrow once I confirm it all works as expected after making those fixes.
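For anyone hitting the same thing, a quick pre-check of the Codebook can flag rows whose first (Column) cell is empty before running sdd2owl. This is a minimal sketch under the assumption that the gaps are blank Column cells; the helper name `rows_missing_column` and the tuple layout are hypothetical, and nothing here reflects how sdd2owl itself reads the sheet.

```python
def rows_missing_column(rows):
    """Return the 1-based positions of rows whose Column cell is blank.

    `rows` is a list of (column, code, label, class) tuples, a hypothetical
    in-memory form of the Codebook sheet. None and whitespace count as blank.
    """
    return [
        position
        for position, row in enumerate(rows, start=1)
        if not (row[0] or "").strip()
    ]


# The Dx_alg Codebook rows as they appear in the original report.
codebook = [
    ("Dx_alg", "0", "", "http://purl.org/twc/ctxid/cbbaaa6673fc79651ef21fe792de2a8f4a01b1d154e4608bd8dbc4423ec29d2ad6"),
    ("", "1", "", "doid:0060041"),
    ("", "2", "", "autism-core:AUTISMC1000015"),
]

missing = rows_missing_column(codebook)
print(missing)  # prints [2, 3]
```

Filling in the Column cell on each flagged row (here, `Dx_alg` on rows 2 and 3) is the kind of fix described above.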