Modelling of p-values and re-use from STATO (OBI)

cmaumet commented 9 years ago

We initially planned to re-use STATO terms for:

p-value
FWE-corrected p-value
FDR p-value

But looking more closely at STATO (OBI), those terms are defined as classes, whereas we use them as properties, so we cannot re-use the STATO terms directly with our current model (cf. ISA-tools/stato#38 for more details).

So... I think we have two options:

Option 1: Keep our current model and create nidm terms for p-values (i.e. do not re-use STATO terms), e.g. for a nidm:HeightThreshold:

 niiri:my_height_threshold a nidm:HeightThreshold ;
             nidm:pValueFWER "0.05"^^xsd:float ;
             nidm:pValueUncorrected "7.62e-07"^^xsd:float ;
             prov:value "5.23"^^xsd:float . # corresponding statistic value

Option 2: Modify our model so that the STATO p-value classes can be used, e.g. for a nidm:HeightThreshold we could have:

niiri:my_height_threshold_1 a nidm:HeightThreshold ;
           nidm:hasPValue niiri:my_fwer_p_value ;
           nidm:hasPValue niiri:my_uncorrected_p_value ;
           prov:value "5.23"^^xsd:float . # corresponding statistic value

niiri:my_fwer_p_value a obi:'FWER p-value' ;
           prov:value "0.05"^^xsd:float . 

niiri:my_uncorrected_p_value a obi:'uncorrected p-value' ;
           prov:value "7.62e-07"^^xsd:float .

Pros and cons

Option 1 provides a more condensed representation and the corresponding queries are hence shorter (cf. example below).

But:

With option 1, we are unable to re-use the STATO terms, which align very closely with our needs. This also means that we will have to come up with our own definitions (of the same concepts...).

Queries

This is what a query searching for an FWER p-value height threshold would look like (written with semantic identifiers just for readability):

Option 1:

SELECT ?p_corr_fwe WHERE { 
?height a nidm:HeightThreshold .
?height nidm:pValueFWER ?p_corr_fwe .
}

Option 2:

SELECT ?p_corr_fwe WHERE { 
?height a nidm:HeightThreshold .
?height nidm:hasPValue ?pfwer_entity .
?pfwer_entity a obi:'FWER p-value' .
?pfwer_entity prov:value ?p_corr_fwe .
}
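
For anyone who wants to compare the two query shapes without setting up a triple store, here is a minimal sketch in plain Python. It is only an illustration under some assumptions: the triples copy the example values above, `match` and `fwer_p_values` are hypothetical helpers written for this sketch (they are not part of any NIDM or STATO tooling), and `obi:FWER p-value` is a readable placeholder rather than a real STATO identifier.

```python
# Triples are (subject, predicate, object) tuples mirroring the examples above.
OPTION_1 = [
    ("niiri:my_height_threshold", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold", "nidm:pValueFWER", 0.05),
    ("niiri:my_height_threshold", "nidm:pValueUncorrected", 7.62e-07),
    ("niiri:my_height_threshold", "prov:value", 5.23),
]

OPTION_2 = [
    ("niiri:my_height_threshold_1", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_fwer_p_value"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_uncorrected_p_value"),
    ("niiri:my_height_threshold_1", "prov:value", 5.23),
    ("niiri:my_fwer_p_value", "rdf:type", "obi:FWER p-value"),
    ("niiri:my_fwer_p_value", "prov:value", 0.05),
    ("niiri:my_uncorrected_p_value", "rdf:type", "obi:uncorrected p-value"),
    ("niiri:my_uncorrected_p_value", "prov:value", 7.62e-07),
]

# Each query is a list of triple patterns; strings starting with "?" are variables.
QUERY_1 = [
    ("?height", "rdf:type", "nidm:HeightThreshold"),
    ("?height", "nidm:pValueFWER", "?p_corr_fwe"),
]

QUERY_2 = [
    ("?height", "rdf:type", "nidm:HeightThreshold"),
    ("?height", "nidm:hasPValue", "?pfwer_entity"),
    ("?pfwer_entity", "rdf:type", "obi:FWER p-value"),
    ("?pfwer_entity", "prov:value", "?p_corr_fwe"),
]


def match(triples, patterns):
    """Return every variable binding that satisfies all patterns (naive join)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if isinstance(term, str) and term.startswith("?"):
                        # Bind the variable, or reject if it is already bound differently.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def fwer_p_values(triples, patterns):
    """Collect the values bound to ?p_corr_fwe."""
    return [binding["?p_corr_fwe"] for binding in match(triples, patterns)]


print(fwer_p_values(OPTION_1, QUERY_1))
print(fwer_p_values(OPTION_2, QUERY_2))
```

Both calls retrieve the same FWER p-value; the option 2 query simply needs two more triple patterns to get there. This says nothing about the speed of a real SPARQL engine, it only makes the difference in query length concrete.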

Discussion

This is quite an important point as it affects our data model directly. Could you let me know what would be your preference between option 1 and option 2?

I tend to think that option 2 is the right way to go (as we agreed to re-use as many STATO terms as we can) but this means quite an update in the structure of the model.

tiborauer commented 9 years ago

I like option 2 and defining p-values as classes in general, so that the correction method (for FWE and FDR) can also be modelled as attributes.

nicholst commented 9 years ago

I see the conceptual elegance of option 2, but am wary about the amount of work needed to implement it and about the query complexity. I hope @gllmflndn will weigh in on this, as he'll have to implement it for SPM NIDM export; also, I wonder if @satra and/or @chrisfilo have any insight on the speed of SPARQL queries, and whether Option 2 represents a negligible or appreciable increase in query complexity over Option 1.

About re-use of STATO terms: With option 1, no, we can't directly re-use them, but when the concept is the same, we can directly reference the STATO term in our definition.

khelm commented 9 years ago

I also like option 2 and think that referencing the STATO term in our definition is basically defeating the purpose of creating a model like this since that information is then an "orphan" - you don't get any benefit from it semantically.

gllmflndn commented 9 years ago

It makes export implementation and SPARQL queries slightly more complicated, but if option 2 is the way to go, so be it...

jbpoline commented 9 years ago

It does sound like a rather big change. I suppose there is no way to turn the class into a property, and if that's indeed the case I would agree that we have to go all the way and do it. Nolan, Satra, Dave, Jessica... any thoughts or comments? I know that queries will not be written by most researchers, but it does look a little heavy, so a double brainstorming would be good!

incf-nidash / nidm-specs

Modelling of p-values and re-use from STATO (OBI) #304
