incf-nidash / nidm-specs

Neuroimaging Data Model (NIDM): describing neuroimaging data and provenance
nidm.nidash.org

Modelling of p-values and re-use from STATO (OBI) #304

Open cmaumet opened 9 years ago

cmaumet commented 9 years ago

We initially planned to re-use STATO terms for:

But looking more closely at STATO (OBI), those terms are defined as classes, whereas we use them as properties, so we cannot re-use STATO terms directly in our current model (cf. ISA-tools/stato#38 for more details).

So... I think we have two options:

  1. Keep our current model and create nidm terms for p-values (i.e. do not re-use STATO terms), e.g. for a nidm:HeightThreshold:

     niiri:my_height_threshold a nidm:HeightThreshold ;
                 nidm:pValueFWER "0.05"^^xsd:float ;
                 nidm:pValueUncorrected "7.62e-07"^^xsd:float ;
                 prov:value "5.23"^^xsd:float . # corresponding statistic value
    
  2. Modify our model so that the STATO p-value classes can be used, e.g. for a nidm:HeightThreshold we could have:

    niiri:my_height_threshold_1 a nidm:HeightThreshold ;
               nidm:hasPValue niiri:my_fwer_p_value ;
               nidm:hasPValue niiri:my_uncorrected_p_value ;
               prov:value "5.23"^^xsd:float . # corresponding statistic value
    
    niiri:my_fwer_p_value a obi:'FWER p-value' ;
               prov:value "0.05"^^xsd:float . 
    
    niiri:my_uncorrected_p_value a obi:'uncorrected p-value' ;
               prov:value "7.62e-07"^^xsd:float . 
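To make the query-length trade-off between the two options concrete, here is a minimal Python sketch (standard library only, no RDF store or SPARQL engine). It encodes both examples as (subject, predicate, object) tuples and answers "which FWER p-value is attached to a height threshold?" with a naive triple-pattern matcher. The matcher, the query lists and the use of `rdf:type` for Turtle's `a` are assumptions made for illustration; the STATO classes are written with the same readable labels used above, not their real IRIs.

```python
from collections.abc import Iterator

# Hypothetical encoding: each triple is a (subject, predicate, object) tuple
# of plain strings; "rdf:type" stands in for Turtle's "a".
Triple = tuple[str, str, str]
Binding = dict[str, str]

# Option 1: p-values are properties of the threshold itself.
OPTION_1: list[Triple] = [
    ("niiri:my_height_threshold", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold", "nidm:pValueFWER", "0.05"),
    ("niiri:my_height_threshold", "nidm:pValueUncorrected", "7.62e-07"),
    ("niiri:my_height_threshold", "prov:value", "5.23"),
]

# Option 2: each p-value is a separate node typed with a STATO (OBI) class.
OPTION_2: list[Triple] = [
    ("niiri:my_height_threshold_1", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_fwer_p_value"),
    ("niiri:my_height_threshold_1", "nidm:hasPValue", "niiri:my_uncorrected_p_value"),
    ("niiri:my_height_threshold_1", "prov:value", "5.23"),
    ("niiri:my_fwer_p_value", "rdf:type", "obi:'FWER p-value'"),
    ("niiri:my_fwer_p_value", "prov:value", "0.05"),
    ("niiri:my_uncorrected_p_value", "rdf:type", "obi:'uncorrected p-value'"),
    ("niiri:my_uncorrected_p_value", "prov:value", "7.62e-07"),
]


def match(pattern: Triple, triples: list[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every extension of `binding` under which `pattern` matches a triple.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch: every term of the pattern matched this triple.
            yield extended


def query(patterns: list[Triple], triples: list[Triple]) -> list[Binding]:
    """Join triple patterns left to right, like a basic graph pattern."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for new in match(pattern, triples, old)]
    return bindings


# "Find the FWER p-value of every height threshold" under each option.
QUERY_1: list[Triple] = [
    ("?threshold", "rdf:type", "nidm:HeightThreshold"),
    ("?threshold", "nidm:pValueFWER", "?p"),
]
QUERY_2: list[Triple] = [
    ("?threshold", "rdf:type", "nidm:HeightThreshold"),
    ("?threshold", "nidm:hasPValue", "?pvalue"),
    ("?pvalue", "rdf:type", "obi:'FWER p-value'"),
    ("?pvalue", "prov:value", "?p"),
]

print(len(QUERY_1), query(QUERY_1, OPTION_1))
print(len(QUERY_2), query(QUERY_2, OPTION_2))
```

Under these assumptions both queries return the FWER p-value 0.05, but the option 1 query needs two triple patterns and the option 2 query four, because the p-value node has to be joined in and its type checked. How much that costs on a real SPARQL engine is a separate question.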
    
Pros and cons
• Option 1 provides a more condensed representation and the corresponding queries are hence shorter (cf. example below).

But:

This is how a query searching for an FWER p-value height threshold would look (written with semantic identifiers just for readability):

This is quite an important point, as it affects our data model directly. Could you let me know which you would prefer, option 1 or option 2?

I tend to think that option 2 is the right way to go (as we agreed to re-use as many STATO terms as we can), but this implies a substantial change to the structure of the model.

tiborauer commented 9 years ago

I like option 2, and defining p-values as classes in general, so that the correction method (e.g. FWE or FDR) can also be modelled as an attribute.
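A minimal sketch of what modelling the correction method as an attribute of a p-value entity could look like, written as plain Python dataclasses rather than RDF. `PValue`, `HeightThreshold`, the `correction` attribute and its `"FWE"`/`"FDR"` values are hypothetical names chosen for illustration, not NIDM or STATO terms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PValue:
    """A p-value modelled as an entity of its own (option 2 style)."""

    value: float
    # Hypothetical attribute: correction method such as "FWE" or "FDR";
    # None stands for an uncorrected p-value.
    correction: Optional[str] = None


@dataclass
class HeightThreshold:
    statistic_value: float
    p_values: list[PValue] = field(default_factory=list)

    def p_value(self, correction: Optional[str]) -> Optional[PValue]:
        """Return the first attached p-value with the given correction, if any."""
        for pv in self.p_values:
            if pv.correction == correction:
                return pv
        return None


# Values taken from the height-threshold example in the issue.
threshold = HeightThreshold(
    statistic_value=5.23,
    p_values=[PValue(0.05, "FWE"), PValue(7.62e-07)],
)
print(threshold.p_value("FWE"))
print(threshold.p_value(None))
print(threshold.p_value("FDR"))
```

In this sketch, supporting FDR only means attaching another `PValue` with a different `correction` value, whereas option 1 would presumably need a dedicated property for each correction method.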

nicholst commented 9 years ago

I see the conceptual elegance of option 2, but am wary of the implementation effort and the added query complexity. I hope @gllmflndn will weigh in on this, as he'll have to implement it for the SPM NIDM export; also, I wonder if @satra and/or @chrisfilo have any insight on the speed of SPARQL queries, and whether Option 2 represents a negligible or appreciable increase in query complexity over Option 1.

About re-use of STATO terms: with option 1 we can't re-use them directly, but when the concept is the same, we can reference the STATO term in our definition.

khelm commented 9 years ago

I also like option 2, and I think that referencing the STATO term in our definition basically defeats the purpose of creating a model like this, since that information is then an "orphan": you don't get any semantic benefit from it.

gllmflndn commented 9 years ago

It makes export implementation and SPARQL queries slightly more complicated, but if option 2 is the way to go, so be it...

jbpoline commented 9 years ago

It does sound like a rather big change. I suppose there is no way to turn the class into a property; if that's indeed the case, I would agree that we have to go all the way and do it. Nolan, Satra, Dave, Jessica: any thoughts or comments? I know that most researchers will not write queries themselves, but it does look a little heavy, so a second round of brainstorming would be good!