cmaumet opened this issue 9 years ago
I like option 2, and defining p-values as classes in general, so that the correction method (FWE or FDR) can also be modelled as an attribute.
I see the conceptual elegance of option 2, but am wary of the implementation effort and the added query complexity. I hope @gllmflndn will weigh in on this, as he'll have to implement it for the SPM NIDM export; I also wonder whether @satra and/or @chrisfilo have any insight into the speed of SPARQL queries, and whether option 2 represents a negligible or an appreciable increase in query complexity over option 1.
About re-use of STATO terms: with option 1, no, we can't re-use them directly, but where the concept is the same, we can reference the STATO term in our definition.
I also like option 2, and I think that referencing the STATO term in our definition basically defeats the purpose of creating a model like this, since that information is then an "orphan": you don't get any semantic benefit from it.
It makes the export implementation and the SPARQL queries slightly more complicated, but if option 2 is the way to go, so be it...
It does sound like a rather big change. I suppose there is no way to turn the class into a property; if that's indeed the case, I agree that we have to go all the way and do it. Nolan, Satra, Dave, Jessica: any thoughts or comments? I know that most researchers will not write queries themselves, but it does look a little heavy, so a second round of brainstorming would be good!
We initially planned to re-use STATO terms for:
But, looking more closely at STATO (OBI), those terms are defined as classes, whereas we use them as properties, so we cannot re-use the STATO terms directly with our current model (cf. ISA-tools/stato#38 for more details).
So... I think we have two options:
Option 1: keep our current model and create NIDM terms for p-values (i.e. do not re-use STATO terms), e.g. for a nidm:HeightThreshold:
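A minimal sketch of what such an encoding could look like, written as plain Python (subject, predicate, object) triples rather than an RDF serialization. The property `nidm:pValueFWER`, the entity ID `niiri:height_threshold_1` and the value 0.05 are hypothetical placeholders chosen for illustration, not the model's actual terms.

```python
# Triples are modeled as plain (subject, predicate, object) tuples.
# All identifiers below are hypothetical, readable placeholders.
Triple = tuple[str, str, object]


def option1_height_threshold(entity: str, fwer_p: float) -> set[Triple]:
    """Option 1 (sketch): the kind of p-value is encoded by a NIDM *property*."""
    return {
        (entity, "rdf:type", "nidm:HeightThreshold"),
        # A NIDM-specific property names the kind of p-value directly,
        # so no STATO class appears anywhere in the graph.
        (entity, "nidm:pValueFWER", fwer_p),
    }


graph = option1_height_threshold("niiri:height_threshold_1", 0.05)
```

The threshold stays a single node, which keeps export and queries simple, but "FWER-adjusted p-value" only exists as the meaning of a NIDM property.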
Option 2: modify our model so that the STATO p-value classes can be used, e.g. for a nidm:HeightThreshold we could have:
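A minimal sketch of one plausible shape for this, again as plain Python triples. The linking property `nidm:hasPValue`, the class name `stato:FWER_adjusted_p_value` (a readable stand-in for the actual STATO class identifier) and the node IDs are hypothetical placeholders, not confirmed terms.

```python
# Triples are modeled as plain (subject, predicate, object) tuples.
# All identifiers below are hypothetical, readable placeholders.
Triple = tuple[str, str, object]


def option2_height_threshold(entity: str, p_node: str, fwer_p: float) -> set[Triple]:
    """Option 2 (sketch): the p-value is its own node, typed with a STATO-style *class*."""
    return {
        (entity, "rdf:type", "nidm:HeightThreshold"),
        # The threshold points to a separate p-value node (hypothetical link property).
        (entity, "nidm:hasPValue", p_node),
        # The kind of p-value is carried by the node's class; this is where a
        # STATO term could be re-used directly.
        (p_node, "rdf:type", "stato:FWER_adjusted_p_value"),
        (p_node, "prov:value", fwer_p),
    }


graph = option2_height_threshold("niiri:height_threshold_1", "niiri:p_value_1", 0.05)
```

The cost is an extra node and two extra triples per threshold; the gain is that the p-value's kind is a class that other vocabularies (here STATO) already define.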
Pros and cons
But:
Queries
This is what a query searching for an FWER p-value height threshold would look like (written with semantic identifiers, just for readability):
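To make the difference in query shape concrete, here is a small sketch that matches the same graph patterns a SPARQL query would use, over hypothetical in-memory triples for each option. All identifiers (`nidm:pValueFWER`, `nidm:hasPValue`, `stato:FWER_adjusted_p_value`, the `niiri:` IDs) are illustrative placeholders. In SPARQL terms, option 1 needs two triple patterns, while option 2 needs four: the same two plus one hop to the p-value node and one class check on that node.

```python
Triple = tuple[str, str, object]

# Hypothetical example graphs, one per option, using readable identifiers.
OPTION1: set[Triple] = {
    ("niiri:ht1", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:ht1", "nidm:pValueFWER", 0.05),
}
OPTION2: set[Triple] = {
    ("niiri:ht1", "rdf:type", "nidm:HeightThreshold"),
    ("niiri:ht1", "nidm:hasPValue", "niiri:p1"),
    ("niiri:p1", "rdf:type", "stato:FWER_adjusted_p_value"),
    ("niiri:p1", "prov:value", 0.05),
}


def objects(graph: set[Triple], subject: str, predicate: str) -> set[object]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def height_thresholds(graph: set[Triple]) -> set[str]:
    """Pattern: ?t rdf:type nidm:HeightThreshold"""
    return {s for s, p, o in graph if p == "rdf:type" and o == "nidm:HeightThreshold"}


def fwer_thresholds_option1(graph: set[Triple]) -> set[tuple[str, object]]:
    # One hop: ?t nidm:pValueFWER ?v
    return {(t, v) for t in height_thresholds(graph)
            for v in objects(graph, t, "nidm:pValueFWER")}


def fwer_thresholds_option2(graph: set[Triple]) -> set[tuple[str, object]]:
    # Two hops plus a type check on the intermediate node:
    # ?t nidm:hasPValue ?n . ?n rdf:type stato:FWER_adjusted_p_value . ?n prov:value ?v
    return {(t, v) for t in height_thresholds(graph)
            for n in objects(graph, t, "nidm:hasPValue")
            if "stato:FWER_adjusted_p_value" in objects(graph, n, "rdf:type")
            for v in objects(graph, n, "prov:value")}


print(fwer_thresholds_option1(OPTION1))  # {('niiri:ht1', 0.05)}
print(fwer_thresholds_option2(OPTION2))  # {('niiri:ht1', 0.05)}
```

This only illustrates the extra join that option 2 introduces; it says nothing about actual SPARQL engine performance, which would have to be measured on a real triple store.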
Discussion
This is quite an important point, as it affects our data model directly. Could you let me know whether you prefer option 1 or option 2?
I tend to think that option 2 is the right way to go (as we agreed to re-use as many STATO terms as we can), but it means a substantial update to the structure of the model.