nw-duncan closed this issue 2 years ago
How closely related are data dependency and data distribution issues? I feel like they're pretty different issues. But if they're not: I've been told before that when samples are large (we have 700 data points in that voxel) there's no need to use non-parametric stats even if the data is skewed, so in theory there'd be no advantage to non-parametric tests. If data dependency issues also go away with large samples then great, we can just use t-tests, but I doubt it's that simple.
I've come across this paper https://psyarxiv.com/f2tyw/, based on the original https://www.nature.com/articles/6889010, that might be closer to what we need. Or it might be more about multiple-comparisons correction than about correcting for dependency. But if dependent data shrinks the standard error and inflates the false-positive rate, and this method addresses that, surely it's similar?
By non-parametric I was thinking of some sort of Monte-Carlo or permutation testing. In other words, would we expect the same difference if we randomly selected voxels from across the brain?
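Something like this rough Python sketch is what I have in mind; the name monte_carlo_region_test and its arguments are placeholders I'm making up here (not anything in the pipeline), and it assumes we can pull 1-D arrays of expression values for each region's voxels and for the whole brain:

```python
import numpy as np

def monte_carlo_region_test(region_a, region_b, all_voxels, n_iter=10000, seed=0):
    """Monte-Carlo test of the mean expression difference between two regions.

    region_a, region_b : 1-D arrays of expression values for the voxels in each region.
    all_voxels         : 1-D array of expression values for every voxel in the brain,
                         used to build the null distribution by random sampling.
    Returns the observed difference and a two-sided p-value.
    """
    region_a = np.asarray(region_a, dtype=float)
    region_b = np.asarray(region_b, dtype=float)
    all_voxels = np.asarray(all_voxels, dtype=float)

    rng = np.random.default_rng(seed)
    observed = region_a.mean() - region_b.mean()

    null_diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Draw two random voxel sets of the same sizes as the real regions.
        sample_a = rng.choice(all_voxels, size=region_a.size, replace=False)
        sample_b = rng.choice(all_voxels, size=region_b.size, replace=False)
        null_diffs[i] = sample_a.mean() - sample_b.mean()

    # Two-sided p-value: how often randomly chosen voxel sets give a difference
    # at least as extreme as the one we actually observed.
    p_value = (np.abs(null_diffs) >= np.abs(observed)).mean()
    return observed, p_value
```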
That preprint is interesting. I'm not statistically literate enough to judge if it helps with the non-independence issue (although I share your instinct that it might) but it could be useful if we decide to compare each and every gene.
Let's go ahead with getting the basics in place. Once the general pipeline exists it should be easy enough to modify it to implement another testing method.
It would be good if, whenever people input more than one region, a statistical comparison of gene expression were carried out.
Is it valid to use the values from each voxel within a region as data points in a t-test or similar? Things would be a bit weird in terms of independence, but we could probably gloss over that.
Non-parametric tests?
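For reference, a minimal sketch of how both options might look with SciPy, assuming each region's voxel expression values come in as a 1-D array; compare_regions and its argument names are made up for illustration, not part of the current code:

```python
import numpy as np
from scipy import stats

def compare_regions(voxels_a, voxels_b):
    """Compare expression values between two regions, voxel by voxel.

    Both tests treat every voxel as an independent observation, which is the
    assumption being questioned above.
    """
    voxels_a = np.asarray(voxels_a, dtype=float)
    voxels_b = np.asarray(voxels_b, dtype=float)

    # Parametric option: independent-samples t-test.
    t_stat, t_p = stats.ttest_ind(voxels_a, voxels_b)

    # Non-parametric option: Mann-Whitney U (rank-based, no normality assumption,
    # though it does not address the non-independence issue either).
    u_stat, u_p = stats.mannwhitneyu(voxels_a, voxels_b, alternative="two-sided")

    return {"t_test": (t_stat, t_p), "mann_whitney": (u_stat, u_p)}
```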