haesleinhuepf / napari-accelerated-pixel-and-object-classification

GPU-accelerated, OpenCL-based Random Forest Classifiers for pixel and labeled object classification in napari.
BSD 3-Clause "New" or "Revised" License

Add corrections for multiple testing in correlation matrix #40

Open Cryaaa opened 1 year ago

Cryaaa commented 1 year ago

Hey @haesleinhuepf, I was just reading @marabuuu's awesome blogpost about feature extraction and saw that you suggest using the correlation matrix implemented here. Recently I was using correlation matrices in my own project when I was reminded that there might be quite a few false positive correlation hits just by chance in a correlation matrix because we are performing so many statistical tests in such a matrix. One could fine-tune the correlation matrix by correcting for falsely significant correlations using statistical methods such as FDR correction using Bootstrapping or others implemented in statistical analysis libraries. Just thought I'd suggest it here so I don't forget!

haesleinhuepf commented 1 year ago

Hey @Cryaaa,

thanks for the input!

> we are performing so many statistical tests in such a matrix

Technically, the Pearson correlation coefficient does not involve any statistical test. It's a method of descriptive statistics.

> FDR correction using Bootstrapping

Can you provide a link to this method? Is there by chance a python implementation?

Thanks again!

Best, Robert

Cryaaa commented 1 year ago

@haesleinhuepf, Ahhhhh I just checked my code and I was using the Spearman rank correlation, which might be different. I guess I was trying to replicate what other libraries in R do, where false discovery rate (FDR) corrections are usually built into the functions (they are usually designed for gene expression data, so maybe it's a bit more crucial there). Anyway: here is a function which has a few multiple test corrections (all you need are the p_values unraveled).
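
For illustration, here is a minimal sketch of one such correction, the Benjamini-Hochberg FDR procedure, in plain Python. It is not necessarily the function referred to above; the name `benjamini_hochberg` and the default `alpha=0.05` are assumptions made for this example. It takes the p-values as a flat (unraveled) sequence.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR correction for a flat sequence of p-values.

    Returns (reject, adjusted): a list of booleans and the adjusted
    p-values, both in the same order as the input.
    Hypothetical helper for illustration only.
    """
    m = len(p_values)
    if m == 0:
        return [], []

    # Indices that sort the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (a smaller p never gets a larger adjusted value).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min

    reject = [q <= alpha for q in adjusted]
    return reject, adjusted


if __name__ == "__main__":
    reject, adjusted = benjamini_hochberg([0.5, 0.01])
    print(reject, adjusted)
```

Each sorted p-value is scaled by `m / rank`, so the smallest p-values are penalised most, and a hit survives only if its adjusted value stays at or below `alpha`.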

I think it might also be alright to leave the function as is for a simple first test, but I could imagine that a correlation matrix with non-significant results filtered out could be another option if the feature matrix gets really big. In this case FDR correction could make sense. I could try to code a function and test it, since I have the code (almost) ready somewhere in a notebook!
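
A rough sketch of what such a filtered matrix could look like, assuming SciPy's `spearmanr` for the coefficients and p-values and statsmodels' `multipletests` for the Benjamini-Hochberg correction. The function name `fdr_filtered_correlation` is hypothetical and this is not code from the plugin or from the notebook mentioned above.

```python
import numpy as np
from scipy.stats import spearmanr
from statsmodels.stats.multitest import multipletests


def fdr_filtered_correlation(features, alpha=0.05):
    """Spearman correlation matrix with non-significant entries set to NaN.

    `features` is an (n_samples, n_features) array. At least three feature
    columns are assumed, because `spearmanr` returns scalars instead of
    matrices when given exactly two columns.
    """
    rho, p = spearmanr(features)  # both of shape (n_features, n_features)
    n = rho.shape[0]

    # Each feature pair appears twice in the symmetric matrix; correct only
    # the upper triangle so every test is counted once.
    upper = np.triu_indices(n, k=1)
    reject, _, _, _ = multipletests(p[upper], alpha=alpha, method="fdr_bh")

    # Keep the diagonal and every pair that survives the correction.
    keep = np.eye(n, dtype=bool)
    keep[upper] = reject
    keep = keep | keep.T

    return np.where(keep, rho, np.nan)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    data = np.column_stack([
        x,
        x + 0.1 * rng.normal(size=200),  # strongly correlated with x
        rng.normal(size=200),            # independent noise
    ])
    print(fdr_filtered_correlation(data))
```

Correcting only the upper triangle avoids counting each pair twice, which would otherwise inflate the number of tests the correction accounts for.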