Dichotomous outcome

andrewjmc commented 2 years ago

Hello,

If the series data being analysed is dichotomous and would therefore likely be better modelled by logistic regression, would it be easy to modify the code to apply logistic regression? I looked at the source, but I'm afraid it's above my grade!

The application is presence/absence of genes from species of bacteria by patient disease status. The series is the order of genes in the genome (I am looking for positional clustering along the genome). If there's an obvious better choice, any pointers would be welcome!

Best wishes,

Andrew

jaromilfrossard commented 2 years ago

Hello,

Thank you for your interest in permuco. I don't think there is an easy way to use permuco for logistic regression or GLMs.

In the future, I may include the method proposed by Potter (https://doi.org/10.1002/sim.1931) to run permutation tests in GLMs.

Sincerely,

Jaromil Frossard

PS: If I understand your problem correctly, "y" is a matrix (rows: patients, columns: genes of bacteria) and the design is simply the patient status (one factor with multiple levels). In addition, you want to test, for each gene/bacteria, whether the patient status is associated with the presence of the gene, while controlling the FWER. If this setting corresponds to your problem and you have only one variable in your design, you do not need any transformation before permuting "y": your data are "exchangeable under the null", so you can use a method similar to "manly".

In that case, you "only" need the permutation distribution of the statistics: for instance, a matrix of LRT statistics with the permutations in rows and the genes in columns (referred to as distr_mat below). If the first row of this matrix contains the observed statistics, you can use the multiple comparisons procedures implemented in permuco (https://jaromilfrossard.github.io/permuco/reference/index.html, section "multiple comparisons procedures") without calling clusterlm(), e.g. compute_troendle(distribution = distr_mat, alternative = "two.sided").

It may be possible to create distr_mat with a for loop over the permutations and the columns of y, calling glm() each time. It will be slow, but it should work.
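A minimal sketch of that loop, under some assumptions: the data are simulated placeholders (40 patients, 5 genes, two groups), the statistic is the likelihood-ratio statistic taken as null deviance minus residual deviance from glm(), and the names status, lrt_stat and n_perm are made up for illustration. With a single factor, permuting the status labels is equivalent to permuting the rows of y, and the same permutation is applied to every gene within a row of distr_mat.

```r
set.seed(1)

# Simulated placeholder data: replace with the real presence/absence matrix.
n_patients <- 40
n_genes <- 5
n_perm <- 200  # row 1 holds the observed statistics, rows 2..n_perm the permutations

status <- factor(rep(c("control", "case"), each = n_patients / 2))
y <- matrix(rbinom(n_patients * n_genes, size = 1, prob = 0.5),
            nrow = n_patients, ncol = n_genes)

# Likelihood-ratio statistic of a logistic regression of one gene on the group.
lrt_stat <- function(gene, group) {
  fit <- glm(gene ~ group, family = binomial())
  fit$null.deviance - fit$deviance
}

distr_mat <- matrix(NA_real_, nrow = n_perm, ncol = n_genes)
for (p in seq_len(n_perm)) {
  # First row: observed labels. Other rows: one shuffle shared by all genes.
  perm_status <- if (p == 1) status else sample(status)
  for (g in seq_len(n_genes)) {
    distr_mat[p, g] <- lrt_stat(y[, g], perm_status)
  }
}

# With permuco installed, the call quoted above would then be:
# permuco::compute_troendle(distribution = distr_mat, alternative = "two.sided")
```

Sharing one shuffle across all genes keeps the dependence between genes intact in each permuted row, which the multiple comparisons procedures rely on. Real presence/absence data may contain genes that are perfectly separated by group, in which case glm() will warn and a different statistic may be preferable.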

andrewjmc commented 2 years ago

Hi Jaromil,

Many thanks for your helpful response. You are right that I am analysing a matrix of patients and genes, with patients in two groups. At the moment I can do this quite simply with Fisher's exact test, or with Firth logistic regression to account for other variables.

However, I am keen to detect spatial clustering of signals along the genomes, as this will likely increase power, and genetic changes within bacterial species are quite likely to include acquisition or loss of multiple genes from a locus. Thus the ability to analyse summary statistics per gene (t / p values) in a 1D sequence and detect clusters would be useful.

I think I may need to apply a method like this cluster permutation test as described for continuous data: https://benediktehinger.de/blog/science/statistics-cluster-permutation-test/
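A rough sketch of how such a cluster-mass permutation test might look on per-gene statistics ordered along the genome. Everything here is an assumption for illustration: the matrix is filled with simulated placeholder values (row 1 standing in for the observed statistics, with an artificial signal at genes 11 to 15), the cluster-forming threshold is an arbitrary chi-square quantile, and cluster_masses and max_mass are hypothetical helpers, not permuco functions.

```r
set.seed(2)

# Placeholder permutation distribution: permutations in rows, genes in genome order in columns.
n_perm <- 500
n_genes <- 30
distr_mat <- matrix(rchisq(n_perm * n_genes, df = 1), nrow = n_perm)
distr_mat[1, 11:15] <- distr_mat[1, 11:15] + 6  # artificial signal in the "observed" row

threshold <- qchisq(0.95, df = 1)  # cluster-forming threshold (a free choice)

# Find runs of adjacent genes above the threshold and sum their statistics.
cluster_masses <- function(stats, threshold) {
  runs <- rle(stats > threshold)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- which(runs$values)
  data.frame(
    start = starts[keep],
    end = ends[keep],
    mass = vapply(keep, function(i) sum(stats[starts[i]:ends[i]]), numeric(1))
  )
}

# Largest cluster mass in one row (0 when no gene exceeds the threshold).
max_mass <- function(stats, threshold) {
  cl <- cluster_masses(stats, threshold)
  if (nrow(cl) == 0) 0 else max(cl$mass)
}

# Null distribution of the maximal cluster mass, observed row included.
null_max <- apply(distr_mat, 1, max_mass, threshold = threshold)

# Each observed cluster is compared with that null distribution.
observed <- cluster_masses(distr_mat[1, ], threshold)
observed$p_value <- vapply(observed$mass, function(m) mean(null_max >= m), numeric(1))
print(observed)
```

Comparing every observed cluster with the distribution of the maximal mass is what gives control over the whole sequence, at the cost of the p-values referring to clusters rather than to individual genes.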

Best wishes,

Andrew

jaromilfrossard / permuco

Dichotomous outcome #11