How to pass large output like HUMAnN features coming from several studies?

Hi again,

I wonder whether any of you have already used a strategy to handle the huge number of features in an output such as the gene families from a HUMAnN analysis, especially when pooling data from four or more studies. Since I wanted to account for the batch effect, I already have the output from MMUPHin. Could I select a subset of gene families using some criteria, or would this introduce a large bias into the analysis? I thought this could be done in one of two ways:

(I) In MMUPHin, after obtaining meta_fits <- fit_lm_meta$meta_fits, I could use the features retained at this step, e.g. meta_fits %>% filter(qval.fdr < 0.05) %>% arrange(coef), and pick some of the most relevant ones.

(II) Another strategy would be to simply pick the most abundant features in the whole pooled/merged dataset, but this wouldn't account for features that are absent from certain studies (batches).
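Both selection ideas can be sketched in base R (instead of the dplyr pipeline above, so no extra packages are needed). This is a minimal sketch under assumptions: select_significant() and select_abundant() are hypothetical helpers, not MMUPHin or SIAMCAT functions; meta_fits is assumed to have the feature, coef and qval.fdr columns used in (I); feature_abd is assumed to be a features x samples matrix with one study label per sample in batch; and all toy values are invented. Ranking by abs(coef) instead of the signed coef from arrange(coef) is a choice made here so that strong negative and positive associations are treated alike. For (II), requiring prevalence within several individual studies is one way to avoid favouring features that are abundant in one batch but absent elsewhere.

```r
# Strategy (I): keep features whose meta-analysis FDR q-value is below a
# cutoff, ranked by absolute effect size (largest first).
# select_significant() is a hypothetical helper, not part of MMUPHin.
select_significant <- function(meta_fits, q_cutoff = 0.05, top_n = NULL) {
  sig <- meta_fits[!is.na(meta_fits$qval.fdr) & meta_fits$qval.fdr < q_cutoff, ]
  sig <- sig[order(abs(sig$coef), decreasing = TRUE), ]
  if (!is.null(top_n)) sig <- head(sig, top_n)
  sig$feature
}

# Strategy (II): rank by mean abundance over the pooled data, but only among
# features that are non-zero in at least min_prev of the samples of at least
# min_studies individual studies. select_abundant() is also hypothetical.
select_abundant <- function(feature_abd, batch, min_prev = 0.1,
                            min_studies = 2, top_n = 100) {
  studies <- split(seq_along(batch), batch)
  # Count, per feature, the studies in which it passes the prevalence cutoff
  n_ok <- Reduce(`+`, lapply(studies, function(idx) {
    rowMeans(feature_abd[, idx, drop = FALSE] > 0) >= min_prev
  }))
  mean_abd <- rowMeans(feature_abd)[n_ok >= min_studies]
  head(names(sort(mean_abd, decreasing = TRUE)), top_n)
}

# Mock inputs with made-up values (not real HUMAnN or MMUPHin output)
meta_fits <- data.frame(
  feature  = c("A", "B", "C", "D"),
  coef     = c(0.8, -1.5, 0.1, 2.0),
  qval.fdr = c(0.01, 0.03, 0.20, 0.04),
  stringsAsFactors = FALSE
)
feature_abd <- matrix(
  c(0.50, 0.40, 0.00, 0.00, 0.00, 0.00,   # A: only in study1
    0.10, 0.20, 0.30, 0.20, 0.10, 0.10,   # B: everywhere
    0.05, 0.00, 0.05, 0.10, 0.00, 0.05,   # C: low but widespread
    0.00, 0.00, 0.00, 0.00, 0.20, 0.30),  # D: only in study3
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), paste0("s", 1:6))
)
batch <- rep(c("study1", "study2", "study3"), each = 2)

print(select_significant(meta_fits))
print(select_abundant(feature_abd, batch, min_prev = 0.5, min_studies = 2))
```

On these toy data, a plain top-by-mean ranking would put A second, while the prevalence requirement drops A and D because each is present in only one study.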

With either of these strategies, I think I would have to (re)normalize the table after removing rows/features, right? What is the best way to do this: within MMUPHin, or back in HUMAnN 3?
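One common reading of "renormalize" here is total-sum scaling: rescale each sample so the retained features sum to 1 again. Below is a minimal base-R sketch, assuming a features x samples matrix of non-negative abundances; renormalize_columns() is a hypothetical helper and the numbers are made up. The resulting proportions are relative to the retained features only, not to the whole community, which changes how they should be interpreted.

```r
# Hypothetical helper: total-sum scaling of each sample (column) over the
# features that were kept, so every non-empty column sums to 1.
renormalize_columns <- function(feature_abd) {
  totals <- colSums(feature_abd)
  # A sample whose retained features are all zero would give 0/0 = NaN;
  # dividing by 1 instead leaves that column as zeros.
  totals[totals == 0] <- 1
  sweep(feature_abd, 2, totals, "/")
}

# Made-up relative abundances: 3 features x 4 samples
feature_abd <- matrix(
  c(0.2, 0.1, 0.3, 0.0,
    0.2, 0.3, 0.0, 0.0,
    0.1, 0.0, 0.0, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("s", 1:4))
)

# Drop feature C (e.g. it failed a selection step), then renormalize
kept <- feature_abd[c("A", "B"), , drop = FALSE]
renorm <- renormalize_columns(kept)
print(colSums(renorm))
```

Before doing this by hand, it is worth checking whether the downstream step (for example SIAMCAT's normalize.features()) already rescales its input, so the same table is not normalized twice.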

Thanks for your help.

Andrés

zellerlab / siamcat, issue #34