stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

Automatically selecting intersection of taxa- how to override? #35

Closed. mirpie closed this issue 2 years ago.

mirpie commented 2 years ago

Hi there! First off, thanks for making such an awesome package. This is an amazing tool. I have an experimental design in which I'm comparing the structure of microbial networks between two mouse genotypes. I'm constructing each network separately using a split phyloseq object. My code is as follows:

net_geno <- netConstruct(data = physeq_split$B6,
                         data2 = physeq_split$TCR,
                         filtTax = "highestFreq",
                         filtTaxPar = list(highestFreq = 100),
                         measure = "spring",
                         measurePar = list(nlambda = 10, rep.num = 10),
                         normMethod = "none",
                         zeroMethod = "none",
                         sparsMethod = "none",
                         dissFunc = "signed",
                         verbose = 3,
                         seed = 123456)

When I run it, I get the message "Intersection of taxa selected" no matter how I modify the taxon filtering criteria. I know from previous analyses that the microbiota of the knockout mice (TCR) contain some keystone taxa that are absent in the wild-type mice (B6), and I would like to include these in the analysis. How can I override this additional filtering?

Thanks!

stefpeschel commented 2 years ago

Hey, thanks for using NetCoMi!

Network comparison is only possible if both networks contain exactly the same taxa, because the associations for each pair of taxa, as well as local network properties, are compared between the groups. This requires computing group differences, which is not possible if a taxon is missing in one group.

To keep a taxon that exists in only one of the groups, you could add a vector of zero counts for that taxon to the count matrix of the group where it is missing. Just ensure that the taxon label is identical in both groups.
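The zero-padding suggestion could be sketched roughly as follows in base R. This is only an illustration: the toy count matrices, the object names, and the helper `pad_taxa` are made up and are not part of NetCoMi, and samples are assumed to be in rows with taxa in columns.

```r
# Toy count matrices (samples in rows, taxa in columns); all values are made up.
counts_b6 <- matrix(c(5, 0, 3,
                      2, 7, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("s1", "s2"), c("taxA", "taxB", "taxC")))
counts_tcr <- matrix(c(4, 1, 9,
                       0, 6, 2),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("s3", "s4"), c("taxA", "taxB", "taxD")))

# Hypothetical helper (not a NetCoMi function): add an all-zero column for
# every taxon in `all_taxa` that is missing from `counts`, then put the
# columns in the order given by `all_taxa`.
pad_taxa <- function(counts, all_taxa) {
  missing <- setdiff(all_taxa, colnames(counts))
  if (length(missing) == 0) {
    return(counts[, all_taxa, drop = FALSE])
  }
  zeros <- matrix(0, nrow = nrow(counts), ncol = length(missing),
                  dimnames = list(rownames(counts), missing))
  cbind(counts, zeros)[, all_taxa, drop = FALSE]
}

# Union of taxon labels across both groups; labels must match exactly.
all_taxa <- union(colnames(counts_b6), colnames(counts_tcr))
padded_b6 <- pad_taxa(counts_b6, all_taxa)
padded_tcr <- pad_taxa(counts_tcr, all_taxa)
```

After padding, both matrices have the same columns in the same order, so the intersection of taxa is the full set. Keep in mind that an all-zero column has zero variance within that group, which some association measures may not handle well.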

Please note: if you use the taxon filter with "highestFreq = 100", the 100 most abundant taxa are selected in each group separately and the intersection is built afterwards, so the final number of selected taxa will probably be below 100.
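To see why the final set can shrink, here is a small base R imitation of the behaviour described above. It does not reproduce NetCoMi's internal code; the per-taxon totals and the cutoff `top_n` are invented for illustration.

```r
# Made-up total counts per taxon in each group.
freq_b6 <- c(taxA = 90, taxB = 70, taxC = 50, taxD = 0, taxE = 10)
freq_tcr <- c(taxA = 80, taxB = 5, taxC = 60, taxD = 75, taxE = 20)

# Select the top_n most frequent taxa in each group separately.
top_n <- 3
top_b6 <- names(sort(freq_b6, decreasing = TRUE))[seq_len(top_n)]
top_tcr <- names(sort(freq_tcr, decreasing = TRUE))[seq_len(top_n)]

# Only taxa selected in both groups survive the intersection.
shared <- intersect(top_b6, top_tcr)
```

In this toy case `taxD` ranks second in the TCR group but is absent in B6, so it drops out and only two of the three requested taxa remain.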

Best, Stefanie

mirpie commented 2 years ago

Ah okay, I see, thanks so much for explaining! Would the argument "highestVar" in this case give me the most variable taxa between groups, or would it simply select the taxa with the highest variance within each group and then take the intersection? If it's the former, that would be more what I'm after. I'm pretty sure my count matrices already include zeros for species that are "absent", given that they are present in small amounts in some of the samples within the group and are not entirely null values.

Thanks again!

stefpeschel commented 2 years ago

It's the latter: it selects the taxa with the highest variance within each group and takes the intersection afterwards. But you can also filter your taxa in advance without using NetCoMi's filters. Instead of the phyloseq object, you can pass the count matrices directly to netConstruct, because at the moment the OTU table is the only part of a phyloseq object that is used anyway.
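One way to filter in advance might look like the sketch below, which keeps the taxa with the highest variance across the pooled samples of both groups. The pooled-variance criterion, the simulated counts, and all object names are assumptions chosen for illustration; both matrices are assumed to have samples in rows and the same taxa in columns.

```r
# Simulated count matrices with identical taxa (samples in rows).
set.seed(1)
taxa <- paste0("tax", 1:6)
counts_b6 <- matrix(rpois(4 * 6, lambda = 5), nrow = 4,
                    dimnames = list(paste0("b6_", 1:4), taxa))
counts_tcr <- matrix(rpois(4 * 6, lambda = 5), nrow = 4,
                     dimnames = list(paste0("tcr_", 1:4), taxa))

# Variance of each taxon across the samples of both groups pooled together.
pooled <- rbind(counts_b6, counts_tcr)
taxon_var <- apply(pooled, 2, var)

# Keep the n_keep most variable taxa and subset both matrices identically.
n_keep <- 3
keep <- names(sort(taxon_var, decreasing = TRUE))[seq_len(n_keep)]
filt_b6 <- counts_b6[, keep, drop = FALSE]
filt_tcr <- counts_tcr[, keep, drop = FALSE]

# The filtered matrices could then be passed on with NetCoMi's own filter
# switched off (requires NetCoMi; other arguments as in the original call):
# net_geno <- netConstruct(data = filt_b6, data2 = filt_tcr,
#                          filtTax = "none", measure = "spring")
```

Because both matrices are subset with the same `keep` vector, they contain exactly the same taxa, so no further taxa are lost to the intersection step.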

Best, Stefanie