zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/
51 stars 16 forks

Help filtering df.weights during meta-analysis routine #35

Closed: drelo closed this issue 1 year ago

drelo commented 2 years ago

Dear all,

I think I got stuck at the last step of this strategy while trying to analyze a number of features (pathways) from 6 datasets using the tutorial here

The initial table after df.weights %>% full_join(abs.weights) has 77,879 rows.

Here is a sample of this initial dataframe; it contains a lot of zeros in the 'median.rel.weight' column, with the exception of the 22 values kept in this table.

df.weights %>%
  full_join(abs.weights) %>%  
  # normalize by the absolute model size
  mutate(median.rel.weight=median.rel.weight/sum.median) %>% 
  # only include genera of interest
  filter(feature %in% feat.of.interest$feature)

This is the initial filtering step, which keeps the features in feat.of.interest that are also present in the df.weights table. It returns 4,409 rows.

df.weights %>%
  full_join(abs.weights) %>%   
  # normalize by the absolute model size
  mutate(median.rel.weight=median.rel.weight/sum.median) %>% 
  # only include genera of interest
  filter(feature %in% feat.of.interest$feature)  %>% 
  # highlight feature rank for the top 10 features
  mutate(r.med=case_when(r.med > 10~NA_real_, TRUE~r.med))

This also returns 4,409 rows, so the mutate(r.med=case_when(r.med > 10~NA_real_, TRUE~r.med)) step seems to have no effect on the number of rows.
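For reference, here is a minimal sketch on toy data of how these two verbs behave with respect to row counts. The data frame `toy` and its values are hypothetical and only stand in for the joined weights table; the threshold of 10 is the one used in the pipeline above. As far as I understand dplyr, mutate() with case_when() only overwrites values (ranks above the threshold become NA) and never drops rows, whereas filter() is the verb that actually removes rows.

```r
library(dplyr)

# Hypothetical toy data standing in for the joined weights table
toy <- data.frame(
  feature = paste0("pathway_", 1:6),
  r.med   = c(1, 5, 10, 11, 25, 40)
)

# mutate() + case_when() masks ranks above 10 with NA but keeps every row
masked <- toy %>%
  mutate(r.med = case_when(r.med > 10 ~ NA_real_, TRUE ~ r.med))

# filter() drops the rows whose rank is above 10
kept <- toy %>%
  filter(r.med <= 10)

nrow(toy)     # row count before any step
nrow(masked)  # same row count as toy; only the values changed
nrow(kept)    # fewer rows, because filter() removed some
```

If this reading is right, an unchanged row count after the mutate step would be expected, and the NA ranks would only affect which cells get a number in the plot, not how many rows are drawn.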

Here is the table after the filter above just before the mutate step, saved as a tsv.

Could you help explain this to me? At the moment the code seems to plot far more rows than 10-20. I can clearly see 7 features highlighted with numbers in the heatmap, but there doesn't seem to be a filter in action (this also happened with taxa from MetaPhlAn output). Is this due to the abundance of '0' values in the df.weights$median.rel.weight column?

Thanks very much for the help

Andrés