shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

two concerns with the convert_phyloseq_euk_viral_glom.r standardized count table (species) #26

Closed RachelRodgers closed 1 year ago

RachelRodgers commented 3 years ago

1.) Check that coercion from factor to numeric is behaving as intended, for example:

# Adjust values
ps.melt.value.sp <- ps.melt.sp %>%
  select(c((ncol(MAP)+11):(ncol(ps.melt.sp)))) %>%
  mutate_if(is.factor, ~ as.numeric(levels(.x))[.x])

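Coercion from factor to numeric has to go through the character labels first. Calling as.numeric() directly on a factor returns the internal integer codes rather than the values, so use either as.numeric(as.character(x)) or as.numeric(levels(x))[x].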
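A quick way to confirm this is to compare the coercion routes on a toy factor. This is a minimal base R sketch; the factor `f` and its values are made up for illustration and are not taken from the pipeline.

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10", "2.5", "10", "300"))

# Direct coercion returns the internal integer codes, not the labels
codes <- as.numeric(f)

# Going through the character labels recovers the real values
via_character <- as.numeric(as.character(f))

# Equivalent form: convert each level once, then index by the factor
via_levels <- as.numeric(levels(f))[f]

print(codes)          # integer codes
print(via_character)  # original numeric values
print(identical(via_character, via_levels))
```

If the last line prints TRUE on your real columns as well, the two label-based forms are interchangeable and the coercion is doing what you expect.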

2.) Merging the Baltimore classification table with the melted phyloseq data using left_join(): please check that unmatched families are handled the way you intend. left_join() keeps every row of the melted table, so families in your melted table that are not present in your Baltimore file stay in the table but get NA in the Baltimore columns. The keep argument (default FALSE) only controls whether the join key columns from both tables are preserved, not which rows survive. Is this the intended behavior? If you also want the Baltimore families that have no match in your melted table, a full_join() will pull them in, and you can then filter any added rows that have an empty OTU.

# unmatched families get NA in the Baltimore columns here:
ps.melt.fixed.sp <- left_join(ps.melt.fixed.sp, baltimore, by = "Family")
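The join behavior is easy to check on a toy pair of tables before running it on the real data. The sketch below assumes dplyr 1.0.0 or later (where the keep argument exists); the tables `melted_toy` and `baltimore_toy`, the `Baltimore` column name, and the family "UnlistedFam" are hypothetical stand-ins, not the pipeline's objects.

```r
library(dplyr)

# Hypothetical stand-ins for the melted table and the Baltimore file
melted_toy <- tibble(
  OTU    = c("otu1", "otu2"),
  Family = c("Siphoviridae", "UnlistedFam")   # second family is absent from the lookup
)
baltimore_toy <- tibble(
  Family    = c("Siphoviridae", "Poxviridae"), # second family is absent from the melted table
  Baltimore = c("dsDNA", "dsDNA")
)

# All rows of the left table are retained; unmatched families get NA
left <- left_join(melted_toy, baltimore_toy, by = "Family")

# keep = TRUE changes the columns (Family.x, Family.y), not the rows
left_keep <- left_join(melted_toy, baltimore_toy, by = "Family", keep = TRUE)

# full_join also brings in lookup-only families, with NA in OTU
full <- full_join(melted_toy, baltimore_toy, by = "Family")

# Drop the rows that exist only in the lookup table
full_filtered <- filter(full, !is.na(OTU))

print(left)
print(names(left_keep))
print(full)
```

Counting rows before and after the real join (and counting NA values in the Baltimore columns) gives the same check on the actual data.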

3.) When generating the standardized count object at the genus level, the default option for tax_glom is to drop any taxa for which you are missing information at the specified rank. In this case it should never be an issue (NAs aren't expected) but it would be safer to go ahead and set this parameter to FALSE in the event something upstream has gone wrong:

ps0.ge.glom <- ps0.sp %>%
  speedyseq::tax_glom("Genus", NArm = FALSE)
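To see what the NArm setting changes, here is a small sketch on a made-up phyloseq object. It uses phyloseq::tax_glom rather than speedyseq::tax_glom, on the assumption that the speedyseq version treats NArm the same way; the OTU names, sample names, and taxonomy values are all hypothetical.

```r
library(phyloseq)

# Hypothetical counts: three taxa across two samples
otu <- otu_table(
  matrix(c(5, 3, 2, 1, 4, 6), nrow = 3,
         dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2"))),
  taxa_are_rows = TRUE
)

# Hypothetical taxonomy: otu3 is missing its genus
tax <- tax_table(
  matrix(c("FamA", "FamA", "FamB",
           "GenX", "GenX", NA), nrow = 3,
         dimnames = list(c("otu1", "otu2", "otu3"), c("Family", "Genus")))
)

ps_toy <- phyloseq(otu, tax)

# Default (NArm = TRUE): taxa with NA at the chosen rank are removed first
glom_default <- tax_glom(ps_toy, "Genus")

# NArm = FALSE: taxa with NA at the chosen rank are retained
glom_keep_na <- tax_glom(ps_toy, "Genus", NArm = FALSE)

print(ntaxa(glom_default))
print(ntaxa(glom_keep_na))
print(sum(otu_table(glom_default)))
print(sum(otu_table(glom_keep_na)))
```

Comparing the taxa counts and total read counts of the two results shows whether anything is being silently dropped at the glom step.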