mikemc / speedyseq

Speedy versions of phyloseq functions
https://mikemc.github.io/speedyseq/

`merge_samples2()` fails when `group` values are numbers #52

Closed mikemc closed 3 years ago

mikemc commented 4 years ago

Reprex below. The failure occurs when `group` is a factor whose labels are numbers, and also when `group` is just integers. It is likely related to phyloseq bugs involving sample names that are numbers.

library(speedyseq)
#> Loading required package: phyloseq
#> 
#> Attaching package: 'speedyseq'
#> The following objects are masked from 'package:phyloseq':
#> 
#>     filter_taxa, plot_bar, plot_heatmap, plot_tree, psmelt, tax_glom,
#>     tip_glom, transform_sample_counts

data(enterotype)

ps <- enterotype 
sample_data(ps) %>% dplyr::glimpse()
#> Rows: 280
#> Columns: 9
#> $ Enterotype     <fct> NA, NA, NA, 3, 2, NA, 3, 3, NA, 2, 1, 2, 2, 3, 3, 3, 1…
#> $ Sample_ID      <fct> AM.AD.1, AM.AD.2, AM.F10.T1, AM.F10.T2, DA.AD.1, DA.AD…
#> $ SeqTech        <fct> Sanger, Sanger, Sanger, Sanger, Sanger, Sanger, Sanger…
#> $ SampleID       <fct> AM.AD.1, AM.AD.2, AM.F10.T1, AM.F10.T2, DA.AD.1, NA, D…
#> $ Project        <fct> gill06, gill06, turnbaugh09, turnbaugh09, MetaHIT, NA,…
#> $ Nationality    <fct> american, american, american, american, danish, NA, da…
#> $ Gender         <fct> F, M, F, F, F, NA, M, F, NA, M, F, M, F, F, M, M, M, M…
#> $ Age            <dbl> 28, 37, NA, NA, 59, NA, 54, 49, NA, 59, 25, 49, 47, 38…
#> $ ClinicalStatus <fct> healthy, healthy, obese, obese, healthy, NA, healthy, …
ps0 <- merge_samples2(ps, "Enterotype", funs = list(Age = mean))
#> Warning in merge_samples2(ps, "Enterotype", funs = list(Age = mean)): `group`
#> has missing values; corresponding samples will be dropped
#> Error in validObject(.Object): invalid class "phyloseq" object: 
#>  Component sample names do not match.
#>  Try sample_names()

Created on 2020-09-10 by the reprex package (v0.3.0)

mikemc commented 4 years ago

This is actually only a problem when the group values are the numbers 1:n for some n; this triggers phyloseq::sample_data() to change the new sample names to "sa1", etc. See Issue #53. The fix is to construct the new sample data object in a way that skips this name adjustment.
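
To make the mechanism concrete, here is a minimal base-R sketch of the kind of name adjustment described above. It is not phyloseq's code: `rename_default_rownames()` is a hypothetical stand-in, written on the assumption that row names are replaced exactly when they equal "1", ..., "n" in order. The `Age` values are made up for illustration.

```r
# Hypothetical stand-in for the name adjustment described above (not phyloseq's code).
# Assumption: row names are replaced only when they are exactly "1", "2", ..., "n".
rename_default_rownames <- function(df) {
  n <- nrow(df)
  if (n > 0 && identical(rownames(df), as.character(seq_len(n)))) {
    rownames(df) <- paste0("sa", seq_len(n))
  }
  df
}

# Group values 1:3 end up as row names of the merged sample data (made-up data)
merged <- data.frame(Age = c(40, 45, 50), row.names = c("1", "2", "3"))
renamed <- rename_default_rownames(merged)
rownames(renamed)
#> [1] "sa1" "sa2" "sa3"

# Row names that are not exactly 1:n are left untouched
other <- data.frame(Age = c(40, 45), row.names = c("2", "3"))
rownames(rename_default_rownames(other))
#> [1] "2" "3"
```

If the other components of the merged object keep the names "1", "2", "3" while the sample data is renamed to "sa1", "sa2", "sa3", the names no longer agree, which would be consistent with the "Component sample names do not match" error in the reprex.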

mikemc commented 3 years ago

This seems to not be fixed in general. It works on the enterotype example above, but not on

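library(speedyseq)
# Sample data whose grouping variable takes the values 1:3
sam <- tibble::tibble(sample_id = letters[1:3], group_var = 1:3) %>% sample_data()
x <- merge_samples2(sam, "group_var")
# Inspect the merged sample names and variables
sample_names(x); sample_variables(x)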