Open mikemc opened 6 years ago
This appears to be happening in psmelt, which plot_bar calls. Specifically this line: mdf = reshape2::melt(as(otutab, "matrix")). The melt will remove left-side padded zeroes, so sample 01 becomes 1. Later, when this is called: mdf <- merge(mdf, sdf, by.x = "Sample"), it can't properly merge the samples because the names are different, so the sample is dropped.
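The mechanism described above can be reproduced with base R alone. This is only a minimal sketch: it assumes that melt converts character dimnames roughly the way type.convert() does, and the data frames mdf and sdf below are made-up stand-ins, not the real ones built inside psmelt.

```r
# Sample names as they appear in the OTU table (character, zero-padded).
sample_ids <- c("01", "10")

# Assumption: melt's name handling behaves like type.convert(), which turns
# number-like strings into integers and so loses the leading zero.
converted <- type.convert(sample_ids, as.is = TRUE)
print(converted)

# Made-up stand-ins for the melted OTU table and the sample data.
mdf <- data.frame(Sample = converted, Abundance = c(5, 7))
sdf <- data.frame(Sample = sample_ids, Group = c("a", "b"),
                  stringsAsFactors = FALSE)

# merge() compares 1 with "01" as strings, finds no match, and silently
# drops that row (the default is all = FALSE).
merged <- merge(mdf, sdf, by = "Sample")
print(merged)
```

Only the sample whose name survives the conversion unchanged ("10") remains after the merge, and no warning is raised.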
I think the best solution is to just not use numerics as sample names. Depending on how you import your data, keeping check.names = TRUE will result in an X being prepended to your sample names, which would also fix the problem by forcing them into characters.
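For anyone unsure where check.names goes: it is an argument of base R import functions such as read.csv() and read.table(). Here is a small sketch using an inline table; the taxa and counts are invented purely for illustration.

```r
# Invented OTU-style table whose column headers are zero-padded sample IDs.
csv_text <- "taxon,07,08,09,10
otu1,3,0,5,2
otu2,1,4,0,7"

# check.names = TRUE (the default) passes headers through make.names(),
# which prepends "X" to names that start with a digit.
checked <- read.csv(text = csv_text, check.names = TRUE)
print(colnames(checked))

# check.names = FALSE keeps the headers exactly as written.
unchecked <- read.csv(text = csv_text, check.names = FALSE)
print(colnames(unchecked))
```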
Thanks for your observation and suggestion, Jeff. I'm using data from the Microbiome Quality Control project, which used randomized numeric identifiers of length 10, sometimes with a leading zero. But pre-pending an "X" to the names is a good enough solution for me. Still, it would be good at least for a warning to be printed when samples are stripped, which could be easily checked in the plot_bar function.
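A check along those lines could look something like the sketch below. The helper name warn_if_names_change is hypothetical and not part of phyloseq. It also assumes that the at-risk names are exactly those that type.convert() would turn into a number whose printed form differs from the original string.

```r
# Hypothetical helper (not part of phyloseq): warn about sample names that
# would not survive a round trip through numeric conversion.
warn_if_names_change <- function(sample_ids) {
  converted <- type.convert(sample_ids, as.is = TRUE)
  # If any name is not number-like, the whole vector stays character and
  # nothing is flagged. which() also skips NA comparisons.
  changed <- sample_ids[which(as.character(converted) != sample_ids)]
  if (length(changed) > 0) {
    warning("Sample names altered by numeric conversion: ",
            paste(changed, collapse = ", "))
  }
  invisible(changed)
}

# Zero-padded numeric names are flagged (warning suppressed for the demo).
at_risk <- suppressWarnings(warn_if_names_change(c("07", "08", "09", "10")))
print(at_risk)

# Prefixed names pass the check untouched.
safe <- warn_if_names_change(c("X07", "X08", "X09", "X10"))
print(safe)
```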
hey guys. so, i am quite new in programming and encountered the same problem when i named my sample as listed below. so my question is where should i put this check.names function in my code?
sample_name <- c("07", "08", "09", "10")
sample <- data.frame(sample_name, row.names=sample_name)
I think honestly the best way (other than not using numeric sample names to begin with) is to just edit the phyloseq object sample names. Here's how, using the GlobalPatterns dataset.
> library("phyloseq")
> data("GlobalPatterns")
> dput(sample_names(GlobalPatterns))
c("CL3", "CC1", "SV1", "M31Fcsw", "M11Fcsw", "M31Plmr", "M11Plmr",
"F21Plmr", "M31Tong", "M11Tong", "LMEpi24M", "SLEpi20M", "AQC1cm",
"AQC4cm", "AQC7cm", "NP2", "NP3", "NP5", "TRRsed1", "TRRsed2",
"TRRsed3", "TS28", "TS29", "Even1", "Even2", "Even3")
dput will print out the sample names vector in the same format used for creating a vector. Just copy that output, edit the names however you see fit, and save the result back into a vector named samples. You can see below I changed the names of a couple of them.
> samples = c("CL3-edit", "CC1", "SV1", "M31Fcsw", "M11Fcsw", "M31Plmr", "M11Plmr",
"F21Plmr", "M31Tong", "M11Tong-edit", "LMEpi24M", "SLEpi20M", "AQC1cm",
"AQC4cm", "AQC7cm", "NP2", "NP3", "NP5", "TRRsed1", "TRRsed2",
"TRRsed3", "TS28", "TS29", "Even1", "Even2", "Even3")
# copy the GlobalPatterns object and update the names
> GlobalPatterns2 = GlobalPatterns
> sample_names(GlobalPatterns2) = samples
# check that the names changed
> sample_names(GlobalPatterns2)
[1] "CL3-edit" "CC1" "SV1" "M31Fcsw" "M11Fcsw" "M31Plmr" "M11Plmr" "F21Plmr"
[9] "M31Tong" "M11Tong-edit" "LMEpi24M" "SLEpi20M" "AQC1cm" "AQC4cm" "AQC7cm" "NP2"
[17] "NP3" "NP5" "TRRsed1" "TRRsed2" "TRRsed3" "TS28" "TS29" "Even1"
[25] "Even2" "Even3"
These sample name changes should be preserved everywhere, including the otu_table. Just be sure you don't change the order of your samples in the vector.
I should add that if you just want to add "X" or "Sample" to the beginning of all samples, try this before adding to the new phyloseq object:
# pick one of the two prefixes (running both would give "SampleX...")
samples = paste0("X", samples)
# or
samples = paste0("Sample", samples)
Or, even easier
GlobalPatterns2 = GlobalPatterns
sample_names(GlobalPatterns2) = paste0("X", sample_names(GlobalPatterns2))
> sample_names(GlobalPatterns2)
[1] "XCL3" "XCC1" "XSV1" "XM31Fcsw" "XM11Fcsw" "XM31Plmr" "XM11Plmr"
[8] "XF21Plmr" "XM31Tong" "XM11Tong" "XLMEpi24M" "XSLEpi20M" "XAQC1cm" "XAQC4cm"
[15] "XAQC7cm" "XNP2" "XNP3" "XNP5" "XTRRsed1" "XTRRsed2" "XTRRsed3"
[22] "XTS28" "XTS29" "XEven1" "XEven2" "XEven3"
thanks for everything
I had the same problem. The point is that when merge_samples() is used before making bar plots, samples are usually merged by a factor, and factors are also frequently labelled with numbers. So it is not rare that some factor levels will not appear (in my case those labelled "1.0", "2.0", "3.0", etc.). Of course one can rename the levels, but I think this is something to fix (or at least to warn the user about), because when the factor has many levels one may not notice.
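The same prefixing idea can be applied to a grouping factor. A rough sketch in base R with an invented factor follows; the assumption (not checked against the phyloseq source) is that number-like level labels such as "1.0" get converted to numbers somewhere downstream and then no longer match.

```r
# Invented grouping factor with number-like level labels.
depth <- factor(c("1.0", "2.0", "3.0", "1.0"))

# Number-like labels do not survive numeric conversion: "1.0" becomes 1.
print(as.character(type.convert(levels(depth), as.is = TRUE)))

# Prefixing the levels turns them into unambiguous strings.
levels(depth) <- paste0("G", levels(depth))
print(levels(depth))
```

The relabelled factor would then be written back into the sample data before calling merge_samples().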
If sample names are strings of numbers and the name of a sample starts with a zero, then that sample will be dropped without any warning or error when calling plot_bar(). Here is a minimal working example: