Closed: swuyts closed this issue 9 years ago.
Hi,
The plot_bar() function calls psmelt(), which in turn calls melt() from the reshape2 package. melt() takes your data from wide format (rows are OTUs, columns are samples) to long format (OTU names, sample names, and OTU abundances are each a column), which is the standard input format for plotting with ggplot2. It is during this melt step that sample names with a leading 0 are converted to numbers, which drops the leading 0 (for example, "01" becomes 1). Since these samples no longer match any sample in your sample data, they are not present in the plot.
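A minimal base-R sketch of the conversion described above. It assumes (our reading of reshape2, worth checking against `?reshape2::melt` for your version) that `melt()` on a matrix passes its dimnames through `type.convert()` unless `as.is = TRUE` is given. The sample IDs are hypothetical, modelled on the ones in the original post, and neither phyloseq nor reshape2 is needed to run it.

```r
# Hypothetical sample IDs in the style of the original post
sample_ids <- c("01", "02", "03", "13", "17", "21", "56")

# type.convert() turns strings that look like numbers into integers;
# as.is = TRUE only means "leave non-numeric strings as character"
converted <- type.convert(sample_ids, as.is = TRUE)
print(class(converted))  # "integer"
print(converted)         # 1 2 3 13 17 21 56 (leading zeros gone)

# Looking the converted IDs back up among the original names fails
# for every ID that had a leading zero
matched <- as.character(converted) %in% sample_ids
print(sample_ids[!matched])  # "01" "02" "03"
```

The IDs without a leading zero survive the round trip unchanged, which matches the symptom of only the zero-prefixed samples disappearing from the plot.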
That is the diagnosis for the unexpected behavior, but unfortunately it doesn't offer a solution. However, it seems you have already found a good way of changing your sample names, so I would stick with that. Also, beware that many popular R functions may behave unexpectedly if your column names start with 0 or contain special characters. Hope that helps.
Best, Michelle
Hey Michelle,
Thanks for this! I've been using reshape2 for ggplot2 and did not know that this was a common 'problem'. This is very useful information for labeling samples in the future.
Thank you.
Cheers, Sander
Thanks, @michberr, great answer.
It is a blessing and a curse that ggplot2 attempts to infer the kind of axis you want (e.g. continuous or categorical) from the data type. This can also be a problem for various date encodings, for example.
To be honest, I'm a little surprised that it converted your character IDs to a continuous scale, but this would be expected behavior if the sample IDs had been R integers from the outset.
Either way, you can avoid this by including ID values that begin with a letter, as @michberr pointed out.
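A short base-R sketch of the letter-prefix workaround, using the same hypothetical IDs as the original post and the prefix "D" that the original poster chose. Once the IDs no longer look like numbers, `type.convert()` leaves them as strings, so they still match the sample data.

```r
# Hypothetical sample IDs, plus a letter prefix ("D" for "Day",
# as in the original post) so they no longer look like numbers
sample_ids <- c("01", "02", "03", "13", "17", "21", "56")
prefixed <- paste0("D", sample_ids)

# type.convert() now leaves every ID as a character string
kept <- type.convert(prefixed, as.is = TRUE)
print(class(kept))              # "character"
print(all(kept %in% prefixed))  # TRUE: no ID is lost on the way back
```

On a phyloseq object the same renaming can be done with the `sample_names<-` replacement function, e.g. `sample_names(physeq) <- paste0("D", sample_names(physeq))`, where `physeq` is a placeholder for your own object.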
Issue closed! great job :)
joey
Hi @joey711, all, we (the NIAID Nephele microbiome analysis portal team) have encountered the same issue when users have integers as identifiers. Is there any change to the advice above to add a character to the beginning of the identifiers?
We have this problem not only with ggplot2 but also with ade4 and vegan: if at all possible, please do not give sample names that start with numbers. Some programs automatically add an X in front, but most don't, so this advice still stands, and the problem is unfortunately difficult to fix.
Susan
(Email reply to R. Burke Squires's comment of Fri, May 26, 2017 at 11:09 AM: https://github.com/joey711/phyloseq/issues/488#issuecomment-304351740)
-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/
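Susan's point that some programs add an X in front can be seen in base R itself: `make.names()` prepends "X" to names that start with a digit, and `data.frame()` applies it to column names by default (`check.names = TRUE`). The IDs below are hypothetical; whether ade4 or vegan use this same rule internally is not claimed here.

```r
ids <- c("01", "02", "D13", "0.5")

# make.names() builds syntactically valid R names: "X" is prepended
# to names starting with a digit, other names are left unchanged
print(make.names(ids))  # "X01" "X02" "D13" "X0.5"

# data.frame() renames the columns the same way by default...
m <- matrix(1:4, nrow = 1, dimnames = list(NULL, ids))
print(names(data.frame(m)))                       # "X01" "X02" "D13" "X0.5"

# ...unless check.names = FALSE is passed
print(names(data.frame(m, check.names = FALSE)))  # "01" "02" "D13" "0.5"
```

Because the renamed columns ("X01") no longer match the original IDs ("01"), this silent renaming can cause the same kind of mismatch as the leading-zero conversion.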
Thank you Susan. Just making sure I was not missing something obvious...or nearly obvious! :-)
Just noting that the issue affects floating-point identifiers as well. The only solution seems to be adding some text to the beginning or end of each identifier.
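A base-R sketch of the floating-point case, assuming the identifiers are converted from text to numbers somewhere in the plotting pipeline (simulated here with `type.convert()`). Decimal-looking IDs become doubles and lose their trailing zeros; adding text at the end (the hypothetical suffix "_s") keeps them as strings.

```r
# Hypothetical decimal-style IDs
ids <- c("0.50", "1.10", "2.0", "2.25")

# type.convert() reads these as doubles, so trailing zeros are lost
converted <- type.convert(ids, as.is = TRUE)
print(class(converted))         # "numeric"
print(as.character(converted))  # "0.5" "1.1" "2" "2.25"

# Every ID whose text changed no longer matches the original names
print(ids[!(as.character(converted) %in% ids)])  # "0.50" "1.10" "2.0"

# Adding text to the end (or the start) keeps them as strings
suffixed <- paste0(ids, "_s")
print(class(type.convert(suffixed, as.is = TRUE)))  # "character"
```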
Yes, that is consistent with the requirements of ade4 and vegan in particular for the naming of rows. Susan
(Email reply to R. Burke Squires's comment of Tue, Jun 13, 2017 at 11:37 AM: https://github.com/joey711/phyloseq/issues/488#issuecomment-308209128)
Hello,
I've been playing around with Phyloseq today and am very pleased with the package for now! I'm using the Restroom Biogeography page as a way to explore Phyloseq, using my own data.
Unfortunately, I've run into some trouble trying to plot "Figure 1 Part A (remake), attempt 2", which uses the plot_bar command. It took me a long time to realize that the problem had to do with the names I gave to my categories after merging the data (instead of grouping them by "SURFACE", I am grouping them by "Day"). I've named them "01", "02", "03", ..., "13", "17", "21", "56".
When they are named like this I get the following plot:
As you can see, none of the samples whose names start with a zero are shown in the plot, even though they are still present in the data frame. I figured out that the leading zero had something to do with this, so I added the letter D in front of every category name, resulting in:
Which is the expected result.
Have you heard about this issue before?
Cheers!