joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

merge_samples #493

Closed insectnate closed 8 years ago

insectnate commented 9 years ago

I am trying to use merge_samples to combine some samples based on a categorical variable in the mapping file. It works fine, except that the variables are converted to numbers in the resulting merged phyloseq object. As a result, when I make a tree figure, for example, and try to colour the tips by the merged variable, the plot uses a continuous, heatmap-style colour scale instead of discrete colours. [Image: test_tree_2]

I have used this function before without problems and I'm not sure how to fix it. I am using the phyloseq 1.10 release with R 3.1.2.

audy commented 9 years ago

@insectnate you might try converting these back to a factor or character vector.

For example,


# load phyloseq and the GlobalPatterns example data
library(phyloseq)
data(GlobalPatterns)

# create a numeric column in GlobalPatterns' sample data for this example
sample_data(GlobalPatterns)$number <- as.numeric(sample_data(GlobalPatterns)$X.SampleID)

# plot tree, numeric column results in continuous color scale
plot_tree(GlobalPatterns, color='number')

# FIX: convert "number" column to a character vector
sample_data(GlobalPatterns)$number <- as.character(sample_data(GlobalPatterns)$number)

# plot tree, character column now results in a discrete color scale
plot_tree(GlobalPatterns, color='number')

insectnate commented 9 years ago

Could this be a result of using import_biom, import_qiime_sample_data and then merge_phyloseq? It just seems like merge_samples no longer does what it was intended to do.

insectnate commented 9 years ago

Thanks for the suggestion, Audy. Your fix gives me discrete variables, but they are still the numbers that replaced the sample_data values following merge_samples, not the original variable names. Is there a way to simply add an additional sample variable after the merge?

audy commented 9 years ago

@insectnate, are you saying that non-numeric strings were converted into numbers? Maybe R turned them into a factor and, for some reason, the factor was then converted to its numeric codes.

Maybe try converting them before doing merge_samples.
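
The factor-to-number conversion suspected above can be reproduced in base R. This is only a minimal sketch: the `stage` vector is made up for illustration and is not the poster's data.

```r
# Hypothetical sample variable, stored as a factor (not the poster's real data)
stage <- factor(c("Host_before", "Host_after", "Host_before", "Host_after"))

# Coercing a factor to numeric returns its integer level codes, not the labels
codes <- as.numeric(stage)
print(codes)

# The labels can be recovered from the codes as long as the levels are still known
recovered <- levels(stage)[codes]
print(recovered)
```

If the levels are lost along the way, as seems to happen here, only the codes remain, which would explain numbers appearing where category names were expected.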

This issue is difficult to diagnose without a reproducible example. Is it possible to provide a subset of the data that reproduces this problem?

insectnate commented 9 years ago

Hi Audy, What's the best way to share the data? Thanks for any help!

audy commented 9 years ago

You can paste the raw tables to gist.github.com (but please also paste code needed to load it) or use save() on the phyloseq object.
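
A rough sketch of the save() route, assuming a small stand-in object in place of the real phyloseq object; the object name and file name are hypothetical.

```r
# Stand-in object; in practice this would be the phyloseq object (e.g. `merged`)
example_obj <- data.frame(Fly_stage = c("Host_before", "Host_after"))

# Write the object to a file that can be attached or uploaded
out_file <- file.path(tempdir(), "example_obj.RData")
save(example_obj, file = out_file)

# The recipient restores it under its original name with load()
restored_env <- new.env()
load(out_file, envir = restored_env)
print(ls(restored_env))
```

Loading into a separate environment is optional; it just avoids overwriting an object of the same name in the recipient's workspace.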

--austin

insectnate commented 9 years ago

Host_biom <- import_biom("otu_table_host_only_norm.biom", parseFunction = parse_taxonomy_greengenes)
tree2 <- read_tree("97_otus_unannotated.tree")
map <- import_qiime_sample_data("Fly_sequence_map_host_only.txt")
merged <- merge_phyloseq(Host_biom, tree2, map)
merged_sample_glom <- merge_samples(merged, "Fly_stage")
plot_tree(merged_sample_glom, color = "Fly_stage", label.tips = "Genus", sizebase = 2, base.spacing = 0.05)

joey711 commented 9 years ago

Has this been solved for you? The categorical variable is being treated as a factor. merge_samples has a known issue: "off target" sample variables are coerced to their factor integers to facilitate the merge. In some cases, depending on which variable you selected for merging, the "off target" variables no longer make sense; put another way, the entries for such a variable have no good means of being merged. Character strings can always be pasted together, which may make for a good default behavior in the future.

In this case it looks like "fly stage" was not unique within each merging category.
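
One possible workaround, sketched here with the GlobalPatterns example data rather than the poster's files: merge_samples names each merged sample after its level of the grouping variable, so those names can be written back into the coerced column. Using `SampleType` as the grouping variable is an assumption made only for illustration.

```r
library(phyloseq)
data(GlobalPatterns)

# Merge samples by a categorical variable (SampleType chosen only for illustration)
merged_gp <- merge_samples(GlobalPatterns, "SampleType")

# After the merge the column no longer holds the original labels
print(class(sample_data(merged_gp)$SampleType))

# The merged sample names are the group labels, so write them back as a factor
sample_data(merged_gp)$SampleType <- factor(sample_names(merged_gp))

# A factor column gives a discrete colour scale in plots such as plot_tree()
print(levels(sample_data(merged_gp)$SampleType))
```

The same pattern should apply to a variable such as Fly_stage, although it has not been checked against the poster's data.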

insectnate commented 9 years ago

I have not resolved this issue yet. The Fly_stage variable has only two states, "Host_before" and "Host_after". merge_samples behaves this way on all the categorical variables that I have tried it on. Is there a way to edit the factor in the merged object after merging? Thanks for any help. Apologies for the late reply.

Nathan

insectnate commented 9 years ago

I was coming back to try and resolve this.

joey711 commented 8 years ago

Nathan, no worries. Sorry for my slow response.

Part of the reason for my slow response is that this has been asked and answered before. I should move it to the phyloseq-FAQ, which will eventually be on the front page.

A good answer was:

https://github.com/joey711/phyloseq/issues/243

Also, see the solution demonstrated in the Restroom Biogeography tutorial:

http://joey711.github.io/phyloseq-demo/Restroom-Biogeography

Will close for now. Please post back if those two links do not help you solve your problem.

Cheers

joey