joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
584 stars 187 forks source link

How can i make bar or box plots using distance matrices (i.e. UniFrac) in phyloseq? #714

Open wasimbt opened 7 years ago

kekokenepane commented 6 years ago

Hi, I'm new here, pretty new to R and Phyloseq. I think the question above describes my problem that I can't solve at the moment. I have a phyloseq object and a distance matrix and would like to create plots like these http://qiime.org/tutorials/creating_distance_comparison_plots.html in R - box plot of distances within and between groups. Is there a smooth way of doing it with phyloseq objects or some workaround with data frames? regards, Luka

spholmes commented 6 years ago

You may be helped by looking carefully both at the tutorials here:

http://statweb.stanford.edu/~susan/summer12/phyloseq-demosh.html http://joey711.github.io/phyloseq-demo/phyloseq-demo.html

and the Bioconductor workflow paper: https://f1000research.com/articles/5-1492/v2 http://web.stanford.edu/class/bios221/MicrobiomeWorkflowII.html

Without a more precise MWE we can't really help more.

On Tue, Dec 19, 2017 at 9:25 AM, kekokenepane notifications@github.com wrote:

Hi, I'm new here, pretty new to R and Phyloseq. I think the question above describes my problem that I can't solve at the moment. I have a phyloseq object and a distance matrix and would like to create plots like these http://qiime.org/tutorials/creating_distance_comparison_plots.html in R - box plot of distances within and between groups. Is there a smooth way of doing it with phyloseq objects or some workaround with data frames? regards, Luka

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/714#issuecomment-352828334, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvdvToJTxGHysqJUe9MxmcyCq7L-qks5tB_GWgaJpZM4L8UM3 .

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

jeffkimbrel commented 6 years ago

Here is an old function I have that should get you where you need to go.

library("phyloseq")
library("reshape2")
library("dplyr")
library("ggplot2")

data(GlobalPatterns)

plotDistances = function(p = GlobalPatterns, m = "wunifrac", s = "X.SampleID", d = "SampleType") {

  # calc distances
  wu = phyloseq::distance(p, m)
  wu.m = melt(as.matrix(wu))

  # remove self-comparisons
  wu.m = wu.m %>%
    filter(as.character(Var1) != as.character(Var2)) %>%
    mutate_if(is.factor,as.character)

  # get sample data (S4 error OK and expected)
  sd = sample_data(p) %>%
    select(s, d) %>%
    mutate_if(is.factor,as.character)

  # combined distances with sample data
  colnames(sd) = c("Var1", "Type1")
  wu.sd = left_join(wu.m, sd, by = "Var1")

  colnames(sd) = c("Var2", "Type2")
  wu.sd = left_join(wu.sd, sd, by = "Var2")

  # plot
  ggplot(wu.sd, aes(x = Type2, y = value)) +
    theme_bw() +
    geom_point() +
    geom_boxplot(aes(color = ifelse(Type1 == Type2, "red", "black"))) +
    scale_color_identity() +
    facet_wrap(~ Type1, scales = "free_x") +
    theme(axis.text.x=element_text(angle = 90, hjust = 1, vjust = 0.5)) + 
    ggtitle(paste0("Distance Metric = ", m))
}

p = your phyloseq object m = distance metric s = phyloseq sample data column with your sample names d = sample data for groups

The function removes self comparisons, and colors points in the same group red. You'll get an S4 error, but that is OK.

plotDistances()

rplot10

mawij2002 commented 6 years ago

For this example, just copying the function and applying it (plotDistances()) I get this error:

Error in select(., s, d) : unused arguments (s, d)

Am I applying it wrong?

Thanks!

jeffkimbrel commented 6 years ago

You need to set the p, s and d parameters according to your dataset.

p is your phyloseq object. s is the column name that has your sample IDs. This depends on what you called it, but is likely something like 'SAMPLE' or 'Sample'. d is the column header that has the groups defined.

Just add those to the function call...

plotDistances(p = phyloObject, s = "SAMPLE_COLUMN", d = "GROUP_COLUMN")

mawij2002 commented 6 years ago

I tried to use it first with the example, and I keep getting the error:

data(GlobalPatterns)

> plotDistances = function(p = GlobalPatterns, m = "wunifrac", s = "X.SampleID", d =

"SampleType") {

+

+ # calc distances

+ wu = phyloseq::distance(p, m)

+ wu.m = melt(as.matrix(wu))

+

+ # remove self-comparisons

+ wu.m = wu.m %>%

+ filter(as.character(Var1) != as.character(Var2)) %>%

+ mutate_if(is.factor,as.character)

+

+ # get sample data (S4 error OK and expected)

+ sd = sample_data(p) %>%

+ select(s, d) %>%

+ mutate_if(is.factor,as.character)

+

+ # combined distances with sample data

+ colnames(sd) = c("Var1", "Type1")

+ wu.sd = left_join(wu.m, sd, by = "Var1")

+

+ colnames(sd) = c("Var2", "Type2")

+ wu.sd = left_join(wu.sd, sd, by = "Var2")

+

+ # plot

+ ggplot(wu.sd, aes(x = Type2, y = value)) +

+ theme_bw() +

+ geom_point() +

+ geom_boxplot(aes(color = ifelse(Type1 == Type2, "red", "black"))) +

+ scale_color_identity() +

+ facet_wrap(~ Type1, scales = "free_x") +

+ theme(axis.text.x=element_text(angle = 90, hjust = 1, vjust = 0.5)) +

+ ggtitle(paste0("Distance Metric = ", m))

+ }

plotDistances(p = GlobalPatterns, s = "X.SampleID", d = "SampleType") Show Traceback

Rerun with Debug Error in select(., s, d) : unused arguments (s, d)

Ultimately, I would want to use it with Bray (instead of wUnifrac).

Thanks for your help.

jeffkimbrel commented 6 years ago

Ah, it might just be a tidyverse/dplyr problem... do you have either of those loaded? Try library("tidyverse") or library("dplyr") first.

mawij2002 commented 6 years ago

I have tried with those libraries but I get the same error. (I will try bray once I can get the example to work). Thanks again for the help.

jeffkimbrel commented 6 years ago

Also, it looks like bray isn't an option for how I set this up. You will have to overwrite

wu = phyloseq::distance(p, m)

with

wu = vegan::vegdist(t(otu_table(p)), method = "bray")

to get Bray. I didn't write this function initially with full functionality in mind, so it needs to be hacked to do much beyond the basics.

jeffkimbrel commented 6 years ago

What happens if you try and run sample_data(p) but using your phyloseq object instead of p? Do you get column names that are the same as what you set s and d to?

mawij2002 commented 6 years ago

I do get them, with the same column names (just without ""). I don't understand why it says that. The error might be coming from this part:

sd = sample_data(p) %>%

  • select(s, d) %>%
  • mutate_if(is.factor,as.character) Error in select(., s, d) : unused arguments (s, d)

((I run that with the example data)) Thx!

jeffkimbrel commented 6 years ago

If you set s and d in your environment, does it work?

s = "X.SampleID"
d = "SampleType"

sd = sample_data(p) %>%
    select(s, d) %>%
    mutate_if(is.factor,as.character)
mawij2002 commented 6 years ago

It does not.

data(GlobalPatterns) s = "X.SampleID" d = "SampleType" p = GlobalPatterns

> sd = sample_data(p) %>%

+ select(s, d) %>%

+ mutate_if(is.factor,as.character)

Error in select(., s, d) : unused arguments (s, d)

Same error.

jeffkimbrel commented 6 years ago

Sorry, I just can't figure out why it isn't working for you. If I start up a fresh R session I get this (with the S4 error message removed)...

> library("phyloseq")
> packageVersion("phyloseq")
[1] ‘1.22.3’
> 
> library("tidyverse")
> packageVersion("tidyverse")
[1] ‘1.2.1’
> 
> data("GlobalPatterns")
> 
> p = GlobalPatterns
> s = "X.SampleID"
> d = "SampleType"
> 
> sd = sample_data(p) %>%
+   select(s, d) %>%
+   mutate_if(is.factor,as.character)
> 
> head(sd)
  X.SampleID SampleType
1        CL3       Soil
2        CC1       Soil
3        SV1       Soil
4    M31Fcsw      Feces
5    M11Fcsw      Feces
6    M31Plmr       Skin

My R version is 3.4.1, and I am using Rstudio version 1.1.453. I'm afraid I don't have any other ideas! The select is just grabbing specific columns out of the larger sample data dataframe. You can try another method to select the usable columns since select isn't working.

mawij2002 commented 6 years ago

I am using

library("phyloseq") packageVersion("phyloseq") [1] ‘1.24.2’ library("tidyverse") packageVersion("tidyverse") [1] ‘1.2.1’,

An the newest version of R with rstudio Version 1.1.383. It might be something about the versions.

There seem to be something about my function select that is off.

Thanks!

spholmes commented 6 years ago

You might try to find out if you have another package::select by using the package conflicted: https://cran.r-project.org/web/packages/conflicted/index.html

On Mon, Aug 13, 2018 at 12:09 PM, mawij2002 notifications@github.com wrote:

I am using

library("phyloseq") packageVersion("phyloseq") [1] ‘1.24.2’ library("tidyverse") packageVersion("tidyverse") [1] ‘1.2.1’,

An the newest version of R with rstudio Version 1.1.383. It might be something about the versions.

There seem to be something about my function select that is off.

Thanks!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/714#issuecomment-412630062, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvT18iM2YbkmUKQmMUor12o-C6uqpks5uQc7hgaJpZM4L8UM3 .

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

mawij2002 commented 6 years ago

Apparently Mass and Dplyr functions Select clash. After doing this:

require(MASS) require(dplyr) [[https://stackoverflow.com/questions/24202120/dplyrselect-function-clashes-with-massselect]] It seems to work.

Now, I have doubts about the interpretation since for each of the d groups(facet) provides a boxplot of the distances of each d group. I think I was expecting a facet with boxplots per d groups comparisons. Do you mind helping me out with that? Thanks a million!

jeffkimbrel commented 6 years ago

Great suggestion Susan, thank you.

Can you rephrase what your looking for? It seems like this is still doing what you want... all pairwise distance comparisons between groups of samples?

spholmes commented 6 years ago

The package conflicts have been a bane and I am thankful for that package, we can't go forward without knowing the context though.

On Mon, Aug 13, 2018 at 1:03 PM, Jeff Kimbrel notifications@github.com wrote:

Great suggestion Susan, thank you.

Can you rephrase what your looking for? It seems like this is still doing what you want... all pairwise distance comparisons between groups of samples?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/714#issuecomment-412645195, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvT3Z4OLDxxpmZ4eKbcVjyiphxgL7ks5uQduRgaJpZM4L8UM3 .

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

mawij2002 commented 6 years ago

I have a column called site, one season, and another one called type (two type groups), and for each site I have a distance TypeA-TypeB. I wanted the boxplot of the TypeA-TypeB distances (and if possible, by season).

Thanks so much!

jeffkimbrel commented 6 years ago

Can you just merge those columns to test what you want?

sd = sample_data(p)
sd$newColumn = paste0(sd$site, "_", sd$type)
sample_data(p) = sd

plotDistance(d = "newColumn")

Or, you may have to subset the phyloseq for each site, and then run plotDistance() using the type column.

mawij2002 commented 6 years ago

I see what you mean. Thanks!

I think I did not explain well, I have a column called site, one season, and another one called type (two type groups), and for each site I have a distance TypeA-TypeB. What I want is the boxplot of all site TypeA-TypeB distances. Not sure I can do that with your indicated approach

Thanks so much for your patience and help.

ps: The season part I can do by subsetting the object by season (and if possible, by season).

On Mon, Aug 13, 2018 at 5:41 PM, Jeff Kimbrel notifications@github.com wrote:

Can you just merge those columns to test what you want?

sd = sampledata(p) sd$newColumn = paste0(sd$site, "", sd$type) sample_data(p) = sd

plotDistance(d = "newColumn")

Or, you may have to subset the phyloseq for each site, and then run plotDistance() using the type column.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/714#issuecomment-412689660, or mute the thread https://github.com/notifications/unsubscribe-auth/AC3vcp793EpErbfyBurj1GDL5hHFXuTFks5uQgCXgaJpZM4L8UM3 .

--

"The voyage of discovery is not in seeking new landscapes but in having new eyes"

Juan Pedro Maestre Research Associate at The University of Texas at Austin. +1 512 751 96 81 Pronouns: he, him, his Curiosity + Science + Diversity + Empathy + Development + Community

jeffkimbrel commented 6 years ago

Can you print out the first few lines of your sample data?

mawij2002 commented 6 years ago

Jeff,

I think I do not understand what your function plots. For the specific case of the global patterns, is it plotting the distances among groups? Should all the facets be equal then? Sorry, trying to see how I can apply for my purpose. Thanks!

mawij2002 commented 6 years ago

Here, Thanks so much! Sorry the format comes out kind of weird.

sample_data(dust_comp_paired)[1:2, ] Sample Data: [2 samples by 272 sample variables]: SampleID X.SampleID BarcodeSequence LinkerPrimerSequence S13.W15.MUR.VA S13.W15.MUR.VA S13.W15.MUR.VA ACGTCTCCCCTT GTGYCAGCMGCCGCGGTA S15.W15.MUR.VA S15.W15.MUR.VA S15.W15.MUR.VA ACTACCTGCCGA GTGYCAGCMGCCGCGGTA ReversePrimer Sequencing.IdFungal Number.match S13.W15.MUR.VA TGARTTTMCTTAACYGCCCC W105 105 S15.W15.MUR.VA TGARTTTMCTTAACYGCCCC W106 106 Number.after.colon.in.label File Permanent.ID Run S13.W15.MUR.VA 107 W105.S107.L001.R1.001.fastq.gz 169 2 S15.W15.MUR.VA 108 W106.S108.L001.R1.001.fastq.gz 170 2 Run2 Sample.name.2 Season Type Permanent.ID.2 Replicate S13.W15.MUR.VA B S13.W15.MUR.VA Winter MUR.VA HH169
S15.W15.MUR.VA B S15.W15.MUR.VA Winter MUR.VA HH170
Remove_duplicates_not1 Duplicate_run Multi_comparison With_pre Site Control S13.W15.MUR.VA no S13 Sample S15.W15.MUR.VA no S15 Sample Sample_control Season_2 Type2 Date2 Field_1 Field_2 chao1_alpha S13.W15.MUR.VA Sample W15 MUR VA 1083.550 S15.W15.MUR.VA Sample W15 MUR VA 2853.787 chao1_normalized_alpha chao1_alpha_label shannon_alpha shannon_normalized_alpha S13.W15.MUR.VA 0.3443962 bin_2_of_4 5.458976 0.7449909 S15.W15.MUR.VA 0.9075692 bin_4_of_4 6.647368 0.9071717 shannon_alpha_label observed_species_alpha observed_species_normalized_alpha S13.W15.MUR.VA bin_3_of_4 765 0.3353819 S15.W15.MUR.VA bin_4_of_4 2026 0.8889377 observed_species_alpha_label Description sample_name site season S13.W15.MUR.VA bin_2_of_4 S13.W15.MUR.VA S13_W15_MUR_VA 13 Winter S15.W15.MUR.VA bin_4_of_4 S15.W15.MUR.VA S15_W15_MUR_VA 15 Winter site_season sample_loc_type sample_type aspergillus_flavus S13.W15.MUR.VA S13_W15 MUR_VA VA NA S15.W15.MUR.VA S15_W15 MUR_VA VA NA aspergillus_fumigatus aspergillus_niger aspergillus_ochraceus S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA aspergillus_penicillioides aspergillus_restrictus aspergillus_sclerotiorum S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA aspergillus_sydowii aspergillus_unguis aspergillus_versicolor S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA eurotium_asp__amstelodami aureobasidium_pullulans chaetomium_globosum S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA cladosporium_sphaerospermum paecilomyces_variotii penicillium_brevicompactum S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA penicillium_corylophilum penicillium_crustosumgroup2 penicillium_purpurogenum S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA penicillium_spinulosum penicillium_variabile scopulariopsis_brevicaulis S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA scopulariopsis_chartarum stachybotrys_chartarum trichoderma_viride S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA wallemia_sebi sum_of_logs_i acremonium_strictum alternaria_alternata S13.W15.MUR.VA NA NA NA NA S15.W15.MUR.VA NA NA NA NA aspergillus_ustus cladosporium_cladosporioides_i S13.W15.MUR.VA NA NA S15.W15.MUR.VA NA NA cladosporium_cladosporioides_ii cladosporium_herbarum epicoccum_nigrum S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA mucor_and_rhizopus_group penicillium_chrysogenum rhizopus_stolonifer S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA sum_of_logs_ii ermi endotoxins_eu_mg cat_fel_d1_ug_gram cockroach_bla_g1_u_gram S13.W15.MUR.VA NA NA NA NA NA S15.W15.MUR.VA NA NA NA NA NA dust_mite_der_fl_ug_gram dog_can_fl_ug_gram dust_mite_der_pl_ug_gram S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA NA NA NA mouse_mus_ml_ug_gram dust_type ses ses_group asthma_severity S13.W15.MUR.VA NA Settled 31 3 7 S15.W15.MUR.VA NA Settled 20 2 6 asthma_severity_group id child_age grade sex race ethnicity home S13.W15.MUR.VA Persistent S13 12 6 boy white hispanic manufactured S15.W15.MUR.VA Mild-intermittent S15 9 4 girl white hispanic manufactured marital mom_age edu_mom work_mom edu_dad work_dad sept oct nov dec jan feb mar S13.W15.MUR.VA married 37 4 3 5 4 0 0 0 1 0 0 0 S15.W15.MUR.VA married 32 2 1 3 4 0 1 1 0 0 1 1 april may june july aug symptoms woke_w_sympt limits cough S13.W15.MUR.VA 1 0 0 0 0 0-2 times/week 0-2 times/week 3-4/ month 0-2/ week S15.W15.MUR.VA 0 0 0 0 0 0-2 times/week 0-2 times/week no limits 0-2/ week runynose sore_throat sneeze smoke stufftoy carpet S13.W15.MUR.VA all the time 0-2/ week 0-2/ week no yes no S15.W15.MUR.VA more than twice/week more than 2 times/week 0-2/ week no yes yes vacuum general_health limit_high limit_moderate limit_low behavior_limits S13.W15.MUR.VA once a week good some a little some S15.W15.MUR.VA once a week good no limit no limit no limit no limit physical_limits pain argue not_focused lied S13.W15.MUR.VA some seldom sometimes almost never almost never S15.W15.MUR.VA no limit never almost never never almost never compared_to_others lonely nervous upset school friends S13.W15.MUR.VA good few times never never very satisfied very satisfied S15.W15.MUR.VA good never never few times satisfied satisfied life_overall less_healthy never_ill worry_about a_year_ago worry_physical S13.W15.MUR.VA very satisfied mostly false mostly false mostly true much better some S15.W15.MUR.VA very satisfied mostly true mostly false true much better never worry_emotional limit_you_physical limit_you_emotional limit_family_time S13.W15.MUR.VA some a little some sometimes S15.W15.MUR.VA never no limit no limit never limit_daily_activities get_along pet pet_0_1 dog cat other roaches S13.W15.MUR.VA never good yes yes inside inside never S15.W15.MUR.VA never good no no seldom dust x_of_rooms rooms_w_carpet clean_floors damp_stains S13.W15.MUR.VA every 2 weeks 4 4 once a week no S15.W15.MUR.VA every 2 weeks 0 0 every 2 weeks no stains_in_bedroom stains_there_for decide_on_filter change_filter cold_temp S13.W15.MUR.VA no 0-1 month tech recomend 1-6 month 65 S15.W15.MUR.VA no 0-1 month cheapest 1-6 month 77 hot_temp air_purifier avoid_allergies_outside avoid_allergies_inside avoid_chem S13.W15.MUR.VA 75 yes seldom never always S15.W15.MUR.VA 72 no always never sometimes decrease_humidity dustproof read_lables years_in_home adults children yr_built S13.W15.MUR.VA always always always 6 2 2 2006 S15.W15.MUR.VA always sometimes never 8 2 2 1980 flood fumes temp activities fear sick allergic seasonyear when height weight S13.W15.MUR.VA yes no no yes no no yes no NA 67.00 98.0 S15.W15.MUR.VA no no no no no no yes NA 56.25 91.8 bmi bmi ats fev1 fev1 fvc fvc pef pef fev1fvc fef2575 fef2575 bathe S13.W15.MUR.VA 15.3 10 y 2.93 80 3.23 81 6.66 86 91 3.24 82 shower S15.W15.MUR.VA 85 y 1.89 85 2.14 91 5.40 102 88 2.08 76 shower dob lungs language asthma_group carpet_fraction city occupants S13.W15.MUR.VA 4/30/02 normal eng asthma-ACT 0.50 Kyle 4 S15.W15.MUR.VA 07/29/04 normal span asthma-ACT 0.67 Kyle 4 occudens paved_access_road urbanized walkway_paved cultivation greenspace S13.W15.MUR.VA 0.001879699 yes yes yes yes yes S15.W15.MUR.VA 0.003555556 yes not no yes no sitemerge bact_genecopy_per_mg bact_cell_per_mg fung_genecopy_per_mg S13.W15.MUR.VA S13_W15_MU NA NA NA S15.W15.MUR.VA S15_W15_MU 143.3 20.5 NA fung_cell_per_mg runtime_heat_percent runtime_cool_percent temp_meas_type S13.W15.MUR.VA NA NA NA
S15.W15.MUR.VA NA NA NA
temp_c_mean_norm temp_c_mean_lognorm temp_c_sd_norm temp_c_sd_lognorm S13.W15.MUR.VA NA NA NA NA S15.W15.MUR.VA 23.2229 3.14446 0.8560653 0.03684515 rh_percent_mean_norm rh_percent_mean_lognorm rh_percent_sd_norm S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA 40.47076 3.663973 10.96526 rh_percent_sd_lognorm w_hr_g_kg_mean_norm w_hr_g_kg_mean_lognorm S13.W15.MUR.VA NA NA NA S15.W15.MUR.VA 0.272149 7.216904 1.931561 w_hr_g_kg_sd_norm w_hr_g_kg_sd_lognorm rhtype dmp dep dbp bbzp dehp dop tbp S13.W15.MUR.VA NA NA NA NA NA NA NA NA NA S15.W15.MUR.VA 2.147227 0.3024217 RET NA NA NA NA NA NA NA tcpp tbep tpp tdcpp st_human st_outdoor st_water st_unknown tglbehv trchq tpain S13.W15.MUR.VA NA NA NA NA 0.3917 0.4273 0.022 0.1590 60 60 80 S15.W15.MUR.VA NA NA NA NA 0.2890 0.4023 0.002 0.3067 60 60 100 tmpa tchq5 asthma_severity_yes_no runtime_total_percent Runtime_total_bin S13.W15.MUR.VA 33.33 33.33 Yes NA S15.W15.MUR.VA 100.00 100.00 Yes NA dehp_bin tpp_bin age_bin S13.W15.MUR.VA high_home_age S15.W15.MUR.VA low_home_age

jeffkimbrel commented 6 years ago

It is doing all pairwise distance comparisons between the group in the facet header, with all of the other groups (including itself). Each boxplot is shown twice, once in each group. For example, "mock" in the "soil" facet has the same data has "soil" in the "mock" facet.

Red is the within group comparisons, which is why it is typically lower distance compared to the black boxplots. There are some within group comparisons (for example in "feces") that are actually more distant than comparing across groups.

ozkurt commented 5 years ago

Hello. Thank you very much for the useful code.

Is there a way to calculate and add p-values for the pairwise comparisons to these plots?

dixi06 commented 5 years ago

Hello. Thank you very much for the useful code.

Is there a way to calculate and add p-values for the pairwise comparisons to these plots?

Hi This tutorial might be helpful for p-values

https://www.r-bloggers.com/add-p-values-and-significance-levels-to-ggplots/

shashankx commented 4 years ago

Hello, I was trying to work on this script. As I am able to run on the GlobalPatterns dataset, however, I am not sure how can I use it for my phyloseq.

Here is sample_data(phylo1) looks like-

SampleID SampleType Indivuals
Sample001 Left_Hand person1
Sample002 Left_Hand person2
Sample003 Left_Hand person3
Sample004 Left_Hand person4
Sample005 Right_Hand person1
Sample006 Right_Hand person2
Sample007 Right_Hand person3
Sample008 Right_Hand person4

I want to compare the distance between "person1 left hand" vs "person1 right hand" and so on. The objective is whether each person left-hand is closer to the right-hand microbiome than the other person's hands.

Any suggestions? Best, Shanky

BellaTsch commented 2 years ago

@jeffkimbrel Hi, I am trying to use your code and get the following error: Error: x must be a vector, not a object.

Any ideas what I need to fix?

Thanks in advance

jeffkimbrel commented 2 years ago

@BellaTsch, not sure, but it looks like I did update that code since I posted that version, so maybe try this one linked here:

https://github.com/jeffkimbrel/jakR/blob/master/R/plotDistances.R

I don't have a way to test it right now, but let me know if that doesn't seem to help. It still uses some older functions (like melt() so it could probably use a refresh to make it more "tidy".

BellaTsch commented 2 years ago

@jeffkimbrel Thanks for the link. I tried it and it runs successfully without any errors (maybe its set to silent?) BUT it doesn't display any output (no graph no nothing)... Here is the exact code: I replaced with my phyloseq object and couple of my variables... I renamed the ggplot "p" thinking it may be confused with the phyloseq object "p"... but stil no output. Any ideas? Thanks in advance!

' Boxplot of Beta-diversity Distances

'

' Takes a phyloseq object with samples grouped in the sample data, calculates all pairwise beta-diversity metrics, and plots the distances in facets.

'

' @param p A phyloseq object

' @param m Distance metric

' @param s Column name with sample IDs

' @param d Grouping for comparisons

' @param plot Set to true to include a boxplot of the distances

'

' @export

plotDistances = function(p = sbs_meta_filter_OTUs, m = "wunifrac", s = "Samples", d = "Site", plot = TRUE) {

require("phyloseq")
require("dplyr")
require("reshape2")
require("ggplot2")

# calc distances
wu = phyloseq::distance(p, m)
wu.m = melt(as.matrix(wu))

# remove self-comparisons
wu.m = wu.m %>%
  filter(as.character(Var1) != as.character(Var2)) %>%
  mutate_if(is.factor, as.character)

# get sample data (S4 error OK and expected)
sd = sample_data(p) %>%
  select(s, d) %>%
  mutate_if(is.factor,as.character)

# combined distances with sample data
colnames(sd) = c("Var1", "Type1")
wu.sd = left_join(wu.m, sd, by = "Var1")

colnames(sd) = c("Var2", "Type2")
wu.sd = left_join(wu.sd, sd, by = "Var2")

# plot
pdiv = ggplot(wu.sd, aes(x = Type2, y = value)) +
  theme_bw() +
  geom_point() +
  geom_boxplot(aes(color = ifelse(Type1 == Type2, "red", "black"))) +
  scale_color_identity() +
  facet_wrap(~ Type1, scales = "free_x") +
  theme(axis.text.x=element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  ggtitle(paste0("Distance Metric = ", m)) +
  ylab(m) +
  xlab(d)

# return
if (plot == TRUE) {
  return(pdiv)
} else {
  return(wu.sd)
}

}

jeffkimbrel commented 2 years ago

Hi @BellaTsch, I went ahead an updated the code in the jakR repo at https://github.com/jeffkimbrel/jakR/blob/master/R/plotDistances.R

It should work better now, and instead of returning a df or a plot, it returns both and they can be accessed via $df or $plot. FYI, if you use the code in my R package, I deprecated the plotDistances() in favor or plot_distance() simply because I am trying to move away from camel case.

library(phyloseq)
library(jakR)

data("GlobalPatterns")

a = plot_distances()

a$plot
a$df

I also fixed it up so the data frame columns use the names from the original data, and they are appended with _1 or _2.

> head(a$df) %>% knitr::kable()

|X.SampleID_1 |X.SampleID_2 |     value|SampleType_1 |SampleType_2 |
|:------------|:------------|---------:|:------------|:------------|
|CL3          |CC1          | 0.1720386|Soil         |Soil         |
|CL3          |SV1          | 0.2817016|Soil         |Soil         |
|CL3          |M31Fcsw      | 0.5772531|Soil         |Feces        |
|CL3          |M11Fcsw      | 0.5625487|Soil         |Feces        |
|CL3          |M31Plmr      | 0.4504863|Soil         |Skin         |
|CL3          |M11Plmr      | 0.3823127|Soil         |Skin         |

Hopefully this cleans up some of the issues!

BellaTsch commented 2 years ago

@jeffkimbrel I tried the full code and the outcome is the same, runs with no errors but no output either. Also, tried to install the package but it says it's not available for this version of R (I have the most recent one). I am not an R pro so I can't tell for sure if it's me or the code. Thanks

jeffkimbrel commented 2 years ago

@BellaTsch, it's probably the code. Can you send or share your phyloseq object?