microbiome / OMA

Orchestrating Microbiome Analysis
https://microbiome.github.io/OMA
86 stars 42 forks source link

Beta diversity plot examples #111

Open antagomir opened 3 years ago

antagomir commented 3 years ago

Add an example showing how pairs of points can be connected on beta diversity map.

antagomir commented 1 year ago

Was this done already, I have some memories..?

antagomir commented 8 months ago

@jepasan could you make this as a warm-up? Just suggest one solution first, we can take it from there. Make a PR to OMA beta diversity chapter. At least the "hitchip1006" demo data set from miaTime package has 1000+ individuals, and multiple time points for some of those individuals. So the beta diversity plot (PCoA) can be done for the 1000+ individuals, then connect points with lines for one or more individuals on that map. If unclear, we can chat more in Slack.

Daenarys8 commented 4 weeks ago

Can this be sufficient?

library(scater)
library(dplyr)
library(miaTime)
data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- transformAssay(tse, method = "relabundance")
tse <- runMDS(tse, FUN = vegan::vegdist, method = "bray", name = "PCoA_BC",
              assay.type = "relabundance", na.rm = TRUE)

meta_data <- as.data.frame(colData(tse))

print(colnames(meta_data))  

pcoa_coords <- as.data.frame(reducedDim(tse, "PCoA_BC")) 
pcoa_coords$subject <- meta_data$subject
pcoa_coords$SampleID <- meta_data$SampleID
pcoa_coords$timee <- meta_data$`time`  

subjects_with_multiple_entries <- meta_data %>%
    group_by(subject) %>%
    summarize(num_entries = n(), .groups = "drop") %>%
    filter(num_entries > 1) %>%
    pull(subject)

subset_data_by_subjects <- meta_data %>%
    filter(subject %in% subjects_with_multiple_entries)

merged_data <- subset_data_by_subjects %>%
    inner_join(pcoa_coords, by = "subject") %>%
    arrange(subject, `timee`)

# Plot with SampleID in legend mapped to subject
p <- ggplot(merged_data, aes(x = V1, y = V2)) + 
    geom_point(aes(color = as.factor(sample)), alpha = 0.6, size = 2) + 
    geom_line(aes(group = subject, color = as.factor(sample)), size = 0.7, alpha = 0.5) +  
    labs(
        title = "PCoA with Bray-Curtis dissimilarity",
        x = "PC1",
        y = "PC2",
        color = "Sample ID"
    ) +
    theme_minimal() +
    theme(
        legend.position = "bottom",
        legend.text = element_text(size = 6),        
        legend.title = element_text(size = 7),      
        legend.key.size = unit(0.4, "cm"),           
        legend.spacing.x = unit(0.1, 'cm'),        
        legend.spacing.y = unit(0.1, 'cm')           
    ) +
    guides(color = guide_legend(nrow = 4, byrow = TRUE)) 

Rplot-samples

Daenarys8 commented 4 weeks ago

Perhaps just for 5-10 individuals?

TuomasBorman commented 4 weeks ago

Too complex for training purpose. Try this idea:


p <- plotReducedDim(tse, colour_by = "patient", ...)
p + geom_line(aes(group = patient))
Daenarys8 commented 3 weeks ago

p <- plotReducedDim(tse, colour_by = "patient", ...) p + geom_line(aes(group = patient))

This also works but the line just goes through all the points. I am looking to see if just points with multiple sample-timepoints can get distinct lines passing through them Rplotline

TuomasBorman commented 3 weeks ago

My code was just an example, I did not test it. Does this work in your case

data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- tse[, 1000:1020]
tse <- transformAssay(tse, method = "relabundance")
tse <- runMDS(tse, FUN = getDissimilarity, method = "bray", assay.type = "relabundance", na.rm = TRUE)

p <- plotReducedDim(tse, "MDS", colour_by = "subject")
p + geom_line(aes(group = .data[["colour_by"]]))
Daenarys8 commented 3 weeks ago

My code was just an example, I did not test it. Does this work in your case

data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- tse[, 1000:1020]
tse <- transformAssay(tse, method = "relabundance")
tse <- runMDS(tse, FUN = getDissimilarity, method = "bray", assay.type = "relabundance", na.rm = TRUE)

p <- plotReducedDim(tse, "MDS", colour_by = "subject")
p + geom_line(aes(group = .data[["colour_by"]]))

Yes, this links the required points perfectly. thanks

TuomasBorman commented 3 weeks ago

Can you check if you can add arrows that point from timepoint 1 to timepoint 2....

That might be more tricky, might require incorporating additional data that is not already present in the object.

ggplot_build() function can help you to check the data present

TuomasBorman commented 3 weeks ago

Or just use shape_by for time points and then use it similarly as colour_by

That might be better and simpler approach

Daenarys8 commented 3 weeks ago

for now i have this but i don't think it is representing the flow of time, I will explore a little bit. Rplot01arrow

antagomir commented 3 weeks ago

I think (not sure) that time points are plotted in the order that they occur in the data. If you first sort samples by time as a processing step, then do the plotting, perhaps it works without extra tricks?