jordenrabasco / Long_read_processing_tutorial

2 stars 0 forks source link

[Section] - Graph of ASVs overtime #21

Open jordenrabasco opened 2 years ago

jordenrabasco commented 2 years ago

https://github.com/jordenrabasco/Long_read_processing_tutorial/blob/ea3b5b0091a676c24cabe994fad379d8c092f52b/long%20read%20Tutorial.Rmd#L265-L291

jordenrabasco commented 2 years ago

I copied this from a previous tutorial as you mentioned you might want something similar in this tutorial. Looking at it there seems to be a lot of sample specific information in there. Do you think this would be too confusing for the user to change on their own? Additionally I am a bit confused about the purpose of the "PAD" variable in line 274, more specifically why it is multiplied in some cases and added in others.

benjjneb commented 2 years ago

Can you link the original code you are copying here? I think PAD was related to the different length of the 16S primers used by the PacBio and LoopSeq libraries that were being compared?

jordenrabasco commented 2 years ago

I found the original code here: https://benjjneb.github.io/LRASManuscript/LRASms_fecal.html and it is commented below as well: ecm$SampleOrder <- df$SampleOrder[ecm$Sample] ecm$Timepoint <- paste("Timepoint", ecm$SampleOrder) ecm$Subject <- df$Subject[ecm$Sample] ecm$SubjectLabel <- paste("Subject", df$Subject[ecm$Sample]) ecmp <- ecm[ecm$Sample %in% c("R_3.1", "R_3.2", "R_3.3", "R_9.1", "R_9.1B", "R_9.2"),] ecmp$SequenceVariant <- as.character(ecmp$SequenceVariant) ecmp <- ecmp[ecmp$Abundance > 0,] xx.R3 <- c(Ec1=1, Ec2=2, Ec3=3, Ec4=4, Ec5=5) # Strain 1 PAD <- 8 # Between strain pad xx.R9 <- c(Ec2=1+PAD, Ec12=2+PAD, Ec13=3+PAD, Ec14=4+PAD, # Strain 2 Ec6=1+2PAD, Ec7=2+2PAD, Ec8=3+2PAD, Ec9=4+2PAD, Ec10=5+2PAD, Ec11=6+2PAD) # Strain 3 ecmp$X <- 0 is.R3 <- ecmp$Subject == "R3"; ecmp$X[is.R3] <- xx.R3[ecmp$SequenceVariant[is.R3]] is.R9 <- ecmp$Subject == "R9"; ecmp$X[is.R9] <- xx.R9[ecmp$SequenceVariant[is.R9]]

Force desired facet ymax limits with a custom data.frame

dflim <- data.frame(X=c(1, 1), Abundance=c(3000, 1500),
SequenceVariant=c("Ec1", "Ec1"), SubjectLabel=c("Subject R3", "Subject R9"), Timepoint=c("Timepoint 1", "Timepoint 1")) p.ecoli <- ggplot(data=ecmp, aes(x=X, y=Abundance, color=SequenceVariant)) + geom_point() +
facet_grid(SubjectLabel~Timepoint, scales="free_y") + xlab("E. coli ASVs") + theme(axis.ticks.x=element_blank(), axis.text.x = element_blank()) + theme(panel.grid.major.x=element_blank(), panel.grid.minor.x=element_blank()) +
geom_blank(data=dflim) + guides(color=FALSE) p.ecoli

benjjneb commented 2 years ago

Ah, PAD is being used to visually separate the ASVs associated with one strain from those associated with another strain.

Yes, this code is too complicated and needs to be simplified to be useful in a tutorial. We don't need a publication-ready plot in the tutorial, it just needs to be interpretable, and its construction straightforward to understand.

jordenrabasco commented 2 years ago

updated