Open jordenrabasco opened 2 years ago
I copied this from a previous tutorial as you mentioned you might want something similar in this tutorial. Looking at it there seems to be a lot of sample specific information in there. Do you think this would be too confusing for the user to change on their own? Additionally I am a bit confused about the purpose of the "PAD" variable in line 274, more specifically why it is multiplied in some cases and added in others.
Can you link the original code you are copying here? I think PAD
was related to the different length of the 16S primers used by the PacBio and LoopSeq libraries that were being compared?
I found the original code here: https://benjjneb.github.io/LRASManuscript/LRASms_fecal.html and it is commented below as well: ecm$SampleOrder <- df$SampleOrder[ecm$Sample] ecm$Timepoint <- paste("Timepoint", ecm$SampleOrder) ecm$Subject <- df$Subject[ecm$Sample] ecm$SubjectLabel <- paste("Subject", df$Subject[ecm$Sample]) ecmp <- ecm[ecm$Sample %in% c("R_3.1", "R_3.2", "R_3.3", "R_9.1", "R_9.1B", "R_9.2"),] ecmp$SequenceVariant <- as.character(ecmp$SequenceVariant) ecmp <- ecmp[ecmp$Abundance > 0,] xx.R3 <- c(Ec1=1, Ec2=2, Ec3=3, Ec4=4, Ec5=5) # Strain 1 PAD <- 8 # Between strain pad xx.R9 <- c(Ec2=1+PAD, Ec12=2+PAD, Ec13=3+PAD, Ec14=4+PAD, # Strain 2 Ec6=1+2PAD, Ec7=2+2PAD, Ec8=3+2PAD, Ec9=4+2PAD, Ec10=5+2PAD, Ec11=6+2PAD) # Strain 3 ecmp$X <- 0 is.R3 <- ecmp$Subject == "R3"; ecmp$X[is.R3] <- xx.R3[ecmp$SequenceVariant[is.R3]] is.R9 <- ecmp$Subject == "R9"; ecmp$X[is.R9] <- xx.R9[ecmp$SequenceVariant[is.R9]]
dflim <- data.frame(X=c(1, 1), Abundance=c(3000, 1500),
SequenceVariant=c("Ec1", "Ec1"),
SubjectLabel=c("Subject R3", "Subject R9"), Timepoint=c("Timepoint 1", "Timepoint 1"))
p.ecoli <- ggplot(data=ecmp, aes(x=X, y=Abundance, color=SequenceVariant)) + geom_point() +
facet_grid(SubjectLabel~Timepoint, scales="free_y") +
xlab("E. coli ASVs") + theme(axis.ticks.x=element_blank(), axis.text.x = element_blank()) +
theme(panel.grid.major.x=element_blank(), panel.grid.minor.x=element_blank()) +
geom_blank(data=dflim) +
guides(color=FALSE)
p.ecoli
Ah, PAD is being used to visually separate the ASVs associated with one strain from those associated with another strain.
Yes, this code is too complicated and needs to be simplified to be useful in a tutorial. We don't need a publication-ready plot in the tutorial, it just needs to be interpretable, and its construction straightforward to understand.
updated
https://github.com/jordenrabasco/Long_read_processing_tutorial/blob/ea3b5b0091a676c24cabe994fad379d8c092f52b/long%20read%20Tutorial.Rmd#L265-L291