Open jordenrabasco opened 2 years ago
If for any reason you need to stop the tutorial, the R object can then be loaded in and the workflow continued from this step.
What does this mean exactly? And, would a tutorial reader know how to perform this action?
As we can see the list of unique sequences and their counts were generated. We can also see that there is a significant relative abundance of these unique sequences. This allows the "learnerrors" module to run appropriately. If there wasn't enough abundance for each unique amplicon then the error model wouldn't run correctly. However, this doesn't seem to be the case here and we can therefore assume that the dereplicaiton procedure was a success! If you wish to check the other samples you can switch the sample name in the code "head(drp$R11_1_P3C3.fastq.gz$uniques)" to whichever sample you would like to view.
This needs to be substantially revised.
"If for any reason you need to stop the tutorial, the R object can then be loaded in and the workflow continued from this step."
I think this was left over from before I split the dereplication and error plot sections. I have since moved that sentence to the error plot section of the tutorial.
"This needs to be substantially revised."
I have substantially rewritten the section, adding descriptions of the R object generated by the dereplication function and of how it is structured. I also provide instructions on how to investigate the object further. I removed the part of this section that discussed the error model, to avoid confusion and to separate the sections more fully. Let me know what you think of the new version! It should be submitted now in its own git push.
Can you link the commit?
Also, a useful feature is that you can include issue numbers in commit messages, and they will be automatically linked into that issue. E.g. if you had included "addresses #8" in your commit message, a link to that commit would show up in this issue thread automatically.
Ah okay I will do that next time! The commit should be linked here: 22bc885af601d46d5b39f3b08f7a9067461baa25
As we can see the list of unique sequences and their associated counts were generated appropiratly (sic).
How would a new user know whether the output above shows that unique sequences were generated appropriately? This matters because, for long reads in particular, the dereplication step is an important sanity check that the data is appropriate for DADA2. An explanation of why that is, and of what "appropriate" and "not appropriate for DADA2" outputs look like, is needed here.
Additionally, if for any reason you need to stop the tutorial, the saved R object can then be loaded in and the workflow continued from this step.
How would one do this?
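One way the text could show this is with base R's saveRDS() and readRDS(). The following is only a sketch: the object name drp is borrowed from the tutorial's own snippet, the list below is a small stand-in rather than real derepFastq output, and the file name is hypothetical.

```r
# Stand-in for the dereplication result; in the tutorial this would be the
# object returned by derepFastq (assumed here to be named `drp`).
drp <- list(
  "R11_1_P3C3.fastq.gz" = list(uniques = c(ACGTACGT = 12L, ACGTACGA = 3L))
)

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
rds_path <- file.path(tempdir(), "drp.rds")

# Save the object before stopping.
saveRDS(drp, rds_path)

# Later, in a fresh R session, reload it and continue from this step.
drp_restored <- readRDS(rds_path)

# Check that the reloaded object matches what was saved.
identical(drp, drp_restored)
```

In a real run the path would point somewhere persistent rather than tempdir(), since temporary files do not survive the R session.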
By "outputs needed here", do you mean just example tables or something more substantial?
Example outputs probably aren't even needed. What is needed is a description of how to interpret the output of dereplication, one that makes it clear when the output suggests things are OK and when it suggests they are not.
Okay, cool. The updated commit linked above should have those changes.
I don't think a user could apply that text to their own data. Let's say someone without familiarity with DADA2 runs the derepFastq step and gets 1000 unique sequences out of their 1024 reads. What does that mean? Can they work out what that means from the current text?
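To make that scenario concrete, here is a minimal sketch of the kind of check the text could walk a reader through. It assumes the uniques element of a dereplicated sample is a named integer vector of per-sequence read counts. The toy vector is made up to match the 1000-out-of-1024 example and is not real data.

```r
# Toy stand-in for drp[[sample]]$uniques: 1000 unique sequences covering
# 1024 reads (one sequence seen 25 times, 999 seen once). Made-up data.
uniques <- c(25L, rep(1L, 999))
names(uniques) <- paste0("seq", seq_along(uniques))  # placeholder names

n_unique <- length(uniques)     # number of distinct sequences
n_reads  <- sum(uniques)        # total reads in the sample
n_single <- sum(uniques == 1L)  # sequences observed exactly once

frac_unique    <- n_unique / n_reads  # unique sequences per read
frac_singleton <- n_single / n_reads  # share of reads that are singletons

cat(sprintf("%d unique sequences from %d reads (%.1f%% unique)\n",
            n_unique, n_reads, 100 * frac_unique))
cat(sprintf("%.1f%% of reads are singletons\n", 100 * frac_singleton))
```

A ratio this close to 1 means almost no read was seen more than once, which is the low-abundance situation the dereplication sanity check is meant to catch. Where exactly to draw the line between OK and not OK is something the tutorial text would still need to state; this sketch does not settle it.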
updated
This is the introduction to the DADA2 method, along with an explanation and sanity check of the dereplication step. Let me know what you think!
https://github.com/jordenrabasco/Long_read_processing_tutorial/blob/afa1a962b305b79b0473a644cd9133a992bfa9ea/long%20read%20Tutorial.Rmd#L121-L127