eparejatobes closed this issue 8 years ago.
@eparejatobes @rtobes this is the information I have found about the PacBio datasets we have (it's an email from Richard Hall). Unfortunately, I can't find the link to the ID in the BEI catalog.
- The sequences of the primers used to generate the amplicons
  P1 `AGRGTTYGATYMTGGCTCAG`, P2 `RGYTACCTTGTTACGACTT`
- The filtering and preprocessing protocols (correction of CCSs, trimming, ...) and the parameters used
  I applied no further filtering than the basic CCS parameters: 3 passes of the insert and 0.9 predicted accuracy. I can provide data further filtered for predicted accuracy; you can also achieve the same filtering using the base QV values.
- Why are some reads much longer than the amplicon?
  Chimeras can form between amplicons, or, in a small percentage of cases, an adapter is missing on one side, producing a palindromic insert sequence. A simple length filter should remove these reads without adversely affecting yield.
- What exactly are the mock communities that we have?
  BEI: http://downloads.hmpdacc.org/data/HMMC/HMPRP_sT1-Mock.pdf; the amplicons are generated from the even and staggered genomic samples. Sakinaw is a real environmental sample. CAMI: from https://data.cami-challenge.org/participate. I'm not sure exactly which sample it is; maybe Cheryl knows?
- Are the quality values Phred+33-encoded?
  Yes, the Phred scores are in the standard Sanger format, "from 0 to 93 using ASCII 33 to 126".
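Richard's answers on filtering, read length and quality encoding can be combined into a small read filter. The following is a minimal Python sketch, not his pipeline: it decodes Phred+33 qualities (ASCII 33 to 126 map to Q 0 to 93), approximates predicted accuracy as 1 minus the mean per-base error probability 10^(-Q/10) (an assumption; the CCS software computes its own read-quality estimate), and applies a length window whose bounds (`min_len`, `max_len`) are hypothetical placeholders to be set from the observed length distribution. All function names are illustrative.

```python
"""Sketch: filter PacBio CCS reads by predicted accuracy and length.

Assumptions (not taken from Richard's pipeline):
- reads are 4-line FASTQ records with Phred+33 quality strings;
- predicted accuracy is approximated as 1 - mean per-base error probability;
- min_len / max_len are placeholder bounds to tune around the amplicon length.
"""
from itertools import islice
from typing import Iterable, Iterator, List, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def phred33_scores(qual: str) -> List[int]:
    """Decode a Phred+33 string (ASCII 33..126) into Q scores 0..93."""
    scores = [ord(c) - 33 for c in qual]
    if any(q < 0 or q > 93 for q in scores):
        raise ValueError("quality character outside ASCII 33..126")
    return scores


def predicted_accuracy(qual: str) -> float:
    """1 - mean per-base error probability, where P(error) = 10 ** (-Q / 10)."""
    scores = phred33_scores(qual)
    if not scores:
        return 0.0
    expected_errors = sum(10 ** (-q / 10) for q in scores)
    return 1.0 - expected_errors / len(scores)


def read_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Minimal parser for 4-line FASTQ records (no wrapped sequence lines)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip()
        if not header:
            continue  # tolerate blank lines between records
        record = [line.rstrip() for line in islice(it, 3)]
        if len(record) < 3:
            raise ValueError(f"truncated FASTQ record: {header}")
        seq, _plus, qual = record
        yield Read(header[1:], seq, qual)


def keep(read: Read, min_acc: float = 0.9,
         min_len: int = 1000, max_len: int = 2000) -> bool:
    """True if the read passes both the length and the accuracy filter."""
    if not min_len <= len(read.seq) <= max_len:
        # Outside the expected length range; overly long reads may be
        # chimeras or palindromic inserts.
        return False
    return predicted_accuracy(read.qual) >= min_acc
```

Usage, with a hypothetical file name: `with open("ccs_reads.fastq") as fh: kept = [r for r in read_fastq(fh) if keep(r)]`. The length check runs first so that obviously wrong inserts are discarded before the per-base quality sum is computed.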
I can also provide data filtered using my 16S pipeline, if you are interested in a cleaner dataset.
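The primers quoted in Richard's mail use IUPAC ambiguity codes (R = A/G, Y = C/T, M = A/C), so they cannot be found with a plain substring search. A minimal sketch of locating them in a read, assuming exact matches only (primer-trimming tools usually tolerate mismatches) and checking both strands, since a read may come from either strand of the amplicon; the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# IUPAC nucleotide ambiguity codes -> regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps ambiguity codes (R<->Y, K<->M, B<->V, D<->H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

P1 = "AGRGTTYGATYMTGGCTCAG"  # primer P1, as quoted in the mail
P2 = "RGYTACCTTGTTACGACTT"   # primer P2, as quoted in the mail


def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex over A/C/G/T."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def find_primer(read: str, primer: str) -> Optional[Tuple[str, int]]:
    """Return (strand, start) of the first exact primer match, or None.

    On the "-" strand, start is an offset into the reverse-complemented read.
    No mismatches or indels are tolerated.
    """
    pattern = primer_regex(primer)
    for strand, seq in (("+", read.upper()), ("-", revcomp(read))):
        match = pattern.search(seq)
        if match:
            return strand, match.start()
    return None
```

For example, `find_primer(read, P1)` reports whether P1 appears on the read as given or on its reverse complement, which could help orient reads before any trimming step.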
The PDF comes from here:
And these are the IDs for the BEI DNA listed there:
From the hmpdacc website:
Mock communities are available to the community through the BEI Resource as both a cell mixture (BEI:HM-280, HM-281) and a genomic DNA extract (BEI:HM-278D, HM-279D).
OK thank you, I'll take a look at all this later today.
Waiting for #86
The code is here, together with the mock communities. We need to review where all the test input data is, and then I will fix the input data mappings, etc.
LGTM
Check https://github.com/era7bio/mg7-test/blob/master/docs/pacbio-mock-tests.md