mbhall88 opened 5 years ago
I honestly am hesitant to say as it could affect the results a bit. I have yet to test this on nanopore data where read lengths are so varied.
My gut says minimum read length, but I'd really like to test this further before being certain.
@jenniferlu717
I'm currently facing a similar issue with Illumina HiSeq NGS read data with varied read lengths of 30-301 bp after QC (Trimmomatic followed by FastQC). Is this issue resolved, or still under development? I could see in the README.md that the Bracken easy version has some way to handle reads with multiple read lengths (see link below). If this suits my requirement, please confirm. https://github.com/jenniferlu717/Bracken#running-bracken-easy-version
Thanks in advance,
Regards, Vijay N
Facing the same problem here - we have a variety of sequencers generating anything from 150bp to 5kb (PacBio). I'm tempted to create two databases so that I can do chemistry-dependent analyses, but if the 150bp db would work for the longer reads, well, it would simplify handing this off to other folks. Any update on your tests, @jenniferlu717 ?
From the paper, I understand that the length r is used to generate a database where the k-mer length is r, equal to the read length, so that we can know how many k-mers are unique to genome Si. I am facing the same problem as you, but I still don't know how to choose r; my reads range from 150 to 300 bp.
Have you solved this problem?
Hi @jenniferlu717,
Similarly to the other folks posting here, I was wondering about what kind of read length I should build a database for. I'm analyzing a fairly diverse dataset where reads are 45, 75, or 100 bp long. Additionally, I will have to trim some of the reads even further due to poor quality. Do you recommend preparing and using different databases or one database based on the minimum length?
Thank you for your insights!
I've been thinking a bit more about this and I'm actually wondering if Bracken is needed at all for long reads. I wonder if someone here has more experience because I would assume that with the long reads, kraken2 can match them quite specifically to one of the reference genomes. So I wonder if there is a need even to post-process with Bracken.
I also have a diverse dataset with multiple read lengths. I'm thinking of setting to the minimum, but would appreciate any guidance.
Please can I ask if you have had a chance to test this @jenniferlu717? It would be great to know if we can use Bracken with confidence for nanopore.
Many thanks,
Jack
Hello @Midnighter, I wonder if you have any updates on this issue. I am analysing nanopore 16S data (MinION) and have already classified the reads with Kraken2. Is further processing with Bracken necessary? If yes, is the minimum read length the optimal choice? If not, how would one calculate the relative taxonomic abundance from the Kraken2 output?
Many thanks in advance!
I don't have a real answer but I can say that we decided to not run Bracken on nanopore reads for taxprofiler.
Upfront, I know Bracken wasn't necessarily designed to run on nanopore data.
For the read length parameter how would you recommend setting this? Median read length, average, minimum (as in #30 ), or a hard threshold?
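For anyone weighing up those options, a quick way to see how your dataset is actually distributed is to summarise the read lengths straight from the FASTQ. This is just a minimal sketch (the file name and the helper function names are my own, not part of Bracken); it reports the minimum, mean, and median so you can compare the candidate values for the read-length parameter.

```python
# Sketch: summarise read lengths in a FASTQ file to help choose a value
# for Bracken's read-length parameter. Assumes a plain (uncompressed)
# FASTQ with the standard 4 lines per record; gzip handling is omitted.
import statistics

def read_lengths(fastq_handle):
    """Yield the length of each read's sequence line (line 2 of every record)."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # sequence line of each 4-line FASTQ record
            yield len(line.strip())

def summarise(lengths):
    """Return min/mean/median of an iterable of read lengths."""
    lengths = list(lengths)
    return {
        "min": min(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }
```

Usage would be something like `with open("reads.fastq") as fh: print(summarise(read_lengths(fh)))`. If the minimum is far below the median (e.g. after aggressive quality trimming), that's a sign the choice between "minimum" and "median" will actually matter for your data.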