Open Rohit-Satyam opened 8 months ago
I looked back into the code and what Bracken is trying to do is redistribute reads based on the kmer distribution file.
So for example, if any sequence (100bp for example) from Brassica Rapa is classified as Bacteria or Acinetobacter, then reads from those levels were redistributed to Brassica Rapa.
Are you using a complete genome of Brassica Rapa?
What I could do is require that any redistribution has to remain in the same tree. This normally doesn't cause problems unless genomes are contaminated.
Hi @jenniferlu717. My data is RNAseq (not genomes) but the read length is greater than 100 post trimming and I provide these trimmed fastq files to Kraken2. All I am using is Kraken2/ Bracken index PlusPF from here (Downloaded on Dec 8, 2023).
What I could do is require that any redistribution has to remain in the same tree. This normally doesn't cause problems unless genomes are contaminated.
Yep, that would be great. In addition, can you tell why there are missing taxa for the Bacteria in the Sankey plot (Bacteria directly linked to Pseudomonadota)? I observe that such a discrepancy has been reported in Pavian issue threads (see here and here) but this must be stemming from the Kraken2 reports right? This wasn't the issue with the previous indexes.
Hi, I was wondering if this is a quick fix and can be inculcated by the end of this week or if I have to wait for a few months. kindly let me know because I have a presentation coming so I would be able to include/exclude results accordingly. No hurry please take your time. I just want to know for my personal goals.
Thanks
Hi @jenniferlu717 I was wondering if this issue was fixed.
Dear @jenniferlu717
I have been using Bracken for a long time but recently something changed. I have clinical COVID RNAseq samples (380) and I realized that in many samples Bracken shows the abundance of Brassica Rapa and other Plant-related reads such as Juglans regia. This is completely opposed to what we observe in Kraken2 results. Besides, these are human samples so I shouldn't observe Viridiplantae reads. I see that a similar misassignment has also been reported elsewhere
Version Used: Bracken v2.9
Also attaching the results:
bracken_misassignment.zip