I have noticed that in one case counts are not assigned to the correct OTU and a species is missing.
In my Kraken report, I have the following lines:
25.56 211687 **3** G 10509 Mastadenovirus
25.56 **211678** 124056 S 129951 Human mastadenovirus C
These indicate three reads assigned to the Mastadenovirus genus and 211,678 (including subspecies level) to Human mastadenovirus C. kraken-biom correctly writes the .biom file, with data:
...[2095,0,3.0],[2096,0,211678.0]...
And I confirm that the 2095th and 2096th (0-offset) elements of rows are:
However, MEGAN6 6.12.5 assigns 211,687 reads to Mastadenovirus and, intriguingly, I cannot even uncollapse Mastadenovirus to reveal Human mastadenovirus C.
Nonetheless, Neisseria sicca comes out fine:
30.54 252935 6038 G 482 Neisseria
22.06 182723 182723 S 490 Neisseria sicca
That is 182,723 reads to the species and 6,038 to the genus. This is again correctly recorded in the Biom:
[2,0,6038.0],[3,0,182723.0]
Where elements 2 and 3 (0-offset) are indeed the pair we want:
I have attached the file in case this helps understand the problem. I have also confirmed the assignments are correct when I read the biom file into R with the biomformat package.
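
For anyone who wants to repeat the check outside R, here is a minimal Python sketch of the same lookup. It assumes the file is BIOM 1.0 JSON with a sparse `data` list of `[row, column, value]` triples, as in the excerpts above. The function name `sparse_counts_by_row`, the placeholder rows, and the use of taxon IDs as row ids are my own assumptions for illustration, not taken from the attached file.

```python
import json


def sparse_counts_by_row(biom, column=0):
    """Map each row id to its count in one sample column of a sparse BIOM 1.0 table."""
    rows = biom["rows"]
    counts = {}
    for row_idx, col_idx, value in biom["data"]:
        if col_idx == column:
            # Sparse triples index into "rows" with a 0-based offset.
            counts[rows[row_idx]["id"]] = value
    return counts


# Tiny stand-in table (hypothetical; only the two Neisseria triples come from the post).
example = {
    "matrix_type": "sparse",
    "rows": [
        {"id": "placeholder_0", "metadata": None},
        {"id": "placeholder_1", "metadata": None},
        {"id": "482", "metadata": None},  # assumed: Neisseria (genus) taxon ID as row id
        {"id": "490", "metadata": None},  # assumed: Neisseria sicca taxon ID as row id
    ],
    "columns": [{"id": "sample", "metadata": None}],
    "data": [[2, 0, 6038.0], [3, 0, 182723.0]],
}

counts = sparse_counts_by_row(example)
print(counts)

# With a real file, load it first, e.g.:
# with open("exemplar_biom.txt") as fh:
#     counts = sparse_counts_by_row(json.load(fh))
```

If the counts printed this way match the Kraken report, the discrepancy is more likely introduced at import time than in the Biom file itself.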
For context, I am importing Biom files made from Kraken reports (using: https://github.com/smdabdoub/kraken-biom).
exemplar_biom.txt
Many thanks,
Andrew