maxibor closed 6 years ago
I think I found the source of this issue: in the basta/TaxTree.py file, in the method _get_known_strings() of the class TTree, it seems that you originally planned to include the species, but eventually decided not to...
# remove species
def _get_known_strings(self, string):
    # ts = string.split(";")[:-2]  # original code
    ts = string.split(";")[:-1]  # modified version
    return ts
Changing string.split(";")[:-2] to string.split(";")[:-1] solves the issue and allows the species to be recovered. I can make a PR if you want, but I feel like it was a development choice...
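To make the effect of the two slices concrete, here is a minimal sketch. The lineage string and the helper name known_ranks are made up for illustration, and it assumes the taxonomy string ends with a trailing ";" (so the last element after splitting is an empty string); I have not checked this against the actual BASTA database format.

```python
# Hypothetical lineage string; assumes a trailing ";" after the species rank.
lineage = (
    "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;"
    "Enterobacteriaceae;Escherichia;Escherichia_coli;"
)


def known_ranks(string, drop):
    """Split a ';'-separated lineage and drop the last `drop` fields."""
    return string.split(";")[:-drop]


# [:-2] drops the empty trailing field AND the species.
without_species = known_ranks(lineage, 2)
# [:-1] drops only the empty trailing field, so the species is kept.
with_species = known_ranks(lineage, 1)

print(without_species[-1])
print(with_species[-1])
```

Under that assumption, the first slice stops at the genus and the second one keeps the species, which matches what I see in the output.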
You're right, I wanted to limit false "very specific" assignments at one point but don't remember why. There might also be something about downstream problems for species "unknown". Will have to look into it.
I think the reason was that in previous versions I removed taxonomy strings with "unknown" in them. However, there are quite a few taxa where only the species level is "unknown", so I decided to remove the species and be fine with it. However, I'm not removing those strings anymore anyway, so I added the species again ...
Dear @timkahlke, I've been trying out BASTA on simulated data; however, I can never get down to the species level. Here is an example of my BLAST output:
For the sequence tmp20, there is only one hit, so I should be able to go down to the species level, since the full taxonomic lineage is known for NC_035317.1. However, BASTA only goes to the genus level. Here is the basta command line I used: