vrmarcelino / CCMetagen

Microbiome classification pipeline
GNU General Public License v3.0

Database headers #63

Closed vinisalazar closed 4 months ago

vinisalazar commented 8 months ago

The newly updated database does not have taxonomic lineages in the sequence headers, only accession + taxid. Investigate whether this is caused by the upstream database formatting scripts (e.g. rename_nt.py) or by something else.

TO-DO:

ramacleod commented 4 months ago

Hi, I just ran KMA with the prebuilt RefSeq database and found that the header fields appear in the wrong order in the res file, e.g. ' NC_007508.1|taxid|316273 ' instead of ' 316273|NC_007508.1 ', which throws an error if you run CCMetagen.py directly on it. I think I can fix this with awk on the res file, but thought I'd let you know in case this is connected to the issue above. If it helps, I can provide more details of what I did.
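
For anyone who wants to try the same kind of workaround without awk, here is a rough Python sketch of the reordering described above. It is not part of CCMetagen. The header pattern is assumed from the single example quoted in this comment, the function names are made up for illustration, and it assumes the template name is the first tab-separated field of each res line. Anything after the taxid in a header is dropped.

```python
import re

# Assumed pattern, inferred only from the one header quoted above:
# "<accession>|taxid|<taxid>". Anything after the taxid is discarded.
_REFSEQ_HEADER = re.compile(r"^\s*([^|\s]+)\|taxid\|(\d+)")


def reorder_header(header: str) -> str:
    """Turn 'NC_007508.1|taxid|316273' into '316273|NC_007508.1'.

    Headers that do not match the assumed pattern are returned unchanged.
    """
    match = _REFSEQ_HEADER.match(header)
    if match is None:
        return header
    accession, taxid = match.groups()
    return f"{taxid}|{accession}"


def reorder_res_line(line: str) -> str:
    """Rewrite only the first tab-separated field of one res line."""
    if line.startswith("#"):
        # Leave the column-header line untouched.
        return line
    first, sep, rest = line.partition("\t")
    return reorder_header(first) + sep + rest
```

Applying `reorder_res_line` to every line of a res file and writing the result to a new file would mimic the awk approach. Treat it as a stopgap only, and check the output against what CCMetagen expects before relying on it.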

vrmarcelino commented 4 months ago

Hi!

Thanks for the info! Could you try running CCMetagen with the flag -r RefSeq or --reference_database RefSeq ? This flag was added to handle the different header format of the RefSeq database. Let me know whether it works.

ramacleod commented 4 months ago

Ah, should have read the manual! That works, as did my awk bodge yesterday. Thanks.

vrmarcelino commented 4 months ago

I am glad it works =)