jfmrod / MAPseq

high-throughput rRNA sequence classification and OTU mapping for metagenomic samples

14922 sequences not found in sequence database #20

Open hjruscheweyh opened 3 years ago

hjruscheweyh commented 3 years ago

Hi Joao

I ran mapseq version 1.2.6, downloaded from the binaries section on GitHub, and got the following warning.

Is this an issue?

Best and thanks, Hans

mapseq-1.2.6-linux/mapseq test.fasta -nthreads 8 -tophits 80 -topotus 40 > test.mseq
# loaded 1521928 sequences
!! Thu Mar 25 10:01:23 2021 [] mapseq.cpp:3614 void load_taxa(const estr&, eseqdb&): loading taxonomy, 14922 sequences not found in sequence database
# taxonomies: 2
# tax levels: 7
# tax: 0 level: 0 (4)
# tax: 0 level: 1 (158)
# tax: 0 level: 2 (331)
# tax: 0 level: 3 (653)
# tax: 0 level: 4 (1168)
# tax: 0 level: 5 (2855)
# tax: 0 level: 6 (8912)
# tax levels: 6
# tax: 1 level: 0 (3)
# tax: 1 level: 1 (33361)
# tax: 1 level: 2 (94152)
# tax: 1 level: 3 (120766)
# tax: 1 level: 4 (161800)
# tax: 1 level: 5 (241196)
# fcount: 50 otukmercount: 261838
# processing input... 5000
# done processing 5000 seqs (1.68232s)

jfmrod commented 3 years ago

Hi Hans!

No, that is not a problem. The warning appears because some reference sequences were removed between mapref-2.2 and mapref-2.2b, and at the time I didn't update the taxonomy files to drop the corresponding entries, since that would not affect the results (apart from the warning).

I realize it is confusing and should be fixed, so thanks for reporting it.

I will soon update the reference to include the most up-to-date taxonomy from NCBI and fix this.
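
For anyone who wants to confirm that the warning is only caused by leftover taxonomy entries, the mismatch can be checked independently of mapseq. The following is a minimal sketch, not part of mapseq itself. It assumes the reference is a FASTA file whose sequence ID is the first token after `>`, and that each taxonomy file line starts with that same ID followed by whitespace, with `#` marking header lines. Both layouts are assumptions about the file format, and the function names are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs from FASTA header lines (first token after '>')."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def taxonomy_ids(lines):
    """Collect IDs from a taxonomy file.

    Assumption: the first whitespace-delimited field of each data line is the
    sequence ID; blank lines and lines starting with '#' are skipped.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.add(line.split()[0])
    return ids


def orphaned_taxonomy_ids(fasta_lines, tax_lines):
    """Return taxonomy IDs that have no matching sequence in the FASTA."""
    return taxonomy_ids(tax_lines) - fasta_ids(fasta_lines)
```

Both functions accept any iterable of lines, so open file handles for the reference FASTA and one taxonomy file can be passed directly to `orphaned_taxonomy_ids`. If the assumed layout matches the real files, the size of the returned set for a given taxonomy file should correspond to the number of entries that file contributes to the warning.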