pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

query output file query_name is incorrect #84

Open bredeson opened 3 months ago

bredeson commented 3 months ago

Hi @pmelsted, Thank you for this cool tool!

I was experimenting with the Bifrost query command-line tool and noticed a problem with the query_name column when searching a graph with two colors. If the query FASTA file contains a single sequence, the output has one query line with an empty query_name column. If it contains two sequences, the output has two query lines, but the first line carries the name of the second sequence and the second line's query_name is empty. I think this must be a bug (the presence/absence counts are otherwise correct):

$ Bifrost build -k 31 -r ./genome.list --colors -o test
$ Bifrost query -q qry.fa -o qry-test -g test.gfa.gz -C test.color.bfg
$ cat qry-test.tsv
query_name  genome1.fasta   genome2.fasta
qry2    1   1
    1   0
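
The shift shown above can be checked mechanically by pairing FASTA record names with the query_name column row by row. What follows is only a minimal sketch, not part of Bifrost: the helper names (fasta_names, tsv_query_names, mismatches) are hypothetical, the first record name qry1 is made up for illustration (only qry2 appears in the output above), and it assumes the TSV is tab-separated and that the expected name is the first word of each FASTA header.

```python
import io
from itertools import zip_longest


def fasta_names(handle):
    """Return record names in file order (assumed: first word after '>')."""
    names = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def tsv_query_names(handle):
    """Return the first column of each data row, skipping the header row."""
    rows = [line.rstrip("\n").split("\t") for line in handle]
    # Drop fully blank lines, but keep rows whose first column is empty.
    return [row[0].strip() for row in rows[1:] if any(c.strip() for c in row)]


def mismatches(expected, observed):
    """List (row_number, expected_name, observed_name) where the two differ."""
    pairs = zip_longest(expected, observed, fillvalue=None)
    return [(i, e, o) for i, (e, o) in enumerate(pairs, start=1) if e != o]


# Illustrative in-memory inputs; 'qry1' and the sequences are hypothetical.
fasta = io.StringIO(">qry1\nACGTACGT\n>qry2 some description\nTTGATTGA\n")
tsv = io.StringIO(
    "query_name\tgenome1.fasta\tgenome2.fasta\n"
    "qry2\t1\t1\n"
    "\t1\t0\n"
)

for row, want, got in mismatches(fasta_names(fasta), tsv_query_names(tsv)):
    print(f"row {row}: expected {want!r}, found {got!r}")
```

With real files, the two StringIO objects would be replaced by open("qry.fa") and open("qry-test.tsv"). A mismatch on every row, with each row holding the following record's name, would match the off-by-one pattern described here.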

I'm using Bifrost v1.3.5 compiled on Linux. I've uploaded the test files to dataset2.tar.gz

Best, Jessen