marbl / meryl

A genomic k-mer counter (and sequence utility) with nice features.

Homopolymer compression is not applied if the first read file is empty #31

Open maickrau opened 1 year ago

maickrau commented 1 year ago

Running count compress with multiple read files does not apply homopolymer compression when the first file is empty. The following command creates an index without homopolymer compression:

meryl count compress k=21 threads=4 memory=32g empty.fa reads.fa output kmers_withempty

But putting the empty file after the reads file correctly creates a homopolymer-compressed index:

meryl count compress k=21 threads=4 memory=32g reads.fa empty.fa output kmers_withempty2

meryl print shows that the first index is not homopolymer compressed but the second is:

$ meryl print kmers_withempty/ | head

Found 1 command tree.

PROCESSING TREE #1 using 1 thread.
  opLessThan
    kmers_withempty/
    print to (stdout)
AAAAAAAAAAAAAAAAATAAG   1
AAAAAAAAAAAAAAAACTACA   1
AAAAAAAAAAAAAAAATAAGG   1
AAAAAAAAAAAAAAACAATAC   1
AAAAAAAAAAAAAAACTACAG   1
AAAAAAAAAAAAAAATAAGGA   1
AAAAAAAAAAAAAACAATACT   1
AAAAAAAAAAAAAACTACAGA   1
AAAAAAAAAAAAAATAAGGAG   1
AAAAAAAAAAAAAAGTACTTT   1

$ meryl print kmers_withempty2 | head

Found 1 command tree.

PROCESSING TREE #1 using 1 thread.
  opLessThan
    kmers_withempty2/
    print to (stdout)
ACACACACACACACACTACTA   1
ACACACACACACACTACTACT   1
ACACACACACACATCATATAC   1
ACACACACACACTACAGACAT   1
ACACACACACACTACAGATCA   1
ACACACACACACTACTACTAC   2
ACACACACACATCATATACAG   1
ACACACACACTACAGACATCA   1
ACACACACACTACAGATCATC   1
ACACACACACTACTACTACTA   4
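
The difference between the two listings can be checked mechanically: a homopolymer-compressed k-mer never contains two identical adjacent bases. The following is a minimal sketch of such a check, assuming `meryl print`-style lines (k-mer, whitespace, count). The helper name `count_uncompressed_kmers` is hypothetical and not part of meryl.

```shell
# Hypothetical helper (not part of meryl): count k-mers that still contain a
# homopolymer run, i.e. two or more identical adjacent bases.
# Reads lines of the form "<kmer> <count>" on stdin; lines whose first field
# is not a pure ACGT string (such as log messages) are ignored.
count_uncompressed_kmers() {
  awk '$1 ~ /^[ACGT]+$/ && $1 ~ /AA|CC|GG|TT/ { n++ } END { print n + 0 }'
}

# Example using one k-mer from each listing above.
printf 'AAAAAAAAAAAAAAAAATAAG\t1\nACACACACACACACACTACTA\t1\n' | count_uncompressed_kmers
```

Piping `meryl print kmers_withempty` into this function should report a non-zero count for the index above, while a correctly compressed index would be expected to report 0.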

$ meryl --version
meryl snapshot v1.4-development +29 changes (r969 97d5923dd69ebc3efed67fc466c21ed8c5e6670b)

brianwalenz commented 1 year ago

Thanks, Mikko. It's not just an empty first file that causes trouble. The 'compress' flag is reset after EACH file. The workaround is simple but annoying: add 'compress' before each input file.
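
For many input files, the workaround can be scripted. This is a rough sketch only: the helper `compress_each` and the output name `kmers_fixed` are made up for illustration, and the exact argument placement is one reading of "add 'compress' before each input file", not something verified against meryl. The command is printed as a dry run rather than executed.

```shell
# Hypothetical helper (not part of meryl): emit 'compress' before every input
# file, since the flag is reported to be reset after each file.
# Note: file names containing spaces are not handled in this sketch.
compress_each() {
  for f in "$@"; do
    printf 'compress %s ' "$f"
  done
}

# Dry run: print the resulting command instead of running it.
# 'kmers_fixed' is an assumed output name.
echo "meryl count k=21 threads=4 memory=32g $(compress_each empty.fa reads.fa)output kmers_fixed"
```

With the two files from the report, this prints a command in which each input is preceded by its own compress.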

I remember debating if this flag should be reset or not. I'm a little embarrassed I left it in.