jkbonfield / fqzcomp

Fastq compression tool

PacBio HiFi error #4

Open kalimero0411 opened 3 years ago

kalimero0411 commented 3 years ago

I used fqzcomp -n2 -s4+ -q1 on PacBio HiFi (Sequel II CCS) fastq sequences and received the following error (fqzcomp v4.6):

Quality scale is too high. This looks like Illumina+64 format. Try rerunning with the -I option instead.

The unique characters in the quality scores of the first 10 sequences are:

` ^ ~ < = > | _ - , ; : ? / . ' ( ) [ ] { } @ $ * \ & % + 0 1 2 3 4 5 6 7 8 9 a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z

The only characters missing are those encoding quality values 0, 1 and 2:

! " #

Is there a way to compress this data with fqzcomp? Thank you
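For anyone wanting to repeat the character survey above on their own reads, here is a minimal sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and a Phred+33 offset. The function name `quality_chars` and the demo record are made up for illustration and are not part of fqzcomp.

```python
import io

def quality_chars(fastq, n_records=10):
    """Collect the distinct quality characters in the first n_records records.

    Assumes each record is exactly 4 lines (header, sequence, '+', quality).
    """
    seen = set()
    for i, line in enumerate(fastq):
        record, field = divmod(i, 4)
        if record >= n_records:
            break
        if field == 3:  # fourth line of a record holds the quality string
            seen.update(line.rstrip("\r\n"))
    return seen

# Made-up record for illustration only (not real HiFi data).
demo = io.StringIO("@read1\nACGT\n+\n$~5I\n")
chars = quality_chars(demo)

# Printable Phred+33 range: '!' (value 0) through '~' (value 93).
phred33 = {chr(c) for c in range(33, 127)}
missing = sorted(phred33 - chars)

print("highest quality value seen:", max(ord(c) for c in chars) - 33)
print("number of unused characters:", len(missing))
```

To run it on a real file, replace `demo` with an opened file handle, e.g. `open("reads.fastq")` (path hypothetical).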

jkbonfield commented 3 years ago

Try editing the QMAX #define in fqz_comp.c to 128 instead of 64.

It could auto-detect, but fqzcomp was written in an era of Illumina data and it's not really a production worthy tool (more of a research project).

I have updated the fqzcomp models for CRAM 3.1, so I really ought to backport that work here and release a new fqzcomp.
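A rough way to see why raising QMAX from 64 to 128 should help with these reads: with a +33 offset, the `~` character decodes to quality value 93, which is above 64 but below 128. The sketch below is not fqzcomp's code. It assumes the limit behaves like a simple "every decoded value must be below QMAX" test, and `fits_qmax` is a hypothetical helper.

```python
def fits_qmax(quality_string, qmax, offset=33):
    """Return True if every decoded quality value is below qmax.

    The strict '<' comparison is an assumption about how such a limit
    would behave, not a copy of the check in fqz_comp.c.
    """
    return all(ord(c) - offset < qmax for c in quality_string)

hifi_like = "~~~I5$"  # made-up string; '~' decodes to 93 with offset 33

print(fits_qmax(hifi_like, 64))   # False: 93 is not below 64
print(fits_qmax(hifi_like, 128))  # True: 93 is below 128
```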