samtools / htslib

C library for high-throughput sequencing data formats

Fix decompress_peek_gz to cope with files starting on empty gzip blocks. #1643

Closed jkbonfield closed 12 months ago

jkbonfield commented 1 year ago

Demonstration:

$ echo -e "@seq\nAAAAA\n+\nABCDE" | gzip > _normal.gz
$ (gzip < /dev/null;cat _normal.gz) > _empty.gz

$ ./htsfile _normal.gz
_normal.gz:     FASTQ gzip-compressed sequence data

$ ./htsfile _empty.gz
_empty.gz:      empty gzip-compressed data
$ ./test/test_view _empty.gz
Unsupported or unknown category of data in input file

zlib's inflate() returns Z_STREAM_END at the end of the first (empty) gzip block, and calling it again yields no further output, so we need an explicit reset to move forward to the next block.
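For illustration only, the same behaviour can be reproduced with Python's standard zlib binding rather than the C code this PR touches. This is a minimal sketch, not the htslib fix: `peek_first_data` and `max_members` are hypothetical names, and creating a fresh decompressor for each gzip member stands in for the explicit reset described above. The iteration cap and the no-progress check are assumptions added so the loop always terminates.

```python
import gzip
import zlib


def peek_first_data(buf, max_members=64):
    """Return the decompressed bytes of the first non-empty gzip member.

    Hypothetical helper for illustration; not part of htslib.
    """
    for _ in range(max_members):  # assumed cap on members to inspect
        if not buf:
            break
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        out = d.decompress(buf)
        if out:
            return out
        if not d.eof:
            break  # member is truncated, nothing more can be decoded
        # d.eof is the Python-level sign that the stream has ended.
        # This decompressor will not decode anything further; the bytes
        # after the member are left in d.unused_data.
        if len(d.unused_data) >= len(buf):
            break  # no input consumed, stop rather than spin
        buf = d.unused_data  # restart on the next member
    return b""


# Same layout as _empty.gz: an empty member followed by a FASTQ member.
fastq = b"@seq\nAAAAA\n+\nABCDE\n"
empty_then_data = gzip.compress(b"") + gzip.compress(fastq)
peeked = peek_first_data(empty_then_data)
```

Without the restart, the peek would stop at the empty first member and report no data, which corresponds to the "empty gzip-compressed data" result in the demonstration.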

jkbonfield commented 1 year ago

Now with added paranoia for loop detection, just in case.