radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0

Support for BGZF (gzip blocked format) via the gzip plugin #11720

Open brainstorm opened 6 years ago

brainstorm commented 6 years ago

For context, please read what BGZF is about first.

A good BGZF sample file for reproducing and fixing this issue is this one, the hg38 revision of the human genome.

Work environment

$ r2 -v full
radare2 3.0.0-git 19597 @ darwin-x86-64 git.1.0.2-6642-g237e6c294
commit: 237e6c2947599c90a1dc76b986044e642d270386 build: 2018-10-03__14:43:22

Expected behavior

$ r2 gzip://test.txt.gz should open a bgzip-compressed file containing more than one block just like any other gzip file. To reproduce the problem presented here, create a short text file and compress it with bgzip so that the output spans more than one block.
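If bgzip is not at hand, a multi-block sample can also be produced with a few lines of Python. The sketch below is only an illustration of the BGZF layout (a gzip member with a `BC` extra subfield holding the block size minus one, followed by an empty end-of-file block). The helper names `bgzf_block` and `bgzf_compress` and the tiny 16-byte block size are assumptions made for the example; bgzip itself uses much larger blocks.

```python
import gzip
import struct
import zlib


def bgzf_block(data: bytes) -> bytes:
    """Wrap `data` in a single BGZF block (hypothetical helper)."""
    # Raw DEFLATE stream (negative wbits means no zlib/gzip wrapper).
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = comp.compress(data) + comp.flush()
    # 12-byte gzip header + 6-byte extra field + payload + 8-byte trailer.
    total = 12 + 6 + len(cdata) + 8
    if total > 65536:
        raise ValueError("block too large for the 16-bit BSIZE field")
    # ID1, ID2, CM=8, FLG=4 (FEXTRA), MTIME=0, XFL=0, OS=255, XLEN=6
    header = struct.pack("<BBBBIBBH", 0x1F, 0x8B, 8, 4, 0, 0, 0xFF, 6)
    # Subfield 'B','C', SLEN=2, BSIZE = total block size minus 1
    extra = struct.pack("<BBHH", ord("B"), ord("C"), 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


def bgzf_compress(payload: bytes, block_size: int = 16) -> bytes:
    """Split `payload` into small blocks and append the empty EOF block."""
    blocks = [
        bgzf_block(payload[i:i + block_size])
        for i in range(0, len(payload), block_size)
    ]
    blocks.append(bgzf_block(b""))  # BGZF end-of-file marker
    return b"".join(blocks)


payload = b"radare2 BGZF reproduction sample\n" * 4
sample = bgzf_compress(payload)

# A multi-member-aware gzip reader sees the whole payload.
roundtrip = gzip.decompress(sample)
```

Writing `sample` to `test.txt.gz` should give a file that Python's `gzip` module reads in full, which is the behaviour expected from the gzip:// plugin as well.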

Actual behavior

$ r2 gzip://hg38.fa.gz
Cannot allocate (38.fa.gz) 0 byte(s)
[r] Cannot open 'gzip://hg38.fa.gz'
$ r2 -zzz gzip://hg38.fa.gz
Cannot allocate (38.fa.gz) 0 byte(s)
[r] Cannot open 'gzip://hg38.fa.gz'

brainstorm commented 6 years ago

Smaller BGZF files can be found in the Biopython test suite, and a good (Python) reference implementation is in the same codebase.

radare commented 5 years ago

make an io plugin in python or find an implementation in C

brainstorm commented 4 years ago

This C implementation is the oldest and best supported in bioinformatics:

https://github.com/samtools/htslib/blob/develop/bgzf.c