samtools / htslib

C library for high-throughput sequencing data formats
Other
804 stars 446 forks source link

Fix possible heap overflow in cram_encode_aux() on bad RG:Z tags #1737

Closed daviesrob closed 8 months ago

daviesrob commented 8 months ago

RG:Z tags without a proper NUL termination could lead to use of invalid data, or a heap overflow when the tag is passed to sam_hrecs_find_rg(), or hts_log_warning() if the former returns NULL. Fix by moving the line that skips to the end of the aux tag and then checking that it was terminated correctly. Should it not be, the aux parser is reset so the tag can be stored verbatim (the code that does that already handles badly-terminated Z tags).

Credit to OSS-Fuzz Fixes oss-fuzz 66369

jkbonfield commented 8 months ago

The change looks good, but I'm not sure it's sufficient.

It fixes the fuzzing problem, but it still generates a broken aux record as it doesn't add the nul termination when storing the RG verbatim. This then means we convert from BAM to CRAM without an error (just a warning), but produce a CRAM which cannot be decoded:

$ ./test/test_view /tmp/_.cram 
[E::sam_format1_append] Corrupted aux data for read   
Error writing output.

I'm tempted to say we should just short-cut the issue by having a hard fail. If aux is detected to be malformed, don't attempt to continue.

daviesrob commented 8 months ago

I can easily switch that to a hard fail, but should the same also be done for other Z-type tags? At the moment, they will also presumably make broken cram files.

jkbonfield commented 8 months ago

Yes I guess it's something that is a general issue. So maybe it's a separate problem.

daviesrob commented 8 months ago

I've updated to make it fail on the bad data, and also added similar checks to the other places that handled Z and H type tags.

jkbonfield commented 8 months ago

Thanks. Seems to do the trick :)