samtools / htslib

C library for high-throughput sequencing data formats
Other
803 stars 446 forks source link

Fix two occurrences of undefined behaviour in sam.c #1794

Closed kloetzl closed 3 months ago

kloetzl commented 3 months ago

In bam_init1() we would initialize a new bam1_t using calloc(). Thereby all bytes are automatically set to zero including the pointer members such as data. However, the C standard allows for NULL to have a different bit pattern than all zeros. Hence, we need to zero the fields explicitly.

In bam_aux_first() we first have to check whether b->data is a NULL pointer before doing any arithmetic on it as that is undefined behaviour according to the C standard. (Even NULL+0 is disallowed.)

I presume that there are other occurrences of these issues in htslib. The question is do you care or do you rely on the implementation to define sane behavior where the C standard doesn't?

jmarshall commented 3 months ago

Re all-bits-zero, see POSIX. Other things being equal, I personally would historically usually adhere to this piece of Standard C pedantry; but I think you'll find there are numerous places in htslib that calloc pointers or floats.

Re bam_aux_first(): have you ever met a non-trivial bam1_t for which b->data is NULL? Any bam1_t that has been read from a file or otherwise contains data necessarily has a non-NULL b->data, so if you call bam_aux_first() on one for which this field is NULL then that's really a logic error in your program. Moreover there are many accessor routines in sam.c that similarly unguardedly call bam_get_qname()/_seq()/_qual()/_aux()/etc and would have a similar issue on a newly hatched bam1_t.

If this were to be fixed, IMHO it would be better to fix it by maintaining the invariant that b‑>data is always non-NULL. This would be done by always setting b‑>data/‑>m_data to a suitable sentinel or suitable malloced data block (but leaving b‑>l_data at 0) in bam_init1() — which would also sidestep your first concern, for this function.

kloetzl commented 3 months ago

Re all-bits-zero, see POSIX.

I had heard about the plans, great to see that this annoyance became reality. Will have to reference that in the future.

Re bam_aux_first(): have you ever met a non-trivial bam1_t for which b->data is NULL? Any bam1_t that has been read from a file or otherwise contains data necessarily has a non-NULL b->data, so if you call bam_aux_first() on one for which this field is NULL then that's really a logic error in your program.

The other day I committed a change to a large commercial code base that would try to read the qname from an empty bam1_t. My colleagues then started complaining about the hundreds of error messages 🙈 (rightfully so). That's how I even started looking into this.

Yeah, I don't know if you actually consider any of this a proper bug and the fixes are ugly. So I think defining POSIX-2024 as a prerequisite is a good solution.

jkbonfield commented 3 months ago

If we ever got a machine again where the NULL pointer isn't all-bits-zero (they exist, but AFAIK they're all ancient beasts from a bygone era) then you can expect vast swathes of tools to break. You'd need a totally different OS as I highly doubt the normal Linux base would run.

It's nice this is now officially relegated to non-POSIX platforms, and we already have some requirement for a certain level of POSIX compliance in our library usage, so it's not too bad to accept it for language usage too, although I don't think it's a good idea to explicitly add a requirement for POSIX.1-2024 as it'll contain many other things that we don't need and it will take a long while for all the compilers and libraries to become fully compliant. Relying on all-bits-zero NULL is something we've lived with for years and can continue living with IMO, especially given fixing it will likely have speed implications in some places.

kloetzl commented 3 months ago

Okay, I think we all agree that a NULL pointer should be all zero and the standard not defining it as such is unfortunate. As shown here the fix is clumsy and cumbersome. I'm going to close this as won't fix. Thanks, guys.