samtools / htslib

C library for high-throughput sequencing data formats
Other
784 stars 447 forks source link

Bug in ks_expand? #1686

Closed rmhubley closed 7 months ago

rmhubley commented 8 months ago

The ks_expand function appears to be using the length property ('l') rather than the memory_allocated property ('m') when determining how much to resize the string. This is confusing as the function declares that it is mean to increase the capacity by the expansion parameter. This has recently been added in the bgzf_getline() function. In that function the string length is initially set to 0 ('str->l = 0') before calling the ks_expand function -- in effect this is equivalent to calling ks_resize() using the value as the new allocated string size.

/// Increase kstring capacity by a given number of bytes
static inline int ks_expand(kstring_t *s, size_t expansion)
{
    size_t new_size = s->l + expansion;

    if (new_size < s->l) // Overflow check
        return -1;
    return ks_resize(s, new_size);
}
daviesrob commented 8 months ago

No, I don't think there's a bug here. The guarantee after a successful call to ks_expand() is you have the capacity to store expansion bytes after s->s + s->l.

This is a fairly common case (see for example here or here). Apart from the overflow check, ks_expand() is exactly the same as

ks_resize(s, s->l + expansion);

It needs to check against l instead of m because it doesn't want to grow the allocation unnecessarily if there's already enough space available. The actual value of m you end up with can vary, but will be greater than or equal to s->l + expansion if the function returns 0.