samtools / htslib

C library for high-throughput sequencing data formats
Other
784 stars 447 forks source link

configure.ac: Define _DEFAULT_SOURCE along with _XOPEN_SOURCE #1711

Closed fweimer-rh closed 6 months ago

fweimer-rh commented 7 months ago

Otherwise the configure check for mmap fails with compilers which do not support implicit function declarations because the check relies on the presence of the getpagesize function.

Related to:

jmarshall commented 7 months ago

[I'm commenting from the perspective of your pysam PR, but we might as well keep the conversation in one upstream place.]

The underlying problem is that getpagesize() was removed in POSIX Issue 6, and various platforms' system headers omit its declaration from <unistd.h> in various configurations with various ways to reinstate it. So _DEFAULT_SOURCE does the trick on glibc and possibly various BSDs, but does not on macOS for example. So this workaround is incomplete, and really the problem is in autoconf's AC_FUNC_MMAP macro. Is there a related autoconf bug report or update you can point to related to this, or perhaps an expanded version of the macro in autoconf-archive that contains a workaround?

fweimer-rh commented 7 months ago

Did you actually observe an AC_FUNC_MMAP failure on Darwin? If getpagesize is not available at all (not even for linking), the mmap check works because it uses sysconf instead.

As far as I know, the failure is specific to certain packages that change build flags in unexpected ways, such as by defining -D_XOPEN_SOURCE. The official autoconf way is to use AC_USE_SYSTEM_EXTENSIONS.

This was discussed earlier this year on the bug-autoconf list:

jmarshall commented 7 months ago

As implied by my comment, with or without your patch we observe this failure on macOS:

checking for getpagesize... yes
checking for working mmap... no    # Due to error: call to undeclared function 'getpagesize'

Just like on Linux, the “checking for getpagesize” check just checks that a getpagesize symbol can be linked against from the relevant system libraries, and does not attempt to use a proper declaration from <unistd.h> or anywhere else. So on macOS, HAVE_GETPAGESIZE is defined because such a symbol is indeed available.

This then leads to the error encountered, as the mmap test code generated by existing autoconf releases simply tries to use getpagesize() via a system header's declaration when HAVE_GETPAGESIZE is defined. On macOS, <unistd.h> guards its getpagesize() declaration as follows:

/* Removed in Issue 6 */
#if !defined(_POSIX_C_SOURCE) || _POSIX_C_SOURCE < 200112L
int      getpagesize(void) __pure2 __POSIX_C_DEPRECATED(199506L);
[…]
#endif

Thus there is no real way to resurrect macOS <unistd.h>'s getpagesize() declaration while maintaining the _XOPEN_SOURCE setting HTSlib prefers.

It seems to me that AC_USE_SYSTEM_EXTENSIONS does something mostly orthogonal to what HTSlib wants to achieve by setting _XOPEN_SOURCE. IMHO it's not really reasonable for autoconf to expect applications not to use _POSIX_C_SOURCE or _XOPEN_SOURCE to specify the POSIX APIs they wish to use.

As getpagesize() is goneburger and replaced by sysconf(_SC_PAGESIZE), IMHO autoconf's AC_FUNC_MMAP should really probe for sysconf() and _SC_PAGESIZE before getpagesize() and use the former in preference to the latter. This is effectively what the current development autoconf (2.72d) prerelease AC_FUNC_MMAP test code does, after the changes due to the mailing list discussion you pointed to.

The unchanged develop HTSlib configure.ac as processed by current git (v2.72d-1-g8013f030) autoconf produces a configure script in which this mmap check produces the expected results on macOS:

checking for getpagesize... yes
checking for working mmap... yes

I have not tested it, but I would expect that it does on Linux too. So IMHO this is an autoconf problem that will soon be fixed by an upcoming autoconf release. (In the meantime, if it wanted to be an overachiever, HTSlib could carry its own improved HTS_FUNC_MMAP in _m4/htsfoo.m4. But in reality, IIRC HTSlib doesn't use mmap() much anyway…)

jkbonfield commented 7 months ago

It looks like the only code using mmap is the mFILE shenanigans (and originating from the Staden Package, predating htslib). It's used by CRAM to handle local references, so many instances of htslib running don't require loading their own copies of the references and to provide a faster mechanism for random access. That said, it's not critical and it has ifdefs so it should work without mmap capabilities.

daviesrob commented 6 months ago

For the htslib 1.19 release, I used autoconf 2.72d which appears to have made a configure script that works well enough. Autoconf 2.72 is now officially released so I think using that is the best solution.

In future work, I may try to remove some of the calls that need _XOPEN_SOURCE. I'm not sure if it's possible to get rid of all of them, but it might make building a bit easier if we can get away with the compiler defaults.