ledatelescope / bifrost

A stream processing framework for high-throughput applications.
BSD 3-Clause "New" or "Revised" License
64 stars 29 forks source link

autotools-based build system #157

Closed jaycedowell closed 2 years ago

jaycedowell commented 2 years ago

This PR switches Bifrost to an autotools-based build system would should solve many of the installation issues that people report, e.g., #109, #134, and #154. It also adds a .m4 and a pkg-config file to help other projects that may depend on Bifrost find an installation.

This also address a few outstanding issues with Bifrost, including:

Finally, this PR also moves away from TravisCI in favor of GitHub Actions.

codecov-commenter commented 2 years ago

Codecov Report

Merging #157 (7944d61) into master (1681fde) will increase coverage by 0.04%. The diff coverage is 54.23%.

@@            Coverage Diff             @@
##           master     #157      +/-   ##
==========================================
+ Coverage   58.46%   58.50%   +0.04%     
==========================================
  Files          65       67       +2     
  Lines        5549     5731     +182     
==========================================
+ Hits         3244     3353     +109     
- Misses       2305     2378      +73     
Impacted Files Coverage Δ
python/bifrost/blocks/quantize.py 41.93% <0.00%> (ø)
python/bifrost/blocks/transpose.py 93.54% <0.00%> (ø)
python/bifrost/blocks/unpack.py 41.93% <0.00%> (ø)
python/bifrost/psrdada.py 1.63% <0.00%> (-0.01%) :arrow_down:
python/bifrost/telemetry/__main__.py 0.00% <ø> (ø)
python/bifrost/version/__main__.py 0.00% <0.00%> (ø)
python/bifrost/sigproc2.py 45.94% <20.00%> (-7.66%) :arrow_down:
python/bifrost/ndarray.py 73.15% <30.43%> (-3.97%) :arrow_down:
python/bifrost/blocks/binary_io.py 93.44% <33.33%> (-6.56%) :arrow_down:
python/bifrost/dtype.py 25.00% <33.33%> (+0.82%) :arrow_up:
... and 42 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1681fde...7944d61. Read the comment docs.

jaycedowell commented 2 years ago

@telegraphic and @league what do you think about this? Do you think it is ready to be merged in or does it need more work/testing?

league commented 2 years ago

I think the autoconf has been a success so far, and I think it could be merged.

I'm wondering though, is it your experience that the GPU-requiring tests are reliably passing? I have a VM with GeForce RTX 3090 and CUDA 11 that succeeds through configure (it auto-detects GPU arch 80) and builds, but it's reporting a few failures.

Ran 275 tests in 713.388s
FAILED (failures=4, errors=24, skipped=5)

Maybe these would fail on the master branch too, if I invoke the Makefiles with the same settings; I haven't tried that yet. Just to be concrete, here's the last of those 24 failures...

======================================================================
FAIL: test_c2r_3D (test_fft.TestFFT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/qblocks/bifrost/test/test_fft.py", line 204, in test_c2r_3D
    self.run_test_c2r(self.shape3D, [0, 1, 2])
  File "/home/qblocks/bifrost/test/test_fft.py", line 144, in run_test_c2r
    self.run_test_c2r_impl(shape, axes, fftshift=True)
  File "/home/qblocks/bifrost/test/test_fft.py", line 136, in run_test_c2r_impl
    compare(odata.copy('system'), known_result)
  File "/home/qblocks/bifrost/test/test_fft.py", line 51, in compare
    np.testing.assert_allclose(result, gold, rtol=RTOL, atol=MTOL * absmean)
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 1528, in assert_allclo
se
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 840, in assert_array_c
ompare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.1, atol=0.00163087

Mismatched elements: 2077735 / 2097152 (99.1%)
Max absolute difference: 79467.82400731
Max relative difference: 9581846.69692402
 x: ndarray([[[ 10000.328  ,  -2095.852  ,   -377.48218, ...,   6020.4707 ,
            -4811.4434 ,   6427.6562 ],
          [ -3912.9458 ,   2351.9424 ,   6352.062  , ...,  18382.254  ,...
 y: array([[[-4155.18988 , -1755.902762,  -279.50043 , ..., -2127.643956,
           772.828903,   563.93434 ],
        [ 1789.305557,  -574.083396,   525.70141 , ...,  1627.573228,...
jaycedowell commented 2 years ago

I'm wondering though, is it your experience that the GPU-requiring tests are reliably passing? I have a VM with GeForce RTX 3090 and CUDA 11 that succeeds through configure (it auto-detects GPU arch 80) and builds, but it's reporting a few failures.

Ran 275 tests in 713.388s
FAILED (failures=4, errors=24, skipped=5)

I vaguely recall running into something like this in the past with regards to the FFT tests but it wasn't consistent. Are all of your failures related to the FFT tests? Also, what do the errors look like?

jaycedowell commented 2 years ago

This could be a really small (but compelling) use case for a reproducible build step… once we can define clearly what autoconf needs to do the right thing (which maybe won't be easy), we can leave the job of regenerating ./configure to that tool. But in general I agree it's a mistake to leave configure out of the repo and push the problem onto the user with a bootstrap script.

I looked into how I build the configure file and I use autotools installed via homebrew on my laptop. I get around the various m4 problems with a cache of m4 files that I found I needed. ~Even with all of this I still end up with "possible undefined macro: AM_DEFAULT_VERBOSITY" error which pops up from the doxygen configuration step.~ That's better.

league commented 2 years ago

I don't mean to delay the autoconf merge any further, I think it could become master ASAP. But I noticed this message a lot when running ./configure:

./configure: line 7404: /usr/bin/file: No such file or directory

It seems like the absolute path to file originates with libtool? But the file command is not guaranteed to be installed, it's not part of coreutils. And even if it is installed, it may not be in /usr/bin. This error occurs on colab because their ubuntu doesn't have file installed. The same may be true on qblocks. And on NixOS it's not part of the base install either, but even when it is installed, it won't be in /usr/bin. So why doesn't libtool just get it from the PATH? If it's a real dependency, I guess we can specify it as such, but configure shouldn't hard-code its location.

I can't tell where this error has caused any harm. I guess it's trying to determine object format, and whether it requires special linker flags... and on these 64-bit gcc/gnu/linux boxes, we don't need them.

jaycedowell commented 2 years ago

Hrm, I hadn't noticed that /usr/bin/file problem. Maybe we could do something like this? https://lists.gnu.org/archive/html/bug-libtool/2014-06/msg00005.html

league commented 2 years ago

Interesting how Ludovic got some push-back on that thread! (He's a Guix developer, which is very similar to NixOS). It's funny, for those systems this is pretty easily solved (if it even matters) by a patch step that would just sed references to /usr/bin/file into the appropriate hashed version in /nix/store/zzzzz-file/bin/file.

What was more curious/alarming to me is that I see the same message on Ubuntu. It seems that file is not part of a basic install, and not even part of build-essential. So we don't have it on Google colab or a default ubuntu docker image either:

$ docker run -it --rm ubuntu:latest file --version
docker: Error response from daemon: OCI runtime create failed:
container_linux.go:380: starting container process caused: exec: "file":
executable file not found in $PATH: unknown.

But none of these platforms seem to need those linker flags libtool is detecting, so perhaps moot point. Maybe this is just something to keep an eye on. Or just recommend file as one of our prerequisites: sudo apt install exuberant-ctags file

jaycedowell commented 2 years ago

In the absence of a clear direction to go in maybe we just leave the file thing alone.