sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0

Error when building mdbx-sys #4280

Closed danielrachi1 closed 2 months ago

danielrachi1 commented 1 year ago

Description

I'm getting this error:

error: failed to run custom build command for `mdbx-sys v0.11.6-4 (https://github.com/sigp/libmdbx-rs?tag=v0.1.4#096da80a)`

Caused by:
  process didn't exit successfully: `/home/danielrachi/Code/lighthouse/target/release/build/mdbx-sys-057a1d896a8dceb6/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at '"MDBX_version_info_struct_(unnamed_at_/home/danielrachi/_cargo/git/checkouts/libmdbx-rs-c1b523f5b64ff08c/096da80/mdbx-sys/libmdbx/mdbx_h_611_3)" is not a valid Ident', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro2-1.0.56/src/fallback.rs:811:9
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

In two scenarios:

  1. When running make test.
  2. When trying to install from source.

Version

rustc 1.69.0 (84c898d65 2023-04-16)
Trying to test: Lighthouse unstable branch @ b7b4549545da019b5b642ec35e29a7b2d092abc8
Trying to build: Lighthouse stable branch @ 693886b94176faa4cb450f024696cb69cda2fe58

Present Behaviour

I pulled the unstable branch and tried to run make test (in my development folder):

danielrachi@swiftx ~/C/lighthouse (unstable)> make test
cargo test --workspace --release --exclude ef_tests --exclude beacon_chain --exclude slasher
   ...
error: failed to run custom build command for `mdbx-sys v0.11.6-4 (https://github.com/sigp/libmdbx-rs?tag=v0.1.4#096da80a)`

Caused by:
  process didn't exit successfully: `/home/danielrachi/Code/lighthouse/target/release/build/mdbx-sys-5e93f4144f3d672e/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at '"MDBX_version_info_struct_(unnamed_at_/home/danielrachi/_cargo/git/checkouts/libmdbx-rs-c1b523f5b64ff08c/096da80/mdbx-sys/libmdbx/mdbx_h_611_3)" is not a valid Ident', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro2-1.0.55/src/fallback.rs:811:9
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
make: *** [Makefile:109: test-release] Error 101

Then, I pulled the stable branch and tried to install using make (in a folder generated by git-cloning sigp/lighthouse):

danielrachi@swiftx ~/lighthouse (stable)> make
cargo install --path lighthouse --force --locked \
        --features "jemalloc" \
        --profile "release" \

  Installing lighthouse v4.1.0 (/home/danielrachi/lighthouse/lighthouse)
    ...
error: failed to run custom build command for `mdbx-sys v0.11.6-4 (https://github.com/sigp/libmdbx-rs?tag=v0.1.4#096da80a)`

Caused by:
  process didn't exit successfully: `/home/danielrachi/lighthouse/target/release/build/mdbx-sys-49c7e9c0e0060040/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at '"MDBX_version_info_struct_(unnamed_at_/home/danielrachi/_cargo/git/checkouts/libmdbx-rs-c1b523f5b64ff08c/096da80/mdbx-sys/libmdbx/mdbx_h_611_3)" is not a valid Ident', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro2-1.0.56/src/fallback.rs:811:9
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: failed to compile `lighthouse v4.1.0 (/home/danielrachi/lighthouse/lighthouse)`, intermediate artifacts can be found at `/home/danielrachi/lighthouse/target`
make: *** [Makefile:48: install] Error 101

Expected Behaviour

I expected the tests to start running in one scenario and to have lighthouse v4.1.0 installed in the other.

michaelsproul commented 1 year ago

Yeah this is unfortunately a C compiler incompatibility. We're stuck on the current version of MDBX, but it's only used for the slasher so you can disable it with --no-default-features (which is a cargo argument). You can plumb it into make (but not make test) via CARGO_INSTALL_EXTRA_FLAGS, see: https://lighthouse-book.sigmaprime.io/installation-source.html#feature-flags.

We should probably disable it by default, as more and more people are having this issue

Which Linux distro are you on, and what does gcc --version show?
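
A minimal sketch of the CARGO_INSTALL_EXTRA_FLAGS route mentioned above, assuming it is run from the root of a Lighthouse checkout. The function name install_without_default_features is hypothetical; the variable and the cargo flag are the ones named in this comment.

```shell
# Hypothetical helper: build and install Lighthouse without its default
# features, which (per the comment above) leaves out the slasher's MDBX backend.
# CARGO_INSTALL_EXTRA_FLAGS is handed to make through the environment.
install_without_default_features() {
    CARGO_INSTALL_EXTRA_FLAGS="--no-default-features" make "$@"
}

# Usage (from the repository root):
#   install_without_default_features
```

As noted above, this only covers make; make test does not pick up the variable, so the flag has to be passed to cargo test directly in that case.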

danielrachi1 commented 1 year ago

I'm using Fedora 38 (Workstation Edition)

danielrachi@swiftx ~> gcc --version
gcc (GCC) 13.1.1 20230426 (Red Hat 13.1.1-1)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

danielrachi1 commented 1 year ago

If I run:

cargo test --workspace --release --exclude ef_tests --exclude beacon_chain --exclude slasher --no-default-features

(the cargo command that make test runs, but with --no-default-features at the end), I no longer get this error. However, I now get:

error: linking with `cc` failed: exit status: 1
...
= note: /usr/bin/ld: cannot find -lpq: No such file or directory
          collect2: error: ld returned 1 exit status

error: could not compile `watch` due to previous error

Also, I don't know if this will make any of the tests fail.

michaelsproul commented 1 year ago

The libpq error comes from Postgres; you need to run sudo dnf install libpq-devel.

The --exclude slasher flag should ensure that nothing fails.
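
A quick way to check whether the linker is likely to find libpq before rebuilding is sketched below. It relies on pkg-config, which is an assumption: pkg-config must be installed, and the libpq development package must ship a libpq.pc file. The function name report_libpq is hypothetical.

```shell
# Hypothetical helper: report whether pkg-config can locate libpq.
# A "not found" result can also mean that pkg-config itself is missing.
report_libpq() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libpq; then
        echo "libpq found"
    else
        echo "libpq not found; on Fedora: sudo dnf install libpq-devel"
    fi
}

# Usage:
#   report_libpq
```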

danielrachi1 commented 1 year ago

We should add libpq-devel to the list of additional requirements for developers in the book.

There are some tests using slasher logic outside of the slasher module, specifically in lighthouse/tests/beacon_node.rs

danielrachi@swiftx ~/C/lighthouse (fork_revert_logic) [101]> cargo test --workspace --release --exclude ef_tests --exclude beacon_chain --exclude slasher --no-default-features
   Compiling lighthouse v4.1.0 (/home/danielrachi/Code/lighthouse/lighthouse)
error[E0599]: no variant or associated item named `Mdbx` found for enum `DatabaseBackend` in the current scope
    --> lighthouse/tests/beacon_node.rs:1906:74
     |
1906 |             assert_eq!(slasher_config.backend, slasher::DatabaseBackend::Mdbx);
     |                                                                          ^^^^ variant or associated item not found in `DatabaseBackend`

error[E0599]: no variant or associated item named `Mdbx` found for enum `DatabaseBackend` in the current scope
    --> lighthouse/tests/beacon_node.rs:1920:74
     |
1920 |             assert_eq!(slasher_config.backend, slasher::DatabaseBackend::Mdbx);
     |                                                                          ^^^^ variant or associated item not found in `DatabaseBackend`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `lighthouse` due to 2 previous errors

This happens because they are not marked as part of the slasher feature. I added #[cfg(feature = "slasher")] on top of those tests (and others in that same file) and the error went away. But now other tests are failing, and I don't know how they are related to the slasher.

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/tests.rs (target/release/deps/tests-e404371a88a45637)

running 9 tests
test short_chain ... FAILED
test chain_grows_with_metadata_and_multiple_skip_slots ... FAILED
test short_chain_with_skip_slot ... FAILED
test short_chain_with_reorg ... FAILED
test large_chain ... FAILED
test chain_grows_to_second_epoch ... FAILED
test short_chain_sync_starts_on_skip_slot ... FAILED
test chain_grows ... FAILED
test chain_grows_with_metadata ... FAILED

failures:

---- short_chain stdout ----
thread 'short_chain' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14.0/src/clie

---- chain_grows_with_metadata_and_multiple_skip_slots stdout ----
thread 'chain_grows_with_metadata_and_multiple_skip_slots' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299d
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- short_chain_with_skip_slot stdout ----
thread 'short_chain_with_skip_slot' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-

---- short_chain_with_reorg stdout ----
thread 'short_chain_with_reorg' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14

---- large_chain stdout ----
thread 'large_chain' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14.0/src/clients/cli.rs:48:9

---- chain_grows_to_second_epoch stdout ----
thread 'chain_grows_to_second_epoch' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14.0/src/clients/cli.rs:48:9

---- short_chain_sync_starts_on_skip_slot stdout ----
thread 'short_chain_sync_starts_on_skip_slot' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14.0/src/clients/cli.rs:48:9

---- chain_grows stdout ----
thread 'chain_grows' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14.0/src/clients/cli.rs:48:9

---- chain_grows_with_metadata stdout ----
thread 'chain_grows_with_metadata' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299db9ec823/testcontainers-0.14.0/src/clients/cli.rs:48:9

danielrachi1 commented 1 year ago

I think the real problem is that we don't have a way to share a reliable development environment. Docker could be used for this but there are some details I don't like about it for this purpose. I've heard Nix is a great tool for this. I'll give it a try and see if I can come up with something useful.

michaelsproul commented 1 year ago

> We should add libpq-devel to the list of additional requirements for developers in the book.

That's a great idea. Would you mind opening a PR?

This, and the other issues are fallout from merging a rather major new component, the beacon.watch chain indexer: https://github.com/sigp/lighthouse/pull/3362. Usually there are some quirks in docs and tests after merging such a large feature, even if we make every effort to avoid them.

> ---- chain_grows_with_metadata_and_multiple_skip_slots stdout ----
> thread 'chain_grows_with_metadata_and_multiple_skip_slots' panicked at 'failed to start container', /home/danielrachi/.cargo/registry/src/github.com-1ecc6299d
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This is also beacon.watch related. Do you have Docker/Podman installed? I think the new watch tests use Docker to spawn Postgres in a container. Or if you're already running in Docker, it might be a Docker-in-Docker sort of bug.

> I think the real problem is that we don't have a way to share a reliable development environment.

We kind of do have a standard environment: it's the GitHub runner image used by CI. It would be possible to use this locally but, as you say, probably not particularly fun (I would also prefer not to use Docker for my dev environment).

> I've heard Nix is a great tool for this. I'll give it a try and see if I can come up with something useful.

I'd be wary of going too deep on this, because for it to be effective it would need to be adopted as our primary CI. If we added a Nix env that wasn't tested on CI, it would be prone to bitrot. To keep CI fast, it would probably mean our only CI run would have to use Nix. As far as I know, none of the main Lighthouse devs use Nix at all, so it's also an issue of familiarity (I used it very briefly several years ago).

Related to this we also have some in-progress work to further Docker-ify CI so that it runs on a bare metal machine owned by SigP: https://github.com/sigp/lighthouse/pull/4115. Those images could potentially be used for standardised local dev environments by people who are interested.

In summary I think we should:

danielrachi1 commented 1 year ago

Turns out you need docker installed and running. Added that to the PR.

I opened a second PR adding the slasher feature flag to the slasher tests I mentioned in a previous comment.

With those changes made I successfully compiled and passed all tests using:

cargo test --workspace --release --exclude ef_tests --exclude beacon_chain --exclude slasher --no-default-features
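
Since the watch tests need a running Docker daemon, a pre-flight check along these lines may save a failed test run. This is a minimal sketch: the function name report_docker is hypothetical, and it only covers the docker CLI, not Podman.

```shell
# Hypothetical helper: report whether the docker CLI is installed and
# can reach a running daemon. `docker info` exits non-zero when the
# daemon is unreachable.
report_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed"
    elif ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the daemon is not reachable"
    else
        echo "docker is ready"
    fi
}

# Usage, before running the cargo test command above:
#   report_docker
```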

zhiqiangxu commented 1 year ago

How do I install libpq-devel on macOS? I tried brew install libpq-dev and brew install libpq-devel; both report Warning: No available formula with the name "xxx".

michaelsproul commented 1 year ago

@zhiqiangxu I think it's just libpq on homebrew: https://formulae.brew.sh/formula/libpq

zhiqiangxu commented 1 year ago

@michaelsproul brew install libpq runs successfully, but still reports library not found for -lpq.

UPDATE

This is what I got after brew install libpq:

image
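
One possible explanation, offered as an assumption rather than a confirmed diagnosis: Homebrew installs libpq as a keg-only formula, so its libraries are not linked into the default search paths. A minimal sketch of a workaround is to add the formula's lib directory to LIBRARY_PATH before building. The helper name prepend_library_path is hypothetical, and whether this resolves the -lpq error in this setup is untested.

```shell
# Hypothetical helper: prepend a directory to LIBRARY_PATH, which the
# C compiler driver consults when searching for libraries at link time.
prepend_library_path() {
    dir="$1"
    if [ -n "${LIBRARY_PATH:-}" ]; then
        LIBRARY_PATH="$dir:$LIBRARY_PATH"
    else
        LIBRARY_PATH="$dir"
    fi
    export LIBRARY_PATH
}

# Usage on macOS with Homebrew (not run here):
#   prepend_library_path "$(brew --prefix libpq)/lib"
```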

eenagy commented 5 months ago

Same issue occurs: the build fails on the latest Ubuntu 24.04, but works correctly on the latest Debian bookworm.

The exact same build script is used in both cases, with the only difference being the distribution. Other users have also reported this issue.

michaelsproul commented 5 months ago

@eenagy Thanks for the info about Ubuntu 24.04. We are deprecating the MDBX backend in the slasher, so nobody is working on keeping it up to date.

If you would like to see the MDBX backend maintained we would consider a PR to switch over to the Reth team's bindings: https://github.com/paradigmxyz/reth/tree/main/crates/storage/libmdbx-rs

michaelsproul commented 5 months ago

@eenagy I've just looked at your project and realised you're maintaining packages for Debian & Ubuntu! Thanks for doing that. I think it would be reasonable for your tools to turn off the slasher-mdbx feature to avoid the breakage.

eenagy commented 5 months ago

> @eenagy I've just looked at your project and realised you're maintaining packages for Debian & Ubuntu! Thanks for doing that. I think it would be reasonable for your tools to turn off the slasher-mdbx feature to avoid the breakage.

All right, that sounds good. I will note this when I release the next version or patch.

varun-doshi commented 2 months ago

> @michaelsproul brew install libpq runs successfully, but still reports library not found for -lpq.
>
> UPDATE
>
> This is what I got after brew install libpq:
>
> image

Were you able to solve this?

michaelsproul commented 2 months ago

@varun-doshi unless you need watch you don't need libpq. We should probably prevent it from being built in the homebrew formula.

If you build Lighthouse from source using make, you won't need it.

varun-doshi commented 2 months ago

I'm trying to run tests using cargo test with logging features enabled. Can you please tell me how I can disable watch?

michaelsproul commented 2 months ago

cargo test --exclude watch in that case, plus whatever other args you want (--features, --release, etc)
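
A sketch of that command as a reusable helper. Note that cargo only accepts --exclude together with --workspace, so the sketch adds it. The function name test_without_watch is hypothetical.

```shell
# Hypothetical helper: run the workspace tests while skipping the watch
# package, so libpq is not needed. Extra arguments are passed through.
test_without_watch() {
    cargo test --workspace --exclude watch "$@"
}

# Usage (from the repository root):
#   test_without_watch --release
```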

michaelsproul commented 2 months ago

I'm going to close this issue as it was solved by:

If you continue having issues with libpq, please open a new issue @varun-doshi and we can discuss there.