jupiter / parquet2json

A command-line tool for converting Parquet to newline-delimited JSON
MIT License

`cargo install parquet2json` failure on Ubuntu #19

Closed ryan-williams closed 1 year ago

ryan-williams commented 1 year ago
cargo install parquet2json
```
Updating crates.io index
Installing parquet2json v2.0.2
Updating crates.io index
Compiling libc v0.2.149
Compiling proc-macro2 v1.0.69
Compiling unicode-ident v1.0.12
Compiling autocfg v1.1.0
Compiling cfg-if v1.0.0
Compiling version_check v0.9.4
Compiling once_cell v1.18.0
Compiling pin-project-lite v0.2.13
Compiling memchr v2.6.4
Compiling bytes v1.5.0
Compiling futures-core v0.3.28
Compiling pkg-config v0.3.27
Compiling itoa v1.0.9
Compiling num-traits v0.2.17
Compiling slab v0.4.9
Compiling futures-sink v0.3.28
Compiling futures-channel v0.3.28
Compiling futures-task v0.3.28
Compiling indexmap v1.9.3
Compiling futures-util v0.3.28
Compiling quote v1.0.33
Compiling syn v2.0.38
Compiling hashbrown v0.12.3
Compiling log v0.4.20
Compiling serde v1.0.189
Compiling pin-utils v0.1.0
Compiling futures-io v0.3.28
Compiling typenum v1.17.0
Compiling generic-array v0.14.7
Compiling jobserver v0.1.27
Compiling num_cpus v1.16.0
Compiling mio v0.8.8
Compiling cc v1.0.83
Compiling signal-hook-registry v1.4.1
Compiling socket2 v0.5.5
Compiling getrandom v0.2.10
Compiling tracing-core v0.1.32
Compiling fnv v1.0.7
Compiling http v0.2.9
Compiling openssl-src v300.1.5+3.1.3
Compiling ryu v1.0.15
Compiling httparse v1.8.0
Compiling tracing v0.1.40
Compiling vcpkg v0.2.15
Compiling ring v0.16.20
Compiling syn v1.0.109
Compiling try-lock v0.2.4
Compiling http-body v0.4.5
Compiling want v0.3.1
Compiling ring v0.17.5
Compiling socket2 v0.4.10
Compiling openssl-sys v0.9.93
Compiling num-integer v0.1.45
Compiling tower-service v0.3.2
Compiling httpdate v1.0.3
Compiling serde_json v1.0.107
Compiling static_assertions v1.1.0
Compiling openssl-probe v0.1.5
Compiling lexical-util v0.8.5
Compiling spin v0.9.8
Compiling untrusted v0.9.0
Compiling untrusted v0.7.1
Compiling iana-time-zone v0.1.58
Compiling semver v1.0.20
Compiling percent-encoding v2.3.0
Compiling spin v0.5.2
Compiling digest v0.9.0
Compiling zstd-sys v2.0.9+zstd.1.5.5
Compiling num-bigint v0.4.4
Compiling proc-macro-error-attr v1.0.4
Compiling base64 v0.21.4
Compiling openssl v0.10.57
Compiling foreign-types-shared v0.1.1
Compiling rustls v0.20.9
Compiling crc32fast v1.3.2
Compiling async-trait v0.1.74
Compiling foreign-types v0.3.2
Compiling tokio-macros v2.1.0
Compiling futures-macro v0.3.28
Compiling serde_derive v1.0.189
Compiling openssl-macros v0.1.1
Compiling block-buffer v0.9.0
Compiling dirs-sys-next v0.1.2
Compiling num-iter v0.1.43
Compiling tokio v1.33.0
Compiling num-rational v0.4.1
Compiling proc-macro-error v1.0.4
Compiling native-tls v0.2.11
Compiling thiserror v1.0.50
Compiling subtle v2.4.1
Compiling hex v0.4.3
Compiling bitflags v2.4.1
Compiling opaque-debug v0.3.0
Compiling tinyvec_macros v0.1.1
Compiling crypto-mac v0.11.1
Compiling tinyvec v1.6.0
Compiling dirs-next v2.0.0
Compiling rustc_version v0.4.0
Compiling futures-executor v0.3.28
Compiling sct v0.7.0
Compiling futures v0.3.28
Compiling thiserror-impl v1.0.50
Compiling rustls-pemfile v1.0.3
Compiling tokio-util v0.7.9
Compiling lexical-write-integer v0.8.5
Compiling h2 v0.3.21
Compiling lexical-parse-integer v0.8.6
Compiling lz4-sys v1.9.4
Compiling aho-corasick v1.1.2
Compiling ahash v0.7.6
Compiling bitflags v1.3.2
Compiling zstd-safe v5.0.2+zstd.1.5.2
Compiling chrono v0.4.31
Compiling base64 v0.13.1
Compiling lazy_static v1.4.0
Compiling shlex v1.2.0
Compiling regex-syntax v0.8.2
Compiling alloc-no-stdlib v2.0.4
Compiling cpufeatures v0.2.10
Compiling zeroize v1.6.0
Compiling sha2 v0.9.9
Compiling hyper v0.14.27
Compiling regex-automata v0.4.3
Compiling alloc-stdlib v0.2.2
error: failed to run custom build command for `openssl-sys v0.9.93`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/tmp/cargo-install2gw9cL/release/build/openssl-sys-39eb57886be91861/build-script-main` (exit status: 101)
  --- stdout
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_NO_VENDOR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_NO_VENDOR unset
  cargo:rerun-if-env-changed=OPENSSL_NO_VENDOR
  OPENSSL_NO_VENDOR unset
  cargo:rerun-if-env-changed=CC_x86_64-unknown-linux-gnu
  CC_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CC_x86_64_unknown_linux_gnu
  CC_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CC
  HOST_CC = None
  cargo:rerun-if-env-changed=CC
  CC = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("false")
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  cargo:rerun-if-env-changed=CFLAGS_x86_64-unknown-linux-gnu
  CFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CFLAGS_x86_64_unknown_linux_gnu
  CFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CFLAGS
  HOST_CFLAGS = None
  cargo:rerun-if-env-changed=CFLAGS
  CFLAGS = None
  cargo:rerun-if-env-changed=AR_x86_64-unknown-linux-gnu
  AR_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=AR_x86_64_unknown_linux_gnu
  AR_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_AR
  HOST_AR = None
  cargo:rerun-if-env-changed=AR
  AR = None
  cargo:rerun-if-env-changed=ARFLAGS_x86_64-unknown-linux-gnu
  ARFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=ARFLAGS_x86_64_unknown_linux_gnu
  ARFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_ARFLAGS
  HOST_ARFLAGS = None
  cargo:rerun-if-env-changed=ARFLAGS
  ARFLAGS = None
  cargo:rerun-if-env-changed=RANLIB_x86_64-unknown-linux-gnu
  RANLIB_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=RANLIB_x86_64_unknown_linux_gnu
  RANLIB_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_RANLIB
  HOST_RANLIB = None
  cargo:rerun-if-env-changed=RANLIB
  RANLIB = None
  cargo:rerun-if-env-changed=RANLIBFLAGS_x86_64-unknown-linux-gnu
  RANLIBFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=RANLIBFLAGS_x86_64_unknown_linux_gnu
  RANLIBFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_RANLIBFLAGS
  HOST_RANLIBFLAGS = None
  cargo:rerun-if-env-changed=RANLIBFLAGS
  RANLIBFLAGS = None
  running cd "/tmp/cargo-install2gw9cL/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src" && AR="ar" CC="cc" RANLIB="ranlib" "perl" "./Configure" "--prefix=/tmp/cargo-install2gw9cL/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/install" "--openssldir=/usr/local/ssl" "no-dso" "no-shared" "no-ssl3" "no-tests" "no-comp" "no-zlib" "no-zlib-dynamic" "--libdir=lib" "no-md2" "no-rc5" "no-weak-ssl-ciphers" "no-camellia" "no-idea" "no-seed" "linux-x86_64" "-O2" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m64"
  Configuring OpenSSL version 3.1.3 for target linux-x86_64
  Using os-specific seed configuration
  Created configdata.pm
  Running configdata.pm
  Created Makefile.in
  Created Makefile
  Created include/openssl/configuration.h

  **********************************************************************
  ***                                                                ***
  ***   OpenSSL has been successfully configured                     ***
  ***                                                                ***
  ***   If you encounter a problem while building, please open an    ***
  ***   issue on GitHub                                              ***
  ***   and include the output from the following command:           ***
  ***                                                                ***
  ***       perl configdata.pm --dump                                ***
  ***                                                                ***
  ***   (If you are new to OpenSSL, you might want to consult the    ***
  ***   'Troubleshooting' section in the INSTALL.md file first)      ***
  ***                                                                ***
  **********************************************************************
  running cd "/tmp/cargo-install2gw9cL/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src" && "make" "depend"

  --- stderr
  thread 'main' panicked at '

  Error building OpenSSL dependencies:
      Command: cd "/tmp/cargo-install2gw9cL/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src" && "make" "depend"
      Failed to execute: No such file or directory (os error 2)

  ', /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/openssl-src-300.1.5+3.1.3/src/lib.rs:577:9
  stack backtrace:
     0: rust_begin_unwind
               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
     1: core::panicking::panic_fmt
               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
     2: openssl_src::Build::run_command
     3: openssl_src::Build::build
     4: build_script_main::find_vendored::get_openssl
     5: build_script_main::find_openssl
     6: build_script_main::main
     7: core::ops::function::FnOnce::call_once
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `parquet2json v2.0.2`, intermediate artifacts can be found at `/tmp/cargo-install2gw9cL`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.
```

It seems that something is failing while building the `openssl-sys` crate.

Update: I was missing `make`. Running `sudo apt-get install -y make` let the build get further, though it's still failing. See below.

This is on an EC2 instance, AMI ami-007855ac798b5175e, "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230325". Will keep digging… maybe I did something previously on this VM that is causing an issue, but it's not clear to me from the output what the issue could be.

rustup show
```
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/ubuntu/.rustup

installed targets for active toolchain
--------------------------------------

wasm32-unknown-unknown
x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.72.0 (5680fa18f 2023-08-23)
```
uname -a
# Linux ip-172-31-51-13 6.2.0-1013-aws #13~22.04.1-Ubuntu SMP Fri Sep  8 17:29:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
ryan-williams commented 1 year ago

Missing make: sudo apt-get install -y make

I was missing the `make` binary! Running `sudo apt-get install -y make` let the build get a bit further.

In retrospect, this bit of the error output was telling me most of what I needed to know:

  Error building OpenSSL dependencies:
      Command: cd "/tmp/cargo-install2gw9cL/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src" && "make" "depend"
      Failed to execute: No such file or directory (os error 2)
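
Since this failure comes down to a missing executable, a pre-flight check along the following lines could catch it before a long compile. This is only a minimal sketch: the `missing_tools` helper is hypothetical (it is not part of parquet2json or cargo), and the tool list `gcc make perl` is an assumption based on the packages installed in the working setups in this thread.

```shell
# Print the name of every given tool that cannot be found on PATH.
# `missing_tools` is a hypothetical helper written for this sketch.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list: what the vendored OpenSSL build needed in this thread.
missing=$(missing_tools gcc make perl)
if [ -n "$missing" ]; then
  echo "missing build tools:" $missing
fi
```

Anything it reports can then be installed with `sudo apt-get install -y <package>` before retrying `cargo install parquet2json`.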

Next error: rm: Permission denied while building openssl-sys

Now it seems to be failing with an `rm: Permission denied` error:

  make[1]: Leaving directory '/tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src'

  --- stderr
  make[1]: rm: Permission denied
  make[1]: *** [Makefile:11811: providers/liblegacy.a] Error 127
  make[1]: *** Waiting for unfinished jobs....
  make: *** [Makefile:2138: build_libs] Error 2
  thread 'main' panicked at '

  Error building OpenSSL:
      Command: cd "/tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src" && MAKEFLAGS="-j --jobserver-fds=8,9 --jobserver-auth=8,9" "make" "build_libs"
      Exit status: exit status: 2

When I mimic the commands there, I see:

cd /tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src
MAKEFLAGS="-j --jobserver-fds=8,9 --jobserver-auth=8,9" "make" "build_libs"
# make: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
# make depend && make _build_libs
# make[1]: Entering directory '/tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src'
# make[1]: Leaving directory '/tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src'
# make[1]: Entering directory '/tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src'
# rm -f apps/libapps.a
# make[1]: rm: Permission denied
# make[1]: *** [Makefile:3123: apps/libapps.a] Error 127
# make[1]: Leaving directory '/tmp/cargo-installIrpFGj/release/build/openssl-sys-fffc2960d0f30f6b/out/openssl-build/build/src'
# make: *** [Makefile:2138: build_libs] Error 2

It seems like `rm -f apps/libapps.a` is failing, though running it directly is a no-op and succeeds (`apps/libapps.a` doesn't exist to begin with).

Makefile:3123 contains:

$(RM) apps/libapps.a

strange…

ryan-williams commented 1 year ago

root also hits rm: Permission denied

`sudo cargo install parquet2json` is also failing to build `openssl-sys`, with an `rm: Permission denied` error:

  make[1]: Leaving directory '/tmp/cargo-installNXj4JZ/release/build/openssl-sys-10d0ce23e27c55a6/out/openssl-build/build/src'

  --- stderr
  make[1]: rm: Permission denied
  make[1]: *** [Makefile:11816: providers/liblegacy.a] Error 127
  make[1]: *** Waiting for unfinished jobs....
  make: *** [Makefile:2143: build_libs] Error 2
  thread 'main' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/openssl-src-300.1.5+3.1.3/src/lib.rs:577:9:

  Error building OpenSSL:
      Command: cd "/tmp/cargo-installNXj4JZ/release/build/openssl-sys-10d0ce23e27c55a6/out/openssl-build/build/src" && MAKEFLAGS="-j --jobserver-fds=8,9 --jobserver-auth=8,9" "make" "build_libs"
      Exit status: exit status: 2

Simple Dockerfile works

On the other hand, building this Dockerfile works for me:

FROM ubuntu
ENV PATH="/root/.cargo/bin:${PATH}"
RUN apt-get update \
 && apt-get install -y curl gcc perl make \
 && curl https://sh.rustup.rs -sSf | bash -s -- -y \
 && cargo install parquet2json \
 && parquet2json --version

so I guess it is something wrong with my environment, but I have no idea what it could be.

(perl seems necessary due to https://github.com/openssl/openssl/issues/13761)

ryan-williams commented 1 year ago

Attempting to build/install from source:

git clone https://github.com/jupiter/parquet2json
cd parquet2json

# ✅ OK, includes an output line `Compiling openssl-sys v0.9.87`
cargo check

# ✅ Also OK, also says `Compiling openssl-sys v0.9.87`
cargo build

# ❌ fails, same `rm: Permission denied` error as above,
# Also: "error: failed to run custom build command for `openssl-sys v0.9.93`"
cargo install --path .  

The full output is >1300 lines; here it is in a gist.

Not sure why the openssl-sys versions are different between {check,build} vs. install, but that matches what happens in the working Docker build above, so I don't think that's the issue.
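
A likely explanation for the version difference, offered as an assumption rather than something confirmed in this thread: `cargo install` ignores `Cargo.lock` by default and resolves the newest compatible dependencies, whereas `cargo check` and `cargo build` in a clone use the locked versions. Passing `--locked` makes `cargo install` honor the lockfile. The sketch below uses a hypothetical `lock_version` helper to read a crate's pinned version out of a `Cargo.lock`, so it can be compared with what the install log reports.

```shell
# Print the version pinned for a crate in a Cargo.lock file.
# `lock_version` is a hypothetical helper; if the crate appears more than
# once in the lockfile, only the first entry is printed.
lock_version() {
  awk -v name="$1" '
    $1 == "name" && $3 == ("\"" name "\"") { found = 1; next }
    found && $1 == "version" { gsub(/"/, "", $3); print $3; exit }
  ' "${2:-Cargo.lock}"
}

# Example, run from the root of a parquet2json clone:
#   lock_version openssl-sys
#   cargo install --locked --path .
```

If the pinned version matches the one shown by `cargo check`, the newer version seen during `cargo install` comes from fresh dependency resolution rather than from the repository itself.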

Also can't cargo build rust-openssl

Trying to `cargo build` in a clean rust-openssl clone gives errors similar to those in this Stack Overflow question, where this comment says:

Most likely there is some library version confusion. It looks like you are using a custom version of libssl, rather than the default one that comes with Ubuntu. libssl itself depends on libcrypto - and the two normally get compiled at the same time as a pair. If the version of libssl you are using picks up at runtime some other version of libcrypto (such as the system Ubuntu version) you can end up with problems like this.

I'm not sure whether that applies here. I have run `apt-get install --reinstall libssl-dev openssl`, but I'm not sure how else to narrow it down.

ryan-williams commented 1 year ago

I'm able to cargo install parquet2json on a clean EC2 VM (AMI ami-0fc5d935ebf8bc3bc, "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230919"):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash -s -- -y
. .bashrc
sudo apt-get update -y
sudo apt-get install -y gcc make perl
cargo install parquet2json
parquet2json --version
# parquet2json 2.0.2

I guess some state got messed up on my previous VM, though I have no idea what might have happened. I'll just move to a new VM. If I can repro it again I'll follow up here; closing for now.

jupiter commented 1 year ago

Thanks for sharing. Might be useful to someone in future.

I initially tried to not have an openssl dependency and we should maybe look into Rust-only alternatives again.

ryan-williams commented 1 year ago

I debugged it in ryan-williams/parquet2json-install-error:

My dotfiles put a directory named rm on $PATH, which can lead to an error like this (while building the openssl-sys dependency):

cargo install parquet2json
# …
#   --- stderr
#   make[1]: rm: Permission denied
#   make[1]: *** [Makefile:11815: providers/liblegacy.a] Error 127
#   make[1]: *** Waiting for unfinished jobs....
#   make: *** [Makefile:2142: build_libs] Error 2
#   thread 'main' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/openssl-src-300.1.5+3.1.3/src/lib.rs:577:9:

Perplexingly, it even says `rm: Permission denied` when run as root. I guess this is what happens in some circumstances when trying to "execute" a directory. However, wherever I've tested it, I see an `Is a directory` error instead of `Permission denied` 🤷‍♂️.
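
To check whether an environment has this problem, one option is to scan each `$PATH` entry for a directory whose name matches a command that the build invokes. The following is a minimal sketch for bash or POSIX sh: `find_shadowing_dirs` is a hypothetical helper, and the command names in the usage loop are only examples.

```shell
# Print every directory on PATH that itself contains a *directory*
# named like the given command (e.g. <entry>/rm being a directory).
# `find_shadowing_dirs` is a hypothetical helper written for this sketch.
find_shadowing_dirs() {
  cmd=$1
  old_ifs=$IFS
  IFS=:
  for entry in $PATH; do
    IFS=$old_ifs
    # An empty PATH entry refers to the current directory.
    [ -n "$entry" ] || entry=.
    if [ -d "$entry/$cmd" ]; then
      printf '%s\n' "$entry/$cmd"
    fi
  done
  IFS=$old_ifs
}

# Example usage: check a few commands commonly run from Makefiles.
for c in rm cp mv; do
  find_shadowing_dirs "$c"
done
```

Any path it prints is a candidate for the failure described above; removing that entry from `$PATH`, or renaming the directory, should be enough to test the theory.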

Here's a Dockerfile that repros the issue:

FROM ubuntu

# Install Rust (and a few apt packages needed to build parquet2json and its dependencies)
ENV PATH="/root/.cargo/bin:${PATH}"
RUN apt-get update \
 && apt-get install -y curl gcc perl make \
 && curl https://sh.rustup.rs -sSf | bash -s -- -y
WORKDIR /root

# Putting a directory with basename `rm` on `$PATH` (by its absolute path) breaks `openssl-sys`
# build inside `cargo install parquet2json` below
ENV dir=/root/a/b/c
RUN mkdir -p $dir/rm
ENV PATH="$dir:${PATH}"
RUN cargo install parquet2json

Here's a GHA workflow:

name: Repro "cargo install parquet2json" failure
on:
  push:
jobs:
  repro:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: |
          # Putting a directory with basename `rm` on `$PATH` breaks `openssl-sys` build inside `cargo install parquet2json` below 
          dir=$PWD/a/b/c
          mkdir -p $dir/rm
          export PATH="$dir:$PATH"
          cargo install parquet2json

and an example (failing) run.