larribas closed this issue 3 years ago
Could you include all steps to reproduce the bug? e.g., Dockerfile, command line, download files.
I can reproduce it in docker now.
@pherl, I can reproduce the bug with the gem downloaded from ruby gem repository. However, I cannot reproduce the bug with the gem locally built on alpine3.7 docker container. Is cross-compiling related to the issue?
@larribas Can you try installing a locally built protobuf gem? If you have difficulty building it, I can show you how.
Just tested it. It does look like a cross-compilation bug. Here's what I did:
Within docker run -it --rm --entrypoint=sh ruby:2.5-alpine
apk add --update git gcc make cmake libc-dev linux-headers bash wget libc6-compat autoconf automake libtool curl make g++ unzip
# SEGFAULT
gem install google-protobuf
ruby -e 'require "google/protobuf/any_pb"'
# GOOD
git submodule update --init --recursive
./autogen.sh
./configure
make
make check
make install
ldconfig
cd ruby
gem install bundler
bundle
rake
rake clobber_package gem
gem install `ls pkg/google-protobuf-*.gem`
ruby -e 'require "google/protobuf/any_pb"'
Let me know if I can help with anything else
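Both recipes above end with the same check: require the generated file and see whether the interpreter dies. A minimal sketch of that check as a reusable script follows. The helper name probe_require is hypothetical (it is not part of any gem), and the exit code 3 is an arbitrary marker chosen for this example. The require runs in a child interpreter so that a crash in a native extension shows up as a result instead of killing the caller.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: load `feature` in a throwaway child interpreter.
# Returns :ok, :load_error (feature not installed), :crashed (child was
# killed by a signal, which is how a segfault typically surfaces), or
# :error (any other non-zero exit).
def probe_require(feature, ruby: RbConfig.ruby)
  script = "begin; require #{feature.inspect}; rescue LoadError; exit 3; end"
  _output, status = Open3.capture2e(ruby, "-e", script)
  return :crashed if status.signaled?
  return :ok if status.success?
  status.exitstatus == 3 ? :load_error : :error
end

puts "google/protobuf/any_pb: #{probe_require('google/protobuf/any_pb')}"
```

Run inside the container after each install variant; a result of :crashed would correspond to the SEGFAULT case and :ok to the GOOD case.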
I am having a very similar issue: ruby 2.4.4 alpine 3.7, using google-cloud-pubsub gem https://github.com/google/protobuf/issues/4728
I tested 3.6.x branch. Unfortunately, the linux-x86_64 gem still doesn't work on alpine. The workaround is to install a native gem which is built on target platform:
apk add --update git gcc make cmake libc-dev linux-headers bash wget libc6-compat autoconf automake libtool curl make g++ unzip
gem install google-protobuf --version=3.5.1.2 --platform=ruby
Feel free to reopen this issue if this workaround is unacceptable.
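To confirm that the workaround actually installed a source-built gem rather than the precompiled one, the installed gem's platform can be inspected. This is a sketch using the standard RubyGems API; the helper names installed_platforms and precompiled? are made up for illustration.

```ruby
require "rubygems"

# Returns [version, platform] pairs for every installed copy of `gem_name`.
# A platform of "ruby" means the extension was compiled on this machine;
# anything else (for example "x86_64-linux") is a precompiled binary gem.
def installed_platforms(gem_name)
  Gem::Specification.find_all_by_name(gem_name).map do |spec|
    [spec.version.to_s, spec.platform.to_s]
  end
end

# True when the platform string names a binary gem rather than a source gem.
def precompiled?(platform)
  platform != Gem::Platform::RUBY
end

installed_platforms("google-protobuf").each do |version, platform|
  kind = precompiled?(platform) ? "precompiled binary" : "built from source"
  puts "google-protobuf #{version} (#{platform}): #{kind}"
end
```

If nothing is printed, no copy of the gem is visible to this Ruby installation.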
Hello, thanks for looking into this.
Your workaround is not working for us. Same segfault error. Here is our dockerfile:
FROM ruby:2.4.4-alpine3.7
ENV RAILS_ROOT /rails/prism
ENV PAGER /usr/bin/less
ENV TERM xterm
RUN mkdir -p $RAILS_ROOT/tmp/pids
WORKDIR $RAILS_ROOT
ADD Gemfile* ./
RUN apk --update --no-cache add \
autoconf \
automake \
bash \
cmake \
curl \
curl-dev \
dbus \
fontconfig \
g++ \
gcc \
git \
libc-dev \
libc6-compat \
libcurl \
libtool \
libxml2-dev \
libxslt-dev \
linux-headers \
make \
mysql-client \
mysql-dev \
nodejs \
pdftk \
poppler-utils \
qt5-qtbase-dev \
ttf-freefont \
unzip \
wget \
&& gem install google-protobuf --version=3.5.1.2 --platform=ruby \
&& apk --update add --virtual build-dependencies build-base build-dependencies gcc git less \
&& bundle install --jobs `expr $(cat /proc/cpuinfo | grep -c "cpu cores") - 1` --retry 3 --without test development\
&& apk del build-dependencies
COPY . $RAILS_ROOT
EXPOSE 3000
CMD ["bin/rails", "server", "-b", "0.0.0.0"]
Were you suggesting we try alpine 3.6.x? Anything else you can think of trying?
@TeBoring
Did you try ruby -e 'require "google/protobuf/any_pb"'
after you install the gem?
Just tried it, same error :(
FROM ruby:2.4.4-alpine3.7
ENV RAILS_ROOT /rails/prism
ENV PAGER /usr/bin/less
ENV TERM xterm
RUN mkdir -p $RAILS_ROOT/tmp/pids
WORKDIR $RAILS_ROOT
ADD Gemfile* ./
RUN apk --update --no-cache add \
autoconf \
automake \
bash \
cmake \
curl \
curl-dev \
dbus \
fontconfig \
g++ \
gcc \
git \
libc-dev \
libc6-compat \
libcurl \
libtool \
libxml2-dev \
libxslt-dev \
linux-headers \
make \
mysql-client \
mysql-dev \
nodejs \
pdftk \
poppler-utils \
qt5-qtbase-dev \
ttf-freefont \
unzip \
wget \
&& gem install google-protobuf --version=3.5.1.2 --platform=ruby \
&& ruby -e 'require "google/protobuf/any_pb"' \
&& apk --update add --virtual build-dependencies build-base build-dependencies gcc git less \
&& bundle install --jobs `expr $(cat /proc/cpuinfo | grep -c "cpu cores") - 1` --retry 3 --without test development\
&& apk del build-dependencies
COPY . $RAILS_ROOT
EXPOSE 3000
CMD ["bin/rails", "server", "-b", "0.0.0.0"]
Does the segmentation fault happen after the Dockerfile is built, or at the point of calling ruby -e 'require "google/protobuf/any_pb"'?
The segfault occurs after the build, when I try to run the Rails app itself (instantiating the class that requires google-cloud-pubsub).
Thanks again for all your timely help!
@TeBoring - I've recreated our bug in minimal form, including an attempt to follow the workaround @larribas demonstrated.
https://github.com/BenefitsDataTrust/tmp-protobuf-segfault
To replicate, just clone, drop a creds.json file in the repo, and docker build .
(I added your workaround suggestion in the ./teboring subdir if you want to check that out as well.)
@jeffdeville, thanks, I'll take a look.
@teboring any updates on this issue?
Hi guys, Any workaround for this issue?
Hi guys, facing the same issue. Any fix/workaround available? @TeBoring
I'm experiencing the same issue using ruby:2.4.4-alpine3.7. Has anyone found a workaround for this?
As a workaround, you could set the Bundler config option that ignores the current machine's platform and installs only ruby platform gems: export BUNDLE_FORCE_RUBY_PLATFORM=1
So the Dockerfile would look like:
FROM ruby:2.5.1-alpine
RUN apk add --update build-base
ADD . .
RUN BUNDLE_FORCE_RUBY_PLATFORM=1 bundle install --jobs $(nproc)
CMD ./start.rb
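One way to see whether Bundler resolved to precompiled gems is to look at the version strings in Gemfile.lock: a binary gem carries a platform suffix after the version. The following sketch scans lockfile text for such entries. The function name precompiled_entries and the sample lockfile contents are made up for illustration, and the regular expression assumes the usual four-space indentation of entries under specs:.

```ruby
# Matches spec lines such as "    google-protobuf (3.8.0-x86_64-linux)".
# A hyphen after the version number introduces a platform suffix;
# source gems look like "    google-protobuf (3.8.0)".
PLATFORM_SPEC = /\A {4}(?<name>\S+) \((?<version>\d[^-)]*)-(?<platform>[^)]+)\)\z/

# Returns [name, version, platform] for every platform-specific entry.
def precompiled_entries(lockfile_text)
  lockfile_text.each_line(chomp: true).map do |line|
    match = PLATFORM_SPEC.match(line)
    [match[:name], match[:version], match[:platform]] if match
  end.compact
end

# Hypothetical lockfile fragment, not taken from any project in this thread.
SAMPLE_LOCK = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      google-protobuf (3.8.0-x86_64-linux)
      googleapis-common-protos-types (1.0.4)
        google-protobuf (~> 3.0)
      grpc (1.21.0-x86_64-linux)
        google-protobuf (~> 3.7)
LOCK

precompiled_entries(SAMPLE_LOCK).each do |name, version, platform|
  puts "#{name} #{version} is locked to the #{platform} binary"
end
```

With BUNDLE_FORCE_RUBY_PLATFORM=1 in effect, the expectation is that a freshly resolved lockfile has no such suffixed entries for these gems.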
Thank you, @kam1kaze
I was trying to build a Dockerfile from ruby:2.5.1-alpine with the grpc gem, and your suggestion just worked fine!
Many thanks @kam1kaze your workaround works for me. I'm using ruby:2.5.3-alpine3.8.
We are currently using rake-compiler-dock to cross-compile Ruby gems: https://github.com/rake-compiler/rake-compiler-dock/blob/v0.6.2/Dockerfile However, the gem built by that Docker image doesn't work on Alpine. Previously, I suspected the problem was Ruby 2.5.0 (protobuf doesn't work on Ruby 2.5.0 because of an internal Ruby bug), so I built a Docker image for Ruby 2.5.1 and built the gem on that, but got the same error. I don't have a way forward right now. If anyone knows how to build a gem for x86_64 that also works on Alpine, please let me know.
https://github.com/docker-library/ruby/issues/196 Maybe this is related?
FWIW I encountered this on ruby:2.6.1-stretch as well.
Here's another hack that worked for me on ruby:2.5.1-alpine. It should be faster, as you don't set BUNDLE_FORCE_RUBY_PLATFORM for the entire Gemfile.
Pin the version in the Gemfile:
gem "google-protobuf", '3.7.0'
Run bundle install as usual, then remove the installed version and re-install it manually like this:
RUN bundle install --jobs $(nproc) --retry 2
RUN gem uninstall -I google-protobuf
RUN gem install google-protobuf --version=3.7.0 --platform=ruby
For me, I have to first add the gcompat package to get the segfault. It segfaults with or without BUNDLE_FORCE_RUBY_PLATFORM=1.
I have exactly the same issue, @quinn. Did you find any workaround?
@shideneyu I never did and had to switch from alpine to Ubuntu for the base image. Image size went from 50MB to 850MB, or something like that lol
If someone has this working can they post a Dockerfile with a Gemfile that works??
It seems like people in this thread have gotten this to work, and the great thing about deterministic tools like Bundler and Docker is that they make it incredibly easy to reproduce others' results. Can someone share their working:
🙏
This does work for me on 2.6.3-alpine.
I have this in Gemfile:
gem "grpc", "1.21.0", platforms: ["ruby"]
gem "google-protobuf", "3.8.0", platforms: ["ruby"]
And this in Dockerfile:
RUN CFLAGS="-Wno-cast-function-type" \
BUNDLE_FORCE_RUBY_PLATFORM=1 \
bundle install
Same issue with ruby:2.6.5-alpine3.10
I tried CFLAGS="-Wno-cast-function-type" BUNDLE_FORCE_RUBY_PLATFORM=1 bundle install --jobs 20 --retry 5 --path /bundle in the Dockerfile, with no luck.
Here is an example that reproduces the bug: https://github.com/OpakAlex/reproduce-google-protobuf-gem-issue
Hi @OpakAlex have you found a solution in the meantime? Thanks
In my case the error occurs with the following Dockerfile:
FROM ruby:2.5.3-alpine
RUN gem install google-protobuf --version=3.8.0
CMD ruby -e 'require "google/protobuf/any_pb"'
whereas with the following it doesn't occur:
FROM ruby:2.5.3-alpine
RUN gem install google-protobuf --version=3.11.4
CMD ruby -e 'require "google/protobuf/any_pb"'
Unfortunately, ENV BUNDLE_FORCE_RUBY_PLATFORM 1 does not help in my case.
[edit]
I'm sorry, ENV BUNDLE_FORCE_RUBY_PLATFORM 1 is for Bundler. Apparently it does not work for gem install.
Anyway, ENV BUNDLE_FORCE_RUBY_PLATFORM 1 worked for me. Thanks
Try gcompat instead of libc6-compat. It works for me on ruby:2.6.3-alpine and google-protobuf (3.12.1).
https://pkgs.alpinelinux.org/contents?file=ld-linux-x86-64.so.2&path=&name=&branch=edge
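The package search above looks for the glibc-style dynamic loader, ld-linux-x86-64.so.2, which a compatibility package has to provide before a glibc-targeted binary can even start on Alpine. A rough sketch for checking which loaders a container has follows. The helper name available_loaders is hypothetical, and the candidate paths are the conventional x86_64 locations, assumed here rather than taken from the thread. Finding the glibc-named loader only shows that a shim is installed; as later comments report, it does not guarantee the gem will run without crashing.

```ruby
# Conventional x86_64 dynamic loader locations, relative to the root.
# These paths are assumptions for illustration; other architectures differ.
LOADERS = {
  musl:  ["lib/ld-musl-x86_64.so.1"],
  glibc: ["lib64/ld-linux-x86-64.so.2", "lib/ld-linux-x86-64.so.2"]
}.freeze

# Returns the libc flavors whose loader file exists under `root`.
def available_loaders(root = "/")
  LOADERS.select do |_libc, paths|
    paths.any? { |path| File.exist?(File.join(root, path)) }
  end.keys
end

puts "dynamic loaders found: #{available_loaders.inspect}"
```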
I am facing the same issue, a segmentation fault. I've tried @nhattan's solution of using gcompat, but it didn't work for me. I'm on the Ruby 2.4.10 Alpine image.
I can confirm that @nhattan's solution worked for us on ruby:2.7.1-alpine3.11, protobuf gem version 3.12.2. Thank you for the suggestion @nhattan!
I've just tried it again with ruby:2.5.8-alpine3.11 and google-protobuf (3.12.2). Unfortunately I am still seeing the same crashes as before any time the Stackdriver gem wants to report an exception using Cloud Logging.
Apologies @nhattan, @XeeD, I haven't tested this sufficiently. It got rid of one segfault, but unfortunately another one still follows.
ruby:2.5.6-alpine3.10 with apk add --no-cache gcompat also works fine.
@rnnds I'm not sure if you've seen my follow-up, but it ended up actually not working for us. Just to confirm, this is without BUNDLE_FORCE_RUBY_PLATFORM set, correct? Would it be possible for you to share your Dockerfile with us?
@DawidJanczak sorry, you are right. I didn't notice your message, and I also got an error. I definitively solved it by using a different image:
FROM ruby:2.6-slim-stretch
RUN apt-get update && apt-get install -y \
build-essential \
libpq-dev
RUN gem install bundler
RUN bundle install
#...
This is not going to be a problem-solving comment, so if you're looking for solutions, please disregard. If you're disgruntled like me, maybe it'll confirm your own biases. Regardless, I'm putting it here in the hopes that someone reads it and maybe something changes.
TLDR: A protocol buffers dependency is absolutely hamstringing us and we will be leaving Google Cloud due to this issue.
The long version: protobuf is a dependency for background jobs on our Cloud Run instance (via Cloudtasker). I do understand that protocol buffers saves overhead in the long-run vs. JSON. I know the benefits, and they sound absolutely wonderful for a massive system at-scale. Considering, however, how much time we've spent chasing this issue and, when we do solve it, sitting around waiting for Docker images to rebuild gRPC and protobuf from scratch, we are not seeing any benefit.
Serializing is a solved problem, and protocol buffers should not be an exception. It needs an implementation that is lightweight and performant; this does not seem like a heavy lift if Google expects us to use it in order to use their cloud offerings. The Rust implementation might be a good place to start. Or why not distribute a pre-built Go version? That would be more in Google's wheelhouse.
Whatever the case, stuff like this is going to cost Google customers. And we are one of those customers. And that's a real bummer - both for us in time and money wasted on this, and for Google in customers lost.
I've also left GCP for this and a few other reasons. It's a pretty ridiculous reason to lose customers.
Unlike Python, which allows us to build different wheels for different platforms, gem only allows us to upload a single version to support all platforms. For that reason, we have no way to provide a gem that supports Alpine and also supports the other platforms.
So, for now, the only solution is to build the gem locally on Alpine from our source code.
https://github.com/rake-compiler/rake-compiler-dock/issues/20#issuecomment-450457761
Hopefully, gem could do something similar to wheels in Python (it might be better to raise an issue with gem instead :)).
Thanks for the note, @TeBoring - I appreciate your position here, and totally understand there are technical limitations at the Ruby layer. This is probably a more appropriate discussion to have about the protobuf library itself, since it is what is segfaulting. While Rubygems forcing a re-build slows builds down, it's by no means a dealbreaker; the library being built but not functioning, on the other hand, is.
I do want to say that I don't post comments like the above lightly. I am grateful daily for anyone's open-source contributions, and don't want to come across as a "choosing beggar." If I wasn't paying my hosting provider to force the technology on me, I would never dream of criticizing, even implicitly, the contributions of folks such as yourself. Thanks for everything you do!
Flip
@flipsasser (and others): In this case, the protobuf library is only surfacing a deeper issue, which is the platform ABI. The libstdc++ docs include an explanation of the API/ABI interplay which (IMO) is quite clear and well-written, at least relative to the topic, but also a bit C++-centric. The ABI-related issue with Alpine is actually deeper than C++, though: Alpine uses musl libc instead of glibc. In fact, this is one of the key selling points on the Alpine Linux homepage. However, despite the benefit to the Alpine ecosystem, this also means glibc-based binaries (and libraries) are generally not drop-in compatible with Alpine Linux.
For prebuilt protobuf libraries targeting x86_64 CPUs and the Linux kernel, protobuf prebuilts further use a GNU-based target -- i.e., one where the library named libc.so and found by the loader is implemented by glibc. This ABI is commonly denoted as x86_64-unknown-linux-gnu [1], where -gnu is notionally the OS, distinct from the kernel -linux (and famously so).
Ultimately, for prebuilt binary libraries, we must choose an implementation and version of libc consistent with all other runtime libraries and the running process itself. Unfortunately, this means that protobuf prebuilts targeting glibc are rendered incompatible with Alpine Linux, because they disagree at a minimum on the lowest-level [2] runtime library.
As @TeBoring pointed out, building from source does not tie you to the target OS of our prebuilt binary libraries. The protobuf sources are source-compatible with Musl, and hence Alpine, similar to how they are source-compatible with macOS and Windows targets. (I do, however, appreciate that Docker surfaces the build overhead cost quickly and often.) Alternatively, as @nhattan and @rnnds point out, there are excellent binary compatibility shims you can use to run glibc-targeted binaries under Musl; and as @rnnds also points out, switching away from an Alpine-based image altogether to a Debian-based image also eliminates the issue.
At the risk of droning on, I would also point out that Alpine+Musl's differing ABI is not particularly new, nor is it unique to protobuf, Google Cloud, Ruby, or even C/C++... a quick web search shows other projects which have tripped over various ABI aspects when targeting Alpine and/or Musl. It is certainly a tricky topic, though, and one of my least favorite parts of Linux build engineering.
[1] The -unknown part is a bit misleading... if you want, you can read more about the overall structure, history, and cruft of this name format on the OSDev wiki, the GNU Autoconf docs, the Clang docs, in Ian Lance Taylor's explanation, etc.
[2] As a mostly-academic distinction, libc.so would technically be the second-lowest level library dependency on a dynamic executable. This is because, under Linux, the dynamic linker is the OS-side, i.e., kernel-external, portion of the program startup implementation. Practically speaking, however, the dynamic linker is also tied to the libc implementation by ABI. This is why the distinction is only mostly academic: glibc and Musl also have their own, separate dynamic linkers: glibc man page, musl conceptual design.
Thank you for the deeper dive, @dlj-NaN! This is super helpful for understanding the scope of the challenge in distributing protobuf on as many systems as possible. I'll be the first to say that, at a technical level, protobuf is well above my paygrade to criticize, as well as a wonderful and essential piece of engineering. Or rather, I assume it's wonderful; Clarke's 3rd law applies for me in this case! 😃
I readily concede that Alpine's choice of musl is causing the headache and that the fault lies there (otherwise we wouldn't have added gcompat to our apk add command in the first place). It's also too bad to hear that protobuf is not alone in that complaint against Alpine! In our case, protobuf is the only thing that will not play nicely with Alpine, but I wholly admit that we may be an exception. Perhaps more people are struggling with a wider variety of libraries on Alpine than I fully appreciate or grasp. Regardless, I'm grateful for your masterful perspective on the issue, and I totally get that you cannot feasibly account for every possible edge case in distributing your product.
Having considered your perspective on this, I'm hoping you'll be willing to consider mine.
As a level-set, it's worth clarifying that we do not approach this issue from a purely technical perspective. My goal is not to argue that "Alpine is right" or "musl is the best." I'm also not attempting to compare it to other serialization technologies like JSON. I don't know enough about C++ or streaming deserialization or whatever API/ABI interplay even is to make a big, technical case for doing anything in any way differently.
Instead I ask you to consider that we, as a small, open-source project funded by grants, must contend with a fiscal reality in addition to a technical one. From where we stand, we pay (from a limited budget) for the use of GCP, and GCP, in turn, forces the technical decision to use protobuf on us. Our expectation is therefore that protobuf not throw a wrench into our process.
We consciously chose Alpine because it is widely used and extremely lightweight, two benefits which reduce costs and speed up build processes. And for most of our project's life, that reasoning has proved sound. In fact, right up until we added a dependency on protobuf (which was well after we cut over to GCP from Heroku), we experienced exactly zero Alpine-specific problems (in fairness, we have since experienced one issue related to timezone data).
We therefore view adopting a different OS (and re-writing and re-testing all of our Docker configuration) to be a non-starter. From our perspective, we are being asked to invalidate primary technical decisions on behalf of a tertiary dependency that we are required to use in exchange for paying Google.
That the technology causing this dissonance solves what feels, to us, like a solved problem (in this case serialization) is all the more frustrating. That frustration is compounded by the fact that there are other quick-compiling, light-weight, portable dependencies all over our application. By way of comparison, here are some benchmarks for building fairly high-performance and complicated pieces of software on my 4-year-old iMac:
magog:~→ brew reinstall redis --build-from-source
...
==> Summary
🍺 /usr/local/Cellar/redis/6.0.10: 11 files, 3.8MB, built in 13 seconds
magog:~→ brew reinstall postgresql --build-from-source
...
==> Summary
🍺 /usr/local/Cellar/postgresql/13.1: 3,217 files, 42.5MB, built in 2 minutes 4 seconds
magog:~→ brew reinstall protobuf --build-from-source
...
==> Summary
🍺 /usr/local/Cellar/protobuf/3.14.0: 256 files, 18.2MB, built in 6 minutes 31 seconds
Say what you will about the value of comparing benchmarks, this illustrates my point: protobuf appears to take a very long time to build, and (as the ongoing conversation in this Github issue clearly illustrates) still frequently segfaults after it has built, on one of the most popular operating systems in the virtualization ecosystem.
And so, to your customers, and regardless of Alpine's C++ std library choices, protobuf feels like a difficult-to-use dependency compounded by frustration at the fact that it feels unnecessary in all but the highest-volume use cases. Paying a vendor, only to be forced into making big technical decisions for the wrong reasons in order to use a serialization technology that feels redundant, overpowered, or both, is a huge, huge, huge bummer.
Please, please, please do not read this as an attack. I'm confident there are perfectly valid reasons protobuf is written in C++ and takes a very long time to compile, and you've exhaustively convinced me that protobuf and Alpine are not destined to play nicely any time soon, for understandable, fair, and valid reasons.
I am merely trying to illustrate what it feels like being forced to use it, and again, I would never dare offer even comparisons like the above if I weren't paying money to do so.
Ultimately it's our fault for selecting GCP without planning far enough down the road. Had we considered that we'd eventually need a background worker solution, we could have selected a provider who offers such a feature without workarounds involving disparate services like Cloud Tasks and long-running HTTP requests. And we're willing to pay the price for that mistake, but for our money, that price is not rebuilding our infrastructure to use a different OS because Google said we should; it's rebuilding our infrastructure with a different hosting provider. And that is specifically because Google made it too difficult to use theirs and we no longer felt we could trust them not to do it again.
Thank you for all of your contributions to the cause, and I don't take lightly that protobuf is probably at the core of many of the systems I take for granted on a daily basis. It's simply not a good solution for us. I'd be happy to chat more about this over email or on the phone; my contact information can be found on my website, which is listed in my Github profile.
Flip
Hi all, it took me a long time to fix this on our project, but I ended up with this:
No segmentation fault, and everything is running OK.
Hello there,
The following segmentation fault pops up upon running ruby -e 'require "google/cloud/trace"' inside of a Docker container based on ruby:2.5-alpine (alpine 3.7.0).
Gemfile.lock (partial)
Ruby Segfault Dump