hexpm / hex

Package manager for the Erlang ecosystem.
https://hex.pm

Issues fetching deps #1019

Closed AndrewDryga closed 5 months ago

AndrewDryga commented 6 months ago

We hit a weird issue where Hex times out fetching a specific dependency on a GitHub Actions runner: https://github.com/firezone/firezone/actions/runs/8286457278/job/22713880034#step:3:415. We have 7+ retries, and it always fails for Floki with:

* Getting nimble_pool (Hex package)
* Getting hpax (Hex package)
* Getting db_connection (Hex package)
* Getting floki (Hex package)
** (exit) exited in: GenServer.call(:hex_fetcher, {:await, {:tarball, "hexpm", "floki", "0.36.0"}}, 120000)
    ** (EXIT) time out
    (elixir 1.15.7) lib/gen_server.ex:1074: GenServer.call/3
    (hex 2.0.6) lib/hex/scm.ex:128: Hex.SCM.update/1
    (hex 2.0.6) lib/hex/scm.ex:227: Hex.SCM.checkout/1
    (mix 1.15.7) lib/mix/dep/fetcher.ex:64: Mix.Dep.Fetcher.do_fetch/3
    (mix 1.15.7) lib/mix/dep/converger.ex:229: Mix.Dep.Converger.all/8
    (mix 1.15.7) lib/mix/dep/converger.ex:244: Mix.Dep.Converger.all/8
    (mix 1.15.7) lib/mix/dep/converger.ex:162: Mix.Dep.Converger.init_all/8
    (mix 1.15.7) lib/mix/dep/converger.ex:146: Mix.Dep.Converger.all/4
Error: Process completed with exit code 1.

It reproduces both when building inside Docker and on the host VM.

AndrewDryga commented 6 months ago

I tried upgrading to the latest Elixir and Erlang versions to resolve the issue, and now Hex is totally broken locally:

❯ mix local.hex --force
* creating /Users/andrew/.asdf/installs/elixir/1.16.2-otp-26/.mix/archives/hex-2.0.6

❯ mix deps.get         
* Updating openid_connect (https://github.com/firezone/openid_connect.git - origin/master)
Resolving Hex dependencies...
eheap_alloc: Cannot allocate 976733209832459912 bytes of memory (of type "heap_frag").

Crash dump is being written to: erl_crash.dump...beam/erl_term.h:1492:tag_val_def() Assertion failed: tag_val_def error
[1]    10749 abort      mix deps.get

❯ elixir -v            
Erlang/OTP 26 [erts-14.2.3] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Elixir 1.16.2 (compiled with Erlang/OTP 26)

wojtekmach commented 6 months ago

The eheap_alloc crash is described here: https://github.com/erlang/otp/issues/8238

The workaround is here: https://github.com/erlang/otp/issues/8238#issuecomment-1987173291

The fix is already merged, so it should be out in the next patch release.

AndrewDryga commented 6 months ago

Thanks @wojtekmach! Do you have any idea why we might be hitting the original issue? I've already deleted the cache, bumped Elixir/Erlang versions, and even managed to reproduce it locally once. (Reproduces all the time on CI.)

wojtekmach commented 6 months ago

No idea, sorry.

jnylen commented 6 months ago

Getting the same on floki:

* Getting inflex (Hex package)
  Fetched package (https://repo.hex.pm/tarballs/inflex-2.1.0.tar)
* Getting nimble_ownership (Hex package)
  Fetched package (https://repo.hex.pm/tarballs/nimble_ownership-0.3.1.tar)
* Getting kday (Hex package)
  Fetched package (https://repo.hex.pm/tarballs/kday-1.0.2.tar)
* Getting floki (Hex package)
** (exit) exited in: GenServer.call(:hex_fetcher, {:await, {:tarball, "hexpm", "floki", "0.36.0"}}, 120000)
    ** (EXIT) time out
    (elixir 1.15.7) lib/gen_server.ex:1074: GenServer.call/3
    (hex 2.0.6) lib/hex/scm.ex:128: Hex.SCM.update/1
    (hex 2.0.6) lib/hex/scm.ex:227: Hex.SCM.checkout/1
    (mix 1.15.7) lib/mix/dep/fetcher.ex:64: Mix.Dep.Fetcher.do_fetch/3
    (mix 1.15.7) lib/mix/dep/converger.ex:229: Mix.Dep.Converger.all/8
    (mix 1.15.7) lib/mix/dep/converger.ex:162: Mix.Dep.Converger.init_all/8
    (mix 1.15.7) lib/mix/dep/converger.ex:146: Mix.Dep.Converger.all/4
    (mix 1.15.7) lib/mix/dep/converger.ex:89: Mix.Dep.Converger.converge/4
!     An error occurred during buildpack compilation
 !   Error deploying the application
 !   → Invalid return code from buildpack 

Works fine locally but fails on Scalingo/Heroku. It fails with Elixir 1.16.2, 1.16.1, and 1.15.7, and with Erlang 26.2.2 and 26.1.2.

@AndrewDryga, have you found any way to fix this?

AndrewDryga commented 6 months ago

@jnylen you can override Floki as a GitHub dependency:

{:floki, override: true, github: "philss/floki", ref: "3d5adab58a41b020a775baca82fe15c0c364daab"}

I believe the archive for the latest version was somehow corrupted, so a version bump should help, and we should probably recall the corrupted version too.
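For context, a minimal sketch of where such an override would sit in a mix.exs. The module and app names are hypothetical; only the Floki tuple comes from the comment above.

```elixir
# mix.exs (sketch; MyApp and :my_app are hypothetical names)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      deps: deps()
    ]
  end

  defp deps do
    [
      # Fetch Floki from Git at a pinned commit instead of the Hex tarball.
      # `override: true` makes this definition take precedence over the
      # Floki requirement declared by any other dependency.
      {:floki,
       override: true,
       github: "philss/floki",
       ref: "3d5adab58a41b020a775baca82fe15c0c364daab"}
    ]
  end
end
```

After changing the dependency source, running mix deps.get again should update mix.lock to point at the Git checkout.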

khustochka commented 6 months ago

This is still happening even with the new release floki 0.36.1.

#21 3.042 * Getting floki (Hex package)
#21 123.1 ** (exit) exited in: GenServer.call(:hex_fetcher, {:await, {:tarball, "hexpm", "floki", "0.36.1"}}, 120000)
#21 123.1     ** (EXIT) time out
#21 123.1     (elixir 1.16.2) lib/gen_server.ex:1114: GenServer.call/3
#21 123.1     (hex 2.0.6) lib/hex/scm.ex:128: Hex.SCM.update/1
#21 123.1     (hex 2.0.6) lib/hex/scm.ex:227: Hex.SCM.checkout/1
#21 123.1     (mix 1.16.2) lib/mix/dep/fetcher.ex:64: Mix.Dep.Fetcher.do_fetch/3
#21 123.1     (mix 1.16.2) lib/mix/dep/converger.ex:229: Mix.Dep.Converger.all/8
#21 123.1     (mix 1.16.2) lib/mix/dep/converger.ex:162: Mix.Dep.Converger.init_all/8
#21 123.1     (mix 1.16.2) lib/mix/dep/converger.ex:146: Mix.Dep.Converger.all/4
#21 123.1     (mix 1.16.2) lib/mix/dep/converger.ex:89: Mix.Dep.Converger.converge/4
#21 ERROR: process "/bin/sh -c mix deps.get --only $MIX_ENV" did not complete successfully: exit code: 1

ericmj commented 6 months ago

Do you have a project that reproduces the error?

AndrewDryga commented 6 months ago

@ericmj I've tried to make a simple repo to reproduce it but failed to do so. Maybe it only happens in umbrella apps with a much more complex setup.

wojtekmach commented 6 months ago

Could you try a new project with the same mix.lock and, if you can reproduce it, gradually cut the lock down to a minimal one?

ericmj commented 6 months ago

If you can't reproduce it locally, can you share the project and branch where it happens on CI?

khustochka commented 6 months ago

I will try to make a repo that reproduces it. And yes, this is an umbrella application, and I had {:floki, ">= 0.30.0", only: :test} in my mix.exs. When I change it to {:floki, ">= 0.30.0"}, it does not fail.

khustochka commented 6 months ago

Here is the repository: https://github.com/khustochka/floki_test2_umbrella

I reproduce the error by running:

docker build . --progress plain
 > [builder  9/20] RUN mix deps.get --only prod:
122.7 ** (exit) exited in: GenServer.call(:hex_fetcher, {:await, {:tarball, "hexpm", "floki", "0.36.1"}}, 120000)
122.7     ** (EXIT) time out
122.7     (elixir 1.16.2) lib/gen_server.ex:1114: GenServer.call/3
122.7     (hex 2.0.6) lib/hex/scm.ex:128: Hex.SCM.update/1
122.7     (hex 2.0.6) lib/hex/scm.ex:227: Hex.SCM.checkout/1
122.7     (mix 1.16.2) lib/mix/dep/fetcher.ex:64: Mix.Dep.Fetcher.do_fetch/3
122.7     (mix 1.16.2) lib/mix/dep/converger.ex:229: Mix.Dep.Converger.all/8
122.7     (mix 1.16.2) lib/mix/dep/converger.ex:162: Mix.Dep.Converger.init_all/8
122.7     (mix 1.16.2) lib/mix/dep/converger.ex:146: Mix.Dep.Converger.all/4
122.7     (mix 1.16.2) lib/mix/dep/converger.ex:89: Mix.Dep.Converger.converge/4

It looks like this only reproduces in an umbrella app with only: :test. In a non-umbrella app with only: :test, Floki is simply not installed in the prod environment.
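For illustration, a rough sketch of an umbrella child app's mix.exs with the dependency declared as described. The app and module names are hypothetical, and the linked repository may be laid out differently.

```elixir
# apps/my_app/mix.exs inside an umbrella (sketch; names are hypothetical)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      # Paths shared with the umbrella root, as in generated child apps.
      build_path: "../../_build",
      config_path: "../../config/config.exs",
      deps_path: "../../deps",
      lockfile: "../../mix.lock",
      deps: deps()
    ]
  end

  defp deps do
    [
      # Reported to hit the timeout when running `mix deps.get --only prod`:
      {:floki, ">= 0.30.0", only: :test}

      # Reported not to fail:
      # {:floki, ">= 0.30.0"}
    ]
  end
end
```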

ericmj commented 6 months ago

Closing this because we believe it's a Mix bug. We are tracking it here: https://github.com/elixir-lang/elixir/issues/13490.

josevalim commented 5 months ago

The Elixir one was another bug, so we are reopening it. :)

josevalim commented 5 months ago

Luckily, it has already been fixed by: https://github.com/elixir-lang/elixir/commit/a5e53b794fda3ab8f436429b696d5d07e8520cc1 :)