kimlai / tz_world

Resolve timezones from a location.
https://hexdocs.pm/tz_world
MIT License

`tz_world.update` taking over double the memory on OTP 27.0.1 #38

Open Jdyn opened 1 month ago

Jdyn commented 1 month ago

EDIT: I narrowed it down further and was able to build on 1.17.2 and OTP 26.2.5.2. So it looks like the problem is OTP 27.0.1

I am on tz_world 1.3.3

Hey, I've updated to Elixir 1.17 but am unable to deploy due to a significant increase in memory usage when running tz_world.update compared to 1.15.

Here are the two Docker images. The 1.15.7 image builds perfectly, and the 1.17 image OOMs after 4 GB of usage.

Elixir 1.15.7 OTP 25.3.2.7 (working image)

```
# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
#   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
#   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
#   - https://pkgs.org/ - resource for finding needed packages
#   - Ex: hexpm/elixir:1.15.7-erlang-25.3.2.7-debian-bullseye-20231009-slim
#
ARG ELIXIR_VERSION=1.15.7
ARG OTP_VERSION=25.3.2.7
ARG DEBIAN_VERSION=bullseye-20231009-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv

COPY lib lib

# Compile the release
RUN mix compile

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix tz_world.update
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/nimble ./

USER nobody

# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]

CMD ["/app/bin/server"]
```
Elixir 1.17.2 OTP 27.0.1 (OOM)

```
# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
#   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
#   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
#   - https://pkgs.org/ - resource for finding needed packages
#   - Ex: hexpm/elixir:1.15.7-erlang-25.3.2.7-debian-bullseye-20231009-slim
#
ARG ELIXIR_VERSION=1.17.2
ARG OTP_VERSION=27.0.1
ARG DEBIAN_VERSION=buster-20240612-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv

COPY lib lib

# Compile the release
RUN mix compile

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix tz_world.update
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/nimble ./

USER nobody

# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]

CMD ["/app/bin/server"]
```

Screenshot 2024-07-22 091754

Any ideas?

kipcole9 commented 1 month ago

That's definitely unexpected. And there haven't been any commits that should affect the mix task. I see you changed the issue description to reflect an OTP 27 difference, rather than an Elixir 1.17 difference. Does that mean you see the memory difference with the same Elixir version but different OTP version?

I will certainly take a look at this, but it might take a couple of days to try and diagnose.

kipcole9 commented 1 month ago

And I'll experiment with using the :json module in OTP27 and see if that makes an immediate difference.
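
For readers unfamiliar with it, the `:json` module ships with OTP 27 and exposes `:json.decode/1` for binaries. Below is a minimal sketch of trying it while still running on older OTP releases. The module name `JsonDecodeSketch` and the `:unavailable` tag are illustrative assumptions, not part of tz_world.

```elixir
defmodule JsonDecodeSketch do
  # Hypothetical helper: decodes with OTP 27's built-in :json module when it
  # is present, and reports :unavailable on older OTP releases.
  def decode(binary) when is_binary(binary) do
    if Code.ensure_loaded?(:json) and function_exported?(:json, :decode, 1) do
      # apply/3 avoids a compile-time "undefined module" warning on OTP < 27.
      {:otp_json, apply(:json, :decode, [binary])}
    else
      {:unavailable, nil}
    end
  end
end

IO.inspect(JsonDecodeSketch.decode(~s({"type": "GeometryCollection"})))
```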

Jdyn commented 1 month ago

> That's definitely unexpected. And there haven't been any commits that should affect the mix task. I see you changed the issue description to reflect an OTP 27 difference, rather than an Elixir 1.17 difference. Does that mean you see the memory difference with the same Elixir version but different OTP version?
>
> I will certainly take a look at this, but it might take a couple of days to try and diagnose.

I can confirm that, for me, the memory difference is caused by the difference in OTP versions. Building with OTP 25-26 and Elixir 1.15-1.17 sees the same memory usage during tz_world.update. But when I introduce OTP 27, memory usage at least doubles, though I cannot see the peak because my machine OOMs before the task can complete.

kipcole9 commented 1 month ago

Thanks much for the diagnostic. There are some things I can try to do to reduce memory usage and I'll experiment on the weekend. The key function that is most likely the memory consumer is:

  def transform_source_data(source_data, version) when is_binary(source_data) do
    case :zip.unzip(source_data, [:memory]) do
      {:ok, [{_, json} | _rest]} ->
        json
        |> Jason.decode!()
        |> Geo.JSON.decode!()
        |> Map.get(:geometries)
        |> Enum.map(&update_map_keys/1)
        |> Enum.map(&calculate_bounding_box/1)
        |> List.insert_at(0, version)

      error ->
        raise RuntimeError, "Unable to unzip downloaded data. Error: #{inspect error}"
    end
  end

With that in mind I can try:

  1. Jason.decode!(strings: :copy) (the default is :reference) since maybe the issue is that binaries are not being garbage collected
  2. Switch to the new :json module in OTP 27 and see if that makes a difference.
  3. See if I can use Jaxon's streaming JSON decoder

I will put two development branches together now so you can test (1) and (2). I'll look at (3) over the weekend.

And somehow I have to find a reproducible case I can submit to the OTP team.
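
Option (1) rests on how the BEAM shares binaries: a small sub-binary sliced out of a large one keeps the whole parent alive until the slice itself is collected. Jason's `strings: :copy` option copies each decoded string so that no such reference remains. The sketch below shows the underlying effect using only the standard library. The module name `SubBinaryDemo` is an illustrative assumption, and this is not the tz_world code.

```elixir
defmodule SubBinaryDemo do
  # Returns the number of bytes each binary keeps alive on the shared heap.
  def referenced_sizes do
    # A large reference-counted binary, standing in for the unzipped JSON.
    large = :binary.copy("x", 1_000_000)

    # A 5-byte sub-binary that still points into `large`.
    slice = binary_part(large, 10, 5)

    # An independent copy that no longer references `large`.
    copied = :binary.copy(slice)

    %{
      slice: :binary.referenced_byte_size(slice),
      copied: :binary.referenced_byte_size(copied)
    }
  end
end

IO.inspect(SubBinaryDemo.referenced_sizes())
```

The `slice` entry reports the size of the whole parent binary, while the `copied` entry reports only the bytes of the copy itself.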

kipcole9 commented 1 month ago

I've done some basic experiments and I see no material difference between using :json versus Jason. And curiously I see no material difference in memory usage on OTP26 versus OTP27 on my iMac Pro.

That means that (1) and (2) don't appear to make any material difference in memory consumption. I also added a call to :erlang.garbage_collect after decoding the JSON and that also made no material difference.
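
A rough way to repeat this kind of check is to compare binary memory before and after a step, forcing a collection on either side. The following is a minimal sketch under stated assumptions: `MemoryProbe` is a hypothetical helper, `:erlang.garbage_collect/0` collects only the calling process, and the result is the retained difference rather than the peak reached during the step.

```elixir
defmodule MemoryProbe do
  # Runs `fun` and returns {result, delta}, where delta is the change in
  # bytes of binary memory reported by the VM after a garbage collection.
  def measure(fun) when is_function(fun, 0) do
    :erlang.garbage_collect()
    before = :erlang.memory(:binary)

    result = fun.()

    # Collect again so that only binaries still referenced are counted.
    :erlang.garbage_collect()
    after_gc = :erlang.memory(:binary)

    {result, after_gc - before}
  end
end

{_data, delta} = MemoryProbe.measure(fn -> :binary.copy("x", 5_000_000) end)
IO.puts("binary memory retained: #{delta} bytes")
```

Because other processes allocate binaries concurrently, the delta is noisy and is best read as an order-of-magnitude indicator.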

Jdyn commented 1 month ago

That's interesting. This isn't my strongest area, but perhaps it could be a memory leak involving Linux and OTP 27, since it seemingly only happens in this Linux Docker image. I did provide the Debian image I am building with. Feel free to take your time, as OTP 26 is sufficing quite well.

Jdyn commented 1 month ago

Could be related? https://github.com/erlang/otp/issues/8682

peaceful-james commented 1 week ago

> Could be related? erlang/otp#8682

This seems like the problem in my case. I have plenty of memory but am getting runtime crashes on boot that look like other segfault bugs I have seen in the past.

Update: actually, I am seeing my application crash even with TzWorld.Backend.Memory. It is not an OOM problem.

The only error I see is this:

{exit,terminating,[{application_controller,call,2,[{file,"application_controller.erl"},{line,511}]},{application,enqueue_or_start,6,[{file,"application.erl"},{line,380}]},{application,ensure_all_started,3,[{file,"application.erl"},{line,359}]},{elixir,start_cli,0,[{file,"src/elixir.erl"},{line,195}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}

lang versions: erlang 27.0.1 elixir 1.17.2-otp-27

Another update: The problem went away when I upgraded :tz from 0.27.1 to 0.27.2. @Jdyn can you try this?

kipcole9 commented 6 days ago

I've pushed [a commit](https://github.com/kipcole9/tz_world/commit/aad71d3815bf4b16a438ac8d4a07b5f7e125a5d4) that makes some attempt to more aggressively garbage collect the large binaries that get generated during the update process.

I don't think this is a comprehensive solution, but I'd be interested if it makes a difference in your situations?

Mix.exs

def deps() do
  [
    {:tz_world, github: "kipcole9/tz_world"}
  ]
end

Mix task

I've added a new --trace argument that does limited tracing and memory profiling.

% mix tz_world.update --trace

Feedback most definitely welcome.

kipcole9 commented 6 days ago

> The problem went away when I upgraded :tz from 0.27.1 to 0.27.2

Very interesting - but I don't think it relates to this particular issue?

peaceful-james commented 6 days ago

> Very interesting - but I don't think it relates to this particular issue?

You are right.