mathieuprog / tz

Time zone support for Elixir
Apache License 2.0
147 stars 11 forks source link

Out of memory when recompiling time zone periods on a memory-limited system #9

Closed OldhamMade closed 1 year ago

OldhamMade commented 2 years ago

I'm running an app on fly.io, with a 1GB container size. Using Tzdata I can boot without issue. However, switching to Tz I'm seeing this on startup:

14:18:01.356 [info] Tz is downloading the latest IANA time zone database (version 2021c)...
14:18:02.785 [info] Tz download done
14:18:02.814 [info] Tz is recompiling time zone periods...
[    7.749887] Out of memory: Killed process 511 (beam.smp) total-vm:2665752kB, anon-rss:902056kB, file-rss:0kB, shmem-rss:63628kB, UID:0 pgtables:2380kB oom_score_adj:0
Main child exited with signal (with signal 'SIGKILL', core dumped? false)
Reaped child process with pid: 595, exit code: 0
Starting clean up.
Process appears to have been OOM killed!

Config is:

config :elixir, :time_zone_database, Tz.TimeZoneDatabase
config :tz, reject_time_zone_periods_before_year: 2019

I'm not able to replicate this locally, unfortunately. I'm using docker run --memory 512m --rm -i -p 8080:8080 {container} but I'm on macOS and don't hit a memory error.

OldhamMade commented 2 years ago

I've had a bit of a play around with the code, and I can't see any quick ways to reduce the memory usage, since the entire dataset is needed in memory to compile the in-memory modules Tz.PeriodsProvider and Tz.OngoingChangingRulesProvider.

One approach to minimise memory overhead could be to generate each module separately, streaming the data from the Iana file, processing and streaming into a file on-disk, then finalising, formatting, and loading from disk on as necessary. Would be a bit of a rework though, but it would bring it inline with the overhead of Tzdata since (IIRC) that creates ETS tables on disk from the Iana data and loads the ETS tables on startup.

OldhamMade commented 2 years ago

BTW, the way I'm testing changes is to require Benchee, and execute mix run benchmarks/compiler.exs which contains something like:

Tz.IanaDataDir.maybe_copy_iana_files_to_custom_dir()

Benchee.run(
  %{
    "compiler1" => fn -> Tz.CompilerOriginal.compile() end,
    "compiler2" => fn -> Tz.Compiler.compile() end,
  },
  memory_time: 10
)

I clone lib/compiler.ex as lib/compiler_original.ex and rename to Tz.CompilerOriginal as the reference.

mathieuprog commented 2 years ago

@OldhamMade ty for reporting this issue and for sharing your experiments. How much memory usage does Benchee report for you for the original compiler?

OldhamMade commented 2 years ago

Here's what I'm seeing from my latest tests:

Benchmarking compiler1…
Benchmarking compiler2…

Name                ips        average  deviation         median         99th %
compiler1          0.37         2.74 s     ±1.28%         2.74 s         2.76 s
compiler2          0.36         2.80 s     ±2.06%         2.80 s         2.84 s

Comparison:
compiler1          0.37
compiler2          0.36 - 1.02x slower +0.0601 s

Memory usage statistics:

Name              average  deviation         median         99th %
compiler1       364.44 MB     ±0.00%      364.43 MB      364.44 MB
compiler2       364.32 MB     ±0.00%      364.32 MB      364.32 MB

Comparison:
compiler1       364.43 MB
compiler2       364.32 MB - 1.00x memory usage -0.11152 MB

Benchee doesn't really give a full overview of the amount of memory being used, since here it is only testing the compiler (and modules called from it) and isn't completing a full iteration from what I can tell (ips is less than 1). However, it does provide a useful comparison, and one can aim for changes to come in below that of the original implementation.

OldhamMade commented 2 years ago

BTW, I don't see this as a "bug" as such. I think it's simply a consequence of the implementation, and is perfectly valid. Saying that, it does mean that anyone using memory-restricted containers would struggle to add and use Tz, which may harm adoption.

mathieuprog commented 2 years ago

I don't think the total memory usage will be any different if we change the compilation strategy, because in the end all the data must be stored in memory in any case.

In version 0.9.0 I compiled a module per time zone area (Europe, Africa, etc.), see here: https://github.com/mathieuprog/tz/blob/v0.9.0/lib/compiler.ex#L135

Could you test if the REcompilation works (and if it does, try it a few times)?

mathieuprog commented 2 years ago

Also how would you explain that the library can initially compile the data without errors, but once your app is running and Tz gets updated and recompiled, it fails?

mathieuprog commented 2 years ago

fullsweep_after option to 0 might also be interesting to test in Tz.UpdatePeriodically: https://www.coletiv.com/blog/elixir-genserver-memory-issues/

mathieuprog commented 2 years ago

Could you also confirm that Tzdata updates successfully to 2021c (from 2020e I think) in fly.io?

OldhamMade commented 2 years ago

I don't think the total memory usage will be any different if we change the compilation strategy, because in the end all the data must be stored in memory in any case.

My goal was to reduce the amount of Tz data in memory by limiting to this year, +/- 1. So once the modules are compiled, I should only have 3 years' worth of Tz data in memory compared to the full dataset used by Tzdata.

how would you explain that the library can initially compile the data without errors, but once your app is running and Tz gets updated and recompiled, it fails?

This is because the docker container is built on my local machine before being pushed to fly.io for deployment. As I understand it, compilation of Tz doesn't trigger the download & compile of the Iana files. When this happens during Tz.UpdatePeriodically, the modules created by Module.create/3 wouldn't be written as beam files to disk. From the manual:

Similar to Kernel.defmodule/2, the binary will only be written to disk as a .beam file if Module.create/3 is invoked in a file that is currently being compiled.

Maybe I'm misunderstanding the code and the beam files are being written during the initial compile of the Tz module?

Regarding fullsweep_after, I have that set to 0 as a global option, but I'm happy to test directly setting it in application.ex.

Could you also confirm that Tzdata updates successfully to 2021c (from 2020e I think) in fly.io?

I think you ship with 2020a, and I see this during the startup process:

14:18:01.356 [info] Tz is downloading the latest IANA time zone database (version 2021c)...
14:18:02.785 [info] Tz download done

So that process looks like it completes ok.

I think what happens then is that all of that is loaded into memory while the quoted data is built, which is also kept in memory until it can be compiled into the new modules and loaded.

In version 0.9.0 I compiled a module per time zone area (Europe, Africa, etc.), see here: https://github.com/mathieuprog/tz/blob/v0.9.0/lib/compiler.ex#L135

Could you test if the REcompilation works (and if it does, try it a few times)?

I'll try to get some time one evening this week, but I think the earliest I'll get a decent block of time to take a look will be Oct 23/24.

mathieuprog commented 2 years ago

Could you also confirm that Tzdata updates successfully to 2021c (from 2020e I think) in fly.io?

I think you ship with 2020a, and I see this during the startup process:

Actually I was talking about the other library Tzdata. See if it can successfully update to 2021c while it's running.

mathieuprog commented 2 years ago

My goal was to reduce the amount of Tz data in memory by limiting to this year, +/- 1. So once the modules are compiled, I should only have 3 years' worth of Tz data in memory compared to the full dataset used by Tzdata.

You can set the :build_time_zone_periods_with_ongoing_dst_changes_until_year option to 0 to not compile future period transitions, but I doubt that will really help.

mathieuprog commented 2 years ago

As I understand it, compilation of Tz doesn't trigger the download & compile of the Iana files.

It doesn't trigger the download but it does trigger the compilation of the Tz.PeriodsProvider module from the raw IANA data (btw you can notice that there is no file for Tz.PeriodsProvider in the lib).

This is run on compilation of the Tz dependency: https://github.com/mathieuprog/tz/blob/v0.20.1/lib/compiler_runner.ex#L6

Same as in the updater (which is called from the Tz.UpdatePeriodically worker): https://github.com/mathieuprog/tz/blob/v0.20.1/lib/updater.ex#L20

In other words, you do compile tz successfully but you can't recompile the newly downloaded tz data.

OldhamMade commented 2 years ago

Actually I was talking about the other library Tzdata. See if it can successfully update to 2021c while it's running.

Yep, it does do that; my app is about 3 years old now and has been using Tzdata since the beginning. The update generally happens at startup after a deploy, and memory stays pretty consistent while this happens. I've never had issues deploying, even with containers restricted to 512Mb.

You can set the :build_time_zone_periods_with_ongoing_dst_changes_until_year option to 0 to not compile future period transitions, but I doubt that will really help.

The latest config I've been working with is as follows, but yes – it doesn't really help during the compile phase.

config :tz,
  reject_time_zone_periods_before_year: this_year - 1,
  build_time_zone_periods_with_ongoing_dst_changes_until_year: this_year + 2

In other words, you do compile tz successfully but you can't recompile the newly downloaded tz data.

Yep. So to get a successful deploy I've disabled Tz.UpdatedPeriodically and added a mix task to call Tz.Updater.maybe_recompile/0 during the docker build process. It's a work-around; my build is executed locally before being pushed to the host, so I can take advantage of the lack of memory restrictions to build the final dataset.

Here's my memory usage before and after the deploy which changes from Tzdata to Tz:

mem-diff-tzdata-tz

I'm seeing a reduction of over 100Mb, which is great! This means I can reduce my container memory allocation to 512Mb instead of 1Gb, but it does mean that I'll have to regularly build & deploy manually to get updated IANA information.

It would be great if the memory could be reduced for compile, since I'm sure I'm not the only one who will have this issue, but it's good to know there's a workaround. I'll keep looking to see if I can reduce the footprint, but this ticket can probably be closed now unless you plan to do further work yourself.

mathieuprog commented 2 years ago

There are two things happening when initially compiling and later on updating to a new iana tz data version:

  1. parsing the IANA data files into time zone periods;
  2. creating module Tz.PeriodsProvider or rewriting the module with Module.create (another module Tz.OngoingChangingRulesProvider is also compiled).

This happens all in Tz.Compiler. Do you think it's possible to have some inspect calls to see at what stage it crashes?

For example in that module we have a for that goes through each file, and we could also log before the Module.create call.

If the other lib Tzdata can update, i guess that would mean Tz should also be able to parse the files, and it would crash at the Module.create call. Just guessing.

OldhamMade commented 1 year ago

I'm going to close this issue. I managed to work around it by adding a 5gb swap space to my VM as it boots, which works around the memory limitation. This allows Tz to build successfully.

mathieuprog commented 1 year ago

Thank you for the feedback.

I think those with memory-limited system should update the IANA tzdata files manually.

We could make it consume less memory, but would it then really be reliable for systems with tight memory limits anyway? So I would say this work is not worth it.

I also have personally no real understanding of the significance of the amount of memory that tz needs on a server in production, with its possible impacts.