nirev / exmagick

GraphicsMagick bindings for Elixir
BSD 3-Clause "New" or "Revised" License
46 stars 4 forks source link

Memory Leak #9

Open weixiyen opened 6 years ago

weixiyen commented 6 years ago

I'm processing about 2000 images per minute across 8 nodes in production and noticed that system memory is going up.

:erlang.memory() is stable, but system memory goes up by about ~50mb per hour per node until the VM just crashes.

It seems that the NIF bindings from exmagick might be the issue here where memory is not being freed up somewhere.

The commands I'm using are init! image_load!, image_dump!, and size!

0.0.5

dgvncsz0f commented 6 years ago

looking at the code my guess is image_dump!. enif_make_resource_binary requires enif_release_resource which is missing. i'll make a test to verify and if that is the case i'll make a patch fixing.

thanks!

dgvncsz0f commented 6 years ago

@weixiyen could you test #10 and let me know if it fixes your problem?

weixiyen commented 6 years ago

I added the config to my deps, works fine locally.

{:exmagick, git: "https://github.com/Xerpa/exmagick.git", ref: "b9162a1bd0908d0e3d5d501a07764c0e2936eaa2"},

In prod getting this crash on startup for this commit hash whereas hex.pm's version 0.0.5 works fine.

2018-04-11 03:36:29 crash_report
    initial_call: {supervisor,kernel,['Argument__1']}
    pid: <0.3408.0>
    registered_name: []
    error_info: {exit,{on_load_function_failed,'Elixir.ExMagick'},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,352}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: [kernel_sup,<0.3383.0>]
    messages: []
    links: [<0.3384.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 117
2018-04-11 03:36:29 supervisor_report
    supervisor: {local,kernel_sup}
    errorContext: start_error
    reason: {on_load_function_failed,'Elixir.ExMagick'}
    offender: [{pid,undefined},{id,kernel_safe_sup},{mfargs,{supervisor,start_link,[{local,kernel_safe_sup},kernel,safe]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
2018-04-11 03:36:30 crash_report
    initial_call: {application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}
    pid: <0.3382.0>
    registered_name: []
    error_info: {exit,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,134}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: [<0.3381.0>]
    messages: [{'EXIT',<0.3383.0>,normal}]
    links: [<0.3381.0>,<0.3380.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 152
2018-04-11 03:36:30 std_info
    application: kernel
    exited: {{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}}
    type: permanent
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,kernel_safe_sup,{on_load_function_failed,'Elixir.ExMagick'}}},{kernel,start,[normal,
dgvncsz0f commented 6 years ago

the master of exmagick includes flags to enable dirty scheduler that requires a fairly recent erlang version, probably that is why. we've been testing this in production for a while now and is stable but we haven't made a release yet to avoid breaking compatibility. let me fix that and we test again.

dgvncsz0f commented 6 years ago

ok. i made dirty scheduler disabled by default [opt-in by using an env var]. could you try it again?

weixiyen commented 6 years ago

Yep, trying it again.

I just updated my erlang distribution to erts-9.3 with the env flag turned on.

Will get back to you in about ~12 hours with some graphs to compare before / after.

Thank you!

weixiyen commented 6 years ago

Hi @dgvncsz0f - just checking back in.

I think there is still a memory leak, and it's kind of hard to tell how much the fix affected anything.

After running in prod for roughly 24 hours, there is still a big difference between what Erlang thinks it is using versus what the Linux Kernel thinks.

For example, on one node, Erlang thinks it is using 300MB of memory while the Linux Kernel thinks Erlang is using 5GB of memory.

According to Fred Herbert, this type of symptom could be caused by a NIF: http://erlang.org/pipermail/erlang-questions/2013-September/075401.html

Although, I can't be 100% sure.

Is it possible that there are minor leaks related to either init!, image_load!, or size! functions?

I'll try to do some more testing on my end as well to see if I can get more helpful information

dgvncsz0f commented 6 years ago

@weixiyen thanks. if you can provide more information it will be valuable. later today i'll continue investigating looking for other possible leaks. i'll get back to you as soon as i have something up.

Thanks!

dgvncsz0f commented 6 years ago

sorry, accidentally closed when merged the pr...