waarp / Waarp-All

This is a major version that unifies all Waarp modules, which were previously split.

OOME Error on pull request #52

Closed marakiis closed 4 years ago

marakiis commented 4 years ago

It seems there is a memory leak in the R66 client. While transmitting a big file (> 400 MB), an OOME occurs.

client.log.2020-07-01.0.log

I suspect data packets are not discarded correctly.

fredericBregier commented 4 years ago

I've looked at it. I've written an integration test for that:

Could you provide the configuration and scenario?

fredericBregier commented 4 years ago

And of course, which version (3.3.4)?

bcarlin commented 4 years ago

The version is 3.3.4, with an Xmx at 512m.

marakiis commented 4 years ago

The case is an msend with a recv rule.

fredericBregier commented 4 years ago

Tests done using -Xmx512m: send and recv, TLS and no TLS, bandwidth limit and no limit, direct and submit. No issue so far.

Waiting for more information... ;-)

marakiis commented 4 years ago

Hi, it seems the OOME was caused by an insufficient amount of RAM on the machine (only 400 MB were available). However, it doesn't feel right that an R66 client needs that amount of memory.

fredericBregier commented 4 years ago

Well, we have already made a lot of memory optimizations (more than 40% less). The memory footprint is a combination of the active services:

400 MB is really a very small amount of memory for a typical server. If everything is deactivated, it might be OK.

My feeling is that 1GB for modern servers is quite normal for a Java program, including all those options.

However, REST V2, for instance, could be optimized a bit (no tuning of thread or memory consumption has been done yet).

bcarlin commented 4 years ago

I concur with you: if Waarp R66 is started as a server, especially a production server, 1 GB is a minimum. But for a client (here it is a client that hit the OOME), there are no web interfaces, REST APIs, etc.

It is all very empirical, but it seems I need to set my Xmx to 128m to pull a 1.5 GB file from a server without TLS, where my client is configured with runlimit=1, serverthread=1 and clientthread=1. This seems a little weird to me, even if, for a Java app, it is not that big...

fredericBregier commented 4 years ago

Well, the Java memory footprint by itself is quite large (note that there are about 30 MB of native jars, plus the JRE itself, to be loaded). By default, Java needs at least 100 MB (so -Xmx128m is the minimum).
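To illustrate why the footprint is larger than the heap alone, here is a minimal sketch (not Waarp code) that prints heap and non-heap usage through the standard `java.lang.management` API. The class name `JvmFootprint` and the helper `toMiB` are hypothetical names chosen for this example. `-Xmx` only bounds the heap; metaspace, code cache and thread stacks come on top of it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for illustration only; not part of Waarp.
final class JvmFootprint {

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        // Heap is what -Xmx limits. getMax() returns -1 when no limit is defined.
        System.out.println("heap used (MiB):          " + toMiB(heap.getUsed()));
        System.out.println("heap committed (MiB):     " + toMiB(heap.getCommitted()));
        System.out.println("heap max (bytes):         " + heap.getMax());

        // Non-heap (metaspace, code cache, ...) is NOT covered by -Xmx.
        System.out.println("non-heap used (MiB):      " + toMiB(nonHeap.getUsed()));
        System.out.println("non-heap committed (MiB): " + toMiB(nonHeap.getCommitted()));
    }
}
```

Running it with different `-Xmx` values shows the heap limit changing while the non-heap part stays roughly constant.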

Nothing to do about that...

bcarlin commented 4 years ago

I reckon...

fredericBregier commented 4 years ago

Small improvement in the last MR.

Doing the following works for a client (direct transfer or submit transfer):

Doing the following works for a Server according to related needs:

bcarlin commented 4 years ago

Ok... So I have played a little and made some tests, just for fun.

With a small script, I benchmarked the RAM usage of several JVMs (mainly distributions of OpenJDK).

The client was set up with runlimit=1, serverthread=1 and clientthread=1. It used a rule in receive mode without any task, to get a 1.5 GB file without TLS from a server installed on my host machine.

I recorded the apparent RAM usage as seen by the kernel (RSS memory). Here are the results:

[Figure: bench]

The x axis is the time elapsed from the start of the container (roughly 1 data point every 0.5 second).

All the Hotspot based JVM are in the same range, with Amazon Corretto running on Amazon Linux being slightly below.

JVMs based on OpenJ9, however, are a surprise as far as memory consumption is concerned, since they used ~50% less RAM without too significant an impact on transfer duration.

Each container was started with the default settings. I did not include Java 7 (I was curious) because the transfer failed with UnsupportedClassVersionError: org/snmp4j/agent/MOGroup : Unsupported major.minor version 52.0.
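The RSS sampling described above can be approximated as follows. This is a hedged sketch and not the script used for the benchmark: it assumes a Linux host where `/proc/<pid>/status` exposes a `VmRSS:` line, and the class name `RssSampler` is hypothetical. It prints one `elapsed_ms,rss_kB` data point roughly every 0.5 second until the observed process exits.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sampler for illustration; assumes Linux and /proc.
final class RssSampler {

    /** Extracts the VmRSS value in kB from /proc/<pid>/status lines, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical line: "VmRSS:     123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1L;
    }

    public static void main(String[] args) throws InterruptedException {
        // With no argument the sampler observes itself and runs until interrupted.
        String pid = args.length > 0 ? args[0] : "self";
        long start = System.nanoTime();
        while (true) {
            List<String> lines;
            try {
                lines = Files.readAllLines(Paths.get("/proc", pid, "status"));
            } catch (IOException e) {
                break; // the observed process has exited (or /proc is unavailable)
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            System.out.println(elapsedMs + "," + parseVmRssKb(lines));
            Thread.sleep(500L);
        }
    }
}
```

Passing the PID of the R66 client as the first argument yields a CSV that can be plotted directly.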

fredericBregier commented 4 years ago

Added small improvements to hash computation (fewer byte copies, hence a lower memory footprint and CPU usage).
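As an illustration of the general idea of limiting byte copies while hashing, here is a minimal sketch. It is not the change made in the MR: it only shows the common technique of feeding a `MessageDigest` from one reusable buffer instead of allocating or copying an array per block. The class name `StreamingHash` and the choice of SHA-256 are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; not the Waarp implementation.
final class StreamingHash {

    /** Hashes a stream with SHA-256 using a single buffer allocated once. */
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[64 * 1024]; // allocated once, reused for every block
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Feed only the bytes actually read; no intermediate array copy.
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(sha256Hex(in));
    }
}
```

Memory use stays bounded by the buffer size regardless of the size of the file being hashed.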

fredericBregier commented 4 years ago

This bug should be closed once the MR is merged. A reminder:

bcarlin commented 4 years ago

Yes. I'm closing it since the issue was not tied to the application but to the system: there was less RAM available on the server than the configured Xmx (the JVM requested 512 MB, but the system only had 400 MB available).
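A start-up check for this situation could look like the sketch below. It is only an illustration, not something Waarp does: the class name `HeapVsRamCheck` is hypothetical, and it assumes a JDK that provides `com.sun.management.OperatingSystemMXBean` (the HotSpot-based distributions do; `getFreePhysicalMemorySize()` is deprecated in recent JDKs but still present). `Runtime.maxMemory()` reports approximately the configured Xmx.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical start-up check for illustration; not part of Waarp.
final class HeapVsRamCheck {

    /** Returns true when the maximum heap is larger than the RAM currently available. */
    static boolean heapExceedsRam(long maxHeapBytes, long availableRamBytes) {
        return maxHeapBytes > availableRamBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx value
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // Assumption: the JDK-specific extension is available on this JVM.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long freeRam = ((com.sun.management.OperatingSystemMXBean) os)
                    .getFreePhysicalMemorySize();
            if (heapExceedsRam(maxHeap, freeRam)) {
                System.err.println("Warning: max heap (" + maxHeap
                        + " bytes) exceeds free physical memory (" + freeRam + " bytes)");
            } else {
                System.out.println("Max heap fits in free physical memory");
            }
        } else {
            System.out.println("Physical memory size is not exposed by this JVM");
        }
    }
}
```

With the figures from this issue, `heapExceedsRam(512 MB, 400 MB)` is true, which is the condition that was hit here.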

I'm sending a PR to include the RAM requirements in the doc.