irods / irods_rule_engine_plugin_audit_amqp

BSD 3-Clause "New" or "Revised" License

Java heap space runs out while running tests #119

Open alanking opened 1 year ago

alanking commented 1 year ago

Bug Report

iRODS Version, OS and Version

4.3.0, Ubuntu 20.04. Plugin built based on #118.

What did you try to do?

Run the test suite with ActiveMQ 5.14 as the message broker.

Expected behavior

I expected the test suite to complete successfully.

Observed behavior (including steps to reproduce, if applicable)

After a little over 2.5 hours of running tests, it seems that the Java heap ran out of memory, causing ActiveMQ to become unresponsive:

{
  "error_condition::description": "Unexpected error occurred: java.lang.OutOfMemoryError: Java heap space",
  "error_condition::name": "amqp:decode-error",
  "error_condition::what": "amqp:decode-error: Unexpected error occurred: java.lang.OutOfMemoryError: Java heap space",
  "log_category": "rule_engine",
  "log_facility": "local0",
  "log_level": "error",
  "log_message": "Connection error in proton messaging handler",
  "request_api_name": "RM_COLL_AN",
  "request_api_number": 679,
  "request_api_version": "d",
  "request_client_user": "otherrods",
  "request_host": "192.168.16.3",
  "request_proxy_user": "otherrods",
  "request_release_version": "rods4.3.0",
  "rule_engine_plugin": "audit_amqp",
  "server_host": "d6b557785867",
  "server_pid": 1050225,
  "server_timestamp": "2023-06-07T17:55:44.554Z",
  "server_type": "agent"
}

The server began accumulating agent processes as they timed out attempting to connect to ActiveMQ. The test suite looked frozen but it was just processing very slowly as it waited on timeouts for each message it attempted to send. I waited for a bit and then killed the ActiveMQ process. The tests then continued as normal, albeit a bit faster and with many error messages in the log about not being able to connect to the message broker.

I'm not sure where the problem lies, but I'm writing it down so that we can address it. I don't think the plugin is at fault, but we may be able to configure the message broker so that this is avoided.

Bonus: The tests now run very slowly, but this may not be a bad thing. I think this is because the plugin is working more as advertised. The slowness may be caused by #107.

alanking commented 1 year ago

We might consider something like the heap size option for Elastic found here: https://github.com/irods/contrib/blob/main/irods_audit_elk_stack

The tests could also consume the messages from the queue between tests to ensure that the queue does not grow too big.
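
A minimal sketch of what such a between-tests drain could look like. Everything named here is hypothetical: `drain_queue`, `InMemoryReceiver`, and `ReceiveTimeout` are invented for illustration and are not part of the plugin's test suite. The helper assumes a receiver object exposing `receive(timeout=...)` and `accept()`, and takes the timeout exception type as a parameter so that it does not depend on any particular broker library.

```python
from collections import deque


class ReceiveTimeout(Exception):
    """Raised by the stand-in receiver when no message is available."""


class InMemoryReceiver:
    """Hypothetical stand-in for a broker receiver, for demonstration only."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self.accepted = 0

    def receive(self, timeout=None):
        # A real receiver would block for up to `timeout` seconds;
        # this stand-in fails immediately once the queue is empty.
        if not self._pending:
            raise ReceiveTimeout()
        return self._pending.popleft()

    def accept(self):
        # Acknowledge the most recently received message.
        self.accepted += 1


def drain_queue(receiver, timeout_exc, per_message_timeout=1.0, max_messages=100_000):
    """Consume and acknowledge messages until a receive times out.

    Returns the number of messages drained. `max_messages` is a safety
    cap so a teardown step cannot loop forever on a busy queue.
    """
    drained = 0
    while drained < max_messages:
        try:
            receiver.receive(timeout=per_message_timeout)
        except timeout_exc:
            break  # Nothing arrived within the timeout: treat the queue as empty.
        receiver.accept()
        drained += 1
    return drained


receiver = InMemoryReceiver(["msg-%d" % i for i in range(5)])
print(drain_queue(receiver, ReceiveTimeout))  # prints 5
```

With python-qpid-proton, the receiver would presumably come from `BlockingConnection(url).create_receiver(address)` with `proton.Timeout` passed as `timeout_exc`; the URL and queue address depend on how the tests configure the broker. Calling the helper from a test teardown would keep the broker's backlog bounded by what a single test produces.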