irods / irods_capability_automated_ingest


document Redis memory best-practices #150

Closed · holtgrewe closed 1 month ago

holtgrewe commented 3 years ago

I'm trying to ingest tens of thousands of files.

Redis starts to use more and more memory until all memory is used (there appears to be a "sync time" key for each ingested file), and it eventually fails with the following in the Redis log:

921:M 14 Jan 00:14:07.095 * 10000 changes in 60 seconds. Saving...
921:M 14 Jan 00:14:07.114 # Can't save in background: fork: Cannot allocate memory
921:M 14 Jan 00:14:13.056 * 10000 changes in 60 seconds. Saving...
921:M 14 Jan 00:14:13.116 * Background saving started by pid 14111
14111:C 14 Jan 00:14:55.017 * DB saved on disk
14111:C 14 Jan 00:14:55.064 * RDB: 3 MB of memory used by copy-on-write
921:M 14 Jan 00:14:55.146 * Background saving terminated with success

This makes the ingest code crash with the following:

Traceback (most recent call last):
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/worker/worker.py", line 205, in start
    self.blueprint.start(self)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 317, in start
    blueprint.start(self)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 593, in start
    c.loop(*c.loop_args())
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/kombu/asynchronous/hub.py", line 299, in create_loop
    item()
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/vine/promises.py", line 170, in __call__
    return self.throw()
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/vine/promises.py", line 167, in __call__
    retval = fun(*final_args, **final_kwargs)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/kombu/message.py", line 130, in ack_log_error
    self.ack(multiple=multiple)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/kombu/message.py", line 125, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 664, in basic_ack
    self.qos.ack(delivery_tag)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/kombu/transport/redis.py", line 170, in ack
    self._remove_from_indices(delivery_tag).execute()
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/redis/client.py", line 2879, in execute
    return execute(conn, stack, raise_on_error)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/redis/client.py", line 2777, in _execute_transaction
    raise errors[0][1]
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/redis/client.py", line 2764, in _execute_transaction
    self.parse_response(connection, '_')
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/redis/client.py", line 2838, in parse_response
    self, connection, command_name, **options)
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/opt/rodeos-ingest-env/lib/python3.6/site-packages/redis/connection.py", line 629, in read_response
    raise response
redis.exceptions.ResponseError: Command # 1 (ZREM unacked_index 00725bdc-b0a8-4091-b11b-6197c59a1083) of pipeline caused error: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
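
For anyone diagnosing the same symptom, a quick check with redis-cli can confirm that per-file keys are what is growing. The key pattern below is a guess based on the "sync time" keys mentioned above; adjust it to whatever SCAN actually reveals:

redis-cli DBSIZE                                  # total number of keys
redis-cli INFO memory | grep used_memory_human    # current memory footprint
redis-cli --scan --pattern '*sync_time*' | head   # sample the suspected per-file keys
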
trel commented 3 years ago

I don't think this is a memory leak - I think this is the default behavior of redis (maxmemory 0, which means unlimited).

https://redis.io/topics/lru-cache

I have had success with the following two settings...
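
The exact values are not shown above, but per the linked LRU documentation the two settings in question are typically a memory cap plus an eviction policy. An illustrative redis.conf snippet (values are examples, not the ones from this setup):

maxmemory 2gb                    # cap Redis memory instead of letting it grow unbounded
maxmemory-policy allkeys-lru     # evict least-recently-used keys once the cap is reached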

If the automated ingest tool gets a cache-miss due to prior key eviction, then it will check with iRODS (and populate the cache again) - which is still okay, just not as efficient as a cache-hit.

Increasing the memory of the machine will just give you access to a bigger cache.

holtgrewe commented 3 years ago

Thanks for the heads-up. Maybe this could go somewhere in a "note well" section of the icai README?

trel commented 3 years ago

yes, it's always a balance to document other projects :)

holtgrewe commented 3 years ago

I agree, and I'm also facing the expert's dilemma regularly.

Thanks for addressing this. The ordinary icai user might be an expert on iRODS and Python and even Celery, but might not have worked with Redis. I regularly use Redis for caching in Django applications, for example, but have never faced the problem mentioned here.

trel commented 3 years ago

For posterity - the largest installation I've used this for to date...

alanking commented 1 month ago

In addition to whatever cool tricks @trel is going to document, we can also just link to the Redis documentation: https://redis.io/docs/latest/operate/oss_and_stack/management/admin/

There is a section called "Memory" that gives a few pointers, which I will paste here just in case it goes away:

Memory

  • Ensure that swap is enabled and that your swap file size is equal to the amount of memory on your system. If Linux does not have swap set up, and your Redis instance accidentally consumes too much memory, Redis can crash when it is out of memory, or the Linux kernel OOM killer can kill the Redis process. When swapping is enabled, you can detect latency spikes and act on them.
  • Set an explicit maxmemory option limit in your instance to make sure that it will report errors instead of failing when the system memory limit is nearly reached. Note that maxmemory should be set by calculating the overhead for Redis, other than data, and the fragmentation overhead. So if you think you have 10 GB of free memory, set it to 8 or 9.
  • If you are using Redis in a write-heavy application, while saving an RDB file on disk or rewriting the AOF log, Redis can use up to 2 times the memory normally used. The additional memory used is proportional to the number of memory pages modified by writes during the saving process, so it is often proportional to the number of keys (or aggregate type items) touched during this time. Make sure to size your memory accordingly.
  • See the LATENCY DOCTOR and MEMORY DOCTOR commands to assist in troubleshooting.
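
Most of this can be applied to a running instance with redis-cli; a minimal sketch, with the 8gb value purely illustrative (sized per the 10 GB example in the maxmemory bullet above):

redis-cli CONFIG SET maxmemory 8gb                  # report errors instead of being OOM-killed
redis-cli CONFIG SET maxmemory-policy allkeys-lru   # pick an eviction policy suited to the workload
redis-cli MEMORY DOCTOR                             # built-in troubleshooting hints
redis-cli LATENCY DOCTOR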

There's also a section in the README that we've added ~in the semi-recent(ish??) past~: https://github.com/irods/irods_capability_automated_ingest?tab=readme-ov-file#starting-redis-server (See also: https://github.com/irods/irods_capability_automated_ingest/issues/78)
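
The same settings can also be supplied when starting the server, either in the config file that README section describes or as command-line overrides; a hypothetical invocation (path and values illustrative):

redis-server /etc/redis/redis.conf --maxmemory 2gb --maxmemory-policy allkeys-lru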

alanking commented 1 month ago

Added a section for Redis configuration which links to the official documentation for memory management.