inveniosoftware / invenio-indexer

Record indexer for Invenio.
https://invenio-indexer.readthedocs.io
MIT License

More robust bulk indexing #111

Open rerowep opened 5 years ago

rerowep commented 5 years ago

The function _bulk_op does not handle problems with RabbitMQ very well. For example, if RabbitMQ runs under memory restrictions and gets very low on memory, this function fails without logging anything and without waiting for RabbitMQ to respond again.

See the following error message from RabbitMQ:

memory resource limit alarm set on node 'rabbit@mef'.
**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

lnielsen commented 5 years ago

The log above is from RabbitMQ (if I understand correctly). Do you have an example of what the code does? I would think that it would either 1) hang or 2) raise an exception if it cannot publish to RabbitMQ. In case 2, that should likely result in a 500 error that you can log to e.g. Sentry. In case 1, there ought to be a timeout (even if it is long) that would result in an exception being raised.

First, this is just to understand exactly what the problem is. Then we should of course see how we can improve it.
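
To illustrate the point about a timeout turning a hang into an exception, here is a minimal, generic sketch. It is not how invenio-indexer or its messaging layer handles timeouts; `call_with_timeout` is a hypothetical helper that only shows the idea of bounding a potentially blocking call so that the caller gets an exception it can log.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_seconds):
    """Run ``func`` in a worker thread and wait at most ``timeout_seconds``.

    Hypothetical helper for illustration only. If the call does not finish
    in time, ``concurrent.futures.TimeoutError`` is raised. The worker
    thread is NOT cancelled; it keeps running in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(func)
        # Raises concurrent.futures.TimeoutError when the deadline passes.
        return future.result(timeout=timeout_seconds)
    finally:
        # Do not block on a worker that may still be stuck.
        executor.shutdown(wait=False)
```

With something like this, a publish call that blocks would surface as an exception with a stack trace instead of a silent hang. Because the worker thread cannot be interrupted, a timeout supported by the client library itself would be preferable where one is available.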

rerowep commented 5 years ago

Yes, the log is from RabbitMQ. To reproduce the error, you could limit the Docker memory for RabbitMQ and then try to bulk index a lot of records with the option --delayed. I already have the following function:

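from time import sleep

from invenio_indexer.api import RecordIndexer


def bulk_index(uuids, process=False):
    """Bulk index records, retrying with exponential backoff on failure."""
    indexer = RecordIndexer()
    minutes = 1
    while True:
        try:
            # Publish the record ids to the bulk indexing queue.
            indexer.bulk_index(uuids)
            break
        except Exception:
            # Publishing failed (e.g. RabbitMQ is blocking publishers):
            # wait, then double the delay before the next attempt.
            sleep(minutes * 60)
            minutes *= 2
    if process:
        # Consume the queue and send the records to the search index.
        indexer.process_bulk_queue()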

However, I would prefer to have this functionality in invenio-indexer itself, because then we can be sure that the CLI command pipenv run invenio index run ... indexes all records.
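
As a rough sketch of what a more robust variant of the retry loop above could look like, the following helper logs every failure and gives up after a bounded number of attempts instead of retrying forever. The name `retry_with_backoff` and all of its parameters are hypothetical; nothing here is part of invenio-indexer.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry_with_backoff(operation, max_attempts=5, initial_delay=60.0,
                       factor=2.0, sleep=time.sleep):
    """Call ``operation`` until it succeeds or ``max_attempts`` is reached.

    Hypothetical helper for illustration only. The delay between attempts
    starts at ``initial_delay`` seconds and is multiplied by ``factor``
    after each failure. The last exception is re-raised on exhaustion.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Log with traceback so a blocked broker is visible in the logs.
            logger.exception("Attempt %d of %d failed", attempt, max_attempts)
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= factor
```

Under these assumptions, the function above could wrap its publish step as `retry_with_backoff(lambda: indexer.bulk_index(uuids))`. The `sleep` parameter is injectable so that the backoff schedule can be checked without actually waiting.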

lnielsen commented 5 years ago

You don't have an exception stack trace from the Python side of things? Sorry, I am only seeing your message now.