huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License
237 stars 79 forks

[IX] Duplicate segmentation db entries #6885

Closed LaszloKecskes closed 3 months ago

LaszloKecskes commented 3 months ago

The segmentation process that runs after uploading a PDF puts two segmentation entries into the database. After it receives the results from the service, it updates only one of those entries, leaving one correct entry and one stuck in the 'processing' state. Under certain circumstances this makes the IX functionality unusable, as the system does not pick up the correct segmentation.

Manually deleting the faulty entries (leaving the one correct entry) is a workaround. A migration is not necessary, as the 'processing' entries have a TTL of 24 hours.
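To make the workaround concrete, here is a minimal sketch of how the faulty entries could be identified before deleting them. It works on an in-memory array rather than the real collection, and the entry shape (`id`, `fileID`, `status`) and the `'ready'` status value are assumptions for illustration, not taken from the actual schema.

```typescript
// Hypothetical shape of a segmentation entry; field names and status values are assumptions.
interface SegmentationEntry {
  id: string;
  fileID: string;
  status: 'processing' | 'ready' | 'failed';
}

// Returns the ids of 'processing' entries whose file also has a 'ready' entry,
// i.e. the stuck duplicates that are safe to delete by hand.
function findStaleDuplicates(entries: SegmentationEntry[]): string[] {
  const filesWithReady = new Set(
    entries.filter(e => e.status === 'ready').map(e => e.fileID)
  );
  return entries
    .filter(e => e.status === 'processing' && filesWithReady.has(e.fileID))
    .map(e => e.id);
}

// Example data: file1 has the duplicate pair, file2 is still legitimately processing.
const sample: SegmentationEntry[] = [
  { id: 'a', fileID: 'file1', status: 'ready' },
  { id: 'b', fileID: 'file1', status: 'processing' },
  { id: 'c', fileID: 'file2', status: 'processing' },
];
const staleIds = findStaleDuplicates(sample);
```

Only entries that have a correct sibling are selected, so a file that is still genuinely being processed is left alone.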

This goes directly into the sprint to be worked on, since it blocks the priority IX development (relationships support) and is already present in staging.

LaszloKecskes commented 3 months ago

With the help of @gabriel-piles, we were unable to reproduce the error on staging. We have:

Locally, the error was caused by a faulty setup: multiple tenants were accidentally pointing to the same database, so the process picked up the same new files twice, once for each of the two tenant entries.
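A quick sanity check for this kind of misconfiguration could look like the sketch below. The `TenantConfig` shape (`name`, `dbName`) and the tenant names in the example are hypothetical; the idea is only to group tenants by the database they point to and report any database used by more than one tenant.

```typescript
// Hypothetical tenant configuration shape; field names are assumptions.
interface TenantConfig {
  name: string;
  dbName: string;
}

// Groups tenant names by database and keeps only databases shared by several tenants.
function findSharedDatabases(tenants: TenantConfig[]): Map<string, string[]> {
  const tenantsByDb = new Map<string, string[]>();
  tenants.forEach(tenant => {
    const names = tenantsByDb.get(tenant.dbName) || [];
    names.push(tenant.name);
    tenantsByDb.set(tenant.dbName, names);
  });

  const shared = new Map<string, string[]>();
  tenantsByDb.forEach((names, dbName) => {
    if (names.length > 1) shared.set(dbName, names);
  });
  return shared;
}

// Example: two local tenants accidentally share 'uwazi_development'.
const sharedDbs = findSharedDatabases([
  { name: 'default', dbName: 'uwazi_development' },
  { name: 'tenant2', dbName: 'uwazi_development' },
  { name: 'tenant3', dbName: 'tenant3_db' },
]);
```

An empty result means every tenant has its own database, which rules out this particular cause of duplicate entries.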

Closing the issue without code changes, to be reopened if we see it again in staging.