Open Cartache opened 3 years ago
While importing some documents, I got errors during the consume process. I tried to reimport the failed documents by opening each one in PDF-XChange Editor and saving the new PDF under a different name in the same consume folder. This procedure triggers a double import/consume that ends with one error and one success.
Here is my configuration
Hi @Cartache - I'm having a similar issue, however, it occurs even when I just copy and paste any pdf into the consume folder. That also seems to create two import jobs, one of which fails and the other succeeds.
Hey there. I was just chatting about something similar earlier today over in #910. I suspect that this is specific to docker. I have a pretty good guess at what's going on... basically, documents are copied to containers in "layers" and as such, the consumer briefly sees two copies of the document and spawns two sets of tasks. (see here.) Perhaps I need to see if there's a way to alter behaviors for certain folders or mount points, or somehow get the consumer to wait for the copy to truly be done before it starts the intake process. Any ideas?
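To make the "wait for the copy to truly be done" idea concrete, here is a rough sketch. It is not how the paperless consumer works today; it only illustrates one common approach, which is to poll a file's size and modification time and treat it as ready once they stop changing. The function name `wait_until_stable` and its parameters are hypothetical.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3, timeout=60.0):
    """Return True once `path` has kept the same size and mtime for
    `checks` consecutive polls, or False if `timeout` seconds pass first.

    Hypothetical helper for illustration; not part of paperless.
    """
    deadline = time.monotonic() + timeout
    last = None      # (size, mtime_ns) seen on the previous poll
    stable = 0       # number of consecutive polls with no change

    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
            current = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            # File not there (yet), or it was removed mid-copy.
            current = None

        if current is not None and current == last:
            stable += 1
            if stable >= checks:
                return True
        else:
            # First sighting or the file changed: start counting again.
            stable = 0

        last = current
        time.sleep(interval)

    return False
```

A watcher could call this before queuing a task and skip the file if it returns False. Whether the interval and check count are long enough depends on how slowly files arrive, so those defaults are guesses.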
More info... might be going down the wrong path, but looks like we're supposed to use a docker-specific copy command (docker cp) to move things in and out of the container. I'll mess with this a bit and report back.
Ok. So this is exactly what's going on. One snag, however, is that the docker cp command doesn't appropriately handle permissions, so when I copy directly into the consume folder, it fails due to document permissions.
The fix (for now) is to use docker cp to copy the file into the container, to somewhere that is NOT the consume folder. After that, you can use a conventional cp or mv command inside the container to get it into the consume folder. And voila, it works! Now I suppose I just need to create a consumer for the consumer... I'll sleep on it.
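For anyone who wants to script that two-step workaround, a minimal sketch follows. The container name, staging directory, and consume path are all assumptions and need to match your own setup, the staging directory must already exist inside the container, and `staged_copy_commands` / `staged_copy` are hypothetical helper names. It only relies on `docker cp` and `docker exec`.

```python
import subprocess
from pathlib import PurePosixPath


def staged_copy_commands(local_file, container, staging_dir, consume_dir):
    """Build the two commands for the workaround: docker cp into a staging
    directory, then mv inside the container into the consume folder.

    Assumes `local_file` is a POSIX-style path; all names are placeholders.
    """
    name = PurePosixPath(local_file).name
    staged = str(PurePosixPath(staging_dir) / name)
    target = str(PurePosixPath(consume_dir) / name)
    return [
        # Step 1: copy from the host to somewhere that is NOT the consume folder.
        ["docker", "cp", local_file, f"{container}:{staged}"],
        # Step 2: move it into the consume folder from inside the container.
        ["docker", "exec", container, "mv", staged, target],
    ]


def staged_copy(local_file, container, staging_dir, consume_dir):
    """Run both steps, stopping at the first command that fails."""
    for cmd in staged_copy_commands(local_file, container, staging_dir, consume_dir):
        subprocess.run(cmd, check=True)
```

For example, `staged_copy("/home/me/scan.pdf", "paperless", "/tmp/staging", "/consume")` would run the two commands in order, with every path and the container name being placeholders.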
OK - I was able to resolve this issue by implementing PAPERLESS_CONSUMER_POLLING and setting the value to 5. It appears that this will disable inotify, which was detecting duplicates while the copy operation was happening. I don't know if 5 is a reasonable number or not, so I'd welcome anyone's input.
@JeremyMorel I was just able to try your solution and can confirm that this resolved the issue I was experiencing.
This appears to be causing issues for me as well, and I'm being flooded with errors.
Is adding PAPERLESS_CONSUMER_POLLING=5 the only and recommended method?
I'm also facing this issue.
> Now I suppose I just need to create a consumer for the consumer... I'll sleep on it.
@JeremyMorel I'm happy to help work on a solution if you've slept on this and have some ideas on a more correct path forward here :)
> OK - I was able to resolve this issue by implementing PAPERLESS_CONSUMER_POLLING and setting the value to 5.
Any chance there's a management utility command (similar to document_importer) or REST API for triggering the consumer on-demand? If not, maybe I could help with this? I could see a medium-term solution (for my use-case, at least) where I cp the new documents into the consume directory and then trigger the consumer.
That would be better than polling all day long when docs are only copied into the consume directory a few times a day.
During the import process of all my documents, I encountered some errors. In order to "help" the import process, I ended up using PDF-XChange Editor to save a modified version of the document directly in the Import folder.
This operation initiates (or seems to initiate, see screenshots) a double consume job, which ends in a "database locked" error.
It is important to note that the document is finally consumed even if there is an error.
To reproduce:
Relevant information