opencitations / oc_ocdm

Object mapping library for manipulating RDF graphs that are compliant with the OpenCitations datamodel.
https://opencitations.net/
ISC License

filelock added in reader and storer #28

Closed: arcangelo7 closed this 2 years ago

arcangelo7 commented 2 years ago

I added file locks to the Reader and Storer objects using the py-filelock library you recommended. This change was crucial for opencitations/meta to run quickly with multiple processes.

It works like a charm, but there is a problem: .lock files are deleted automatically only on Windows, not on UNIX. The reason is discussed at length in this post.

I tried wrapping Storer.store_all in a multiprocessing.Lock() inside Meta to avoid modifying oc_ocdm, but that single lock serializes the read and write operations of every process instead of locking only the file being accessed, which creates a severe bottleneck.
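To illustrate the difference: a lock keyed on the file path blocks only the processes that touch that same file, while a shared multiprocessing.Lock() blocks everyone. Below is a minimal sketch, assuming a UNIX system and using the standard library's fcntl.flock as a stand-in for py-filelock. The helpers `locked` and `append_line` and the `.lock` suffix are hypothetical names for illustration, not part of oc_ocdm.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(path):
    """Hold an exclusive advisory lock associated with one data file (UNIX only)."""
    lock_path = f"{path}.lock"  # assumed naming convention: one lock file per data file
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks only other processes that lock this same lock file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
        # The lock file is deliberately left on disk: unlinking it here could
        # let two processes hold "the same" lock on different inodes.


def append_line(path, line):
    """Append one line to `path` while holding that file's lock."""
    with locked(path):
        with open(path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
```

Two processes calling `append_line` on different paths never wait for each other; they contend only when they target the same file. The side effect is the one described above: each `.lock` file remains on disk after use.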

As a temporary workaround, I'm deleting the .lock files at the end of the Meta process so that Meta is at least usable, but I understand that this is not an optimal solution. What do you think? Do you know how to delete .lock files on UNIX without creating race conditions?
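For reference, the temporary cleanup amounts to something like the sketch below. It is only safe under the assumption that every worker process has already finished, so that no lock can still be held or requested. The function name `remove_lock_files` and the `base_dir` argument are hypothetical, not the actual code in Meta.

```python
from pathlib import Path


def remove_lock_files(base_dir):
    """Delete leftover *.lock files under base_dir and return how many were found.

    Assumes no process is still using the locks (call it only after all
    workers have been joined), otherwise deleting them can cause races.
    """
    removed = 0
    for lock_file in Path(base_dir).rglob("*.lock"):
        if lock_file.is_file():
            lock_file.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
            removed += 1
    return removed
```

Calling it once from the parent process after the pool has been closed and joined keeps the deletion outside any window in which a lock could be acquired.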