sci-visus / OpenVisus

Open Source distribution of the ViSUS capabilities
Other
27 stars 18 forks source link

Lock File Triggers an Infinite Loop #241

Open klacansky opened 1 year ago

klacansky commented 1 year ago

A running query can leave lock file if it has been interrupted, such as restarting notebook kernel. Currently, the code keeps printing in the log that the lock file exists forever. We need a better way to inform the user about the problem. I suggest we either throw an exception with clear error message or, as Valerio suggested, delete the lock file and corresponding block file and try again (may be problematic if two processes access the file).

scrgiorgio commented 1 year ago

Good point. The code is retrying for some minutes (if it's endless waiting, then it is wrong behavior). Could you check and propose a pull request?

klacansky commented 1 year ago

I looked through the code and did not find any check to stop after certain time or number of tries. I create a pull request, and I am not sure it breaks any workflows.

scrgiorgio commented 1 year ago

what is the right behavior? raise an Exception or return false?

klacansky commented 1 year ago

Looked into raising an exception. The issue is the requests run on threads, and we need a mechanism to report a thread failure to the main thread. Similar problem can happen if unlock throws an exception.

I tried to modify MultiplexAccess.cpp to catch exception in the runInBackground function and store it in a shared pointer. However, the main thread does not check this shared pointer so I can't easily rethrow the exception. I tried to rethrow the exception in the destructor, and is is too late and the whole thing spins in an infinite loop. There are also multiple places where this can happen.

What do you think is the right approach? You have a better understanding of the whole system. Personally, I would prefer to have two queues, one for TODO work and one for results + error codes. It is challenging to understand the async code.