ledatelescope / bifrost

A stream processing framework for high-throughput applications.
BSD 3-Clause "New" or "Revised" License

Improve debuggability in case of a stale LockFile #232

Closed by league 2 months ago

league commented 5 months ago

As noted in the comment on the LockFile constructor, it quietly busy-waits if a stale lock file is left behind after abnormal exit.

It should be relatively easy to emit a message after a few seconds of busy-waiting (Greg's suggestion).

In addition, if we store the PID (+hostname?) as suggested in the comment, then there is more information available for troubleshooting.
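To make the two ideas concrete, here is a minimal sketch in Python (for brevity; it is not the C++ LockFile code). It assumes the lock is taken by exclusively creating the file, which may differ from what LockFile actually does. The names `acquire_lock` and `read_owner`, the "PID hostname" file format, and the optional `timeout` are all hypothetical, added only for illustration.

```python
import os
import socket
import sys
import time


def read_owner(path):
    """Return whatever the current holder wrote into the lock file, or ''."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def acquire_lock(path, warn_after=5.0, poll_interval=0.01, timeout=None,
                 log=sys.stderr):
    """Busy-wait for an exclusive lock file, warning once if the wait drags on.

    Assumption: the lock is "held" while the file exists. The timeout is
    optional and hypothetical; with timeout=None this waits forever, like
    the behavior described in this issue.
    """
    start = time.monotonic()
    warned = False
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            waited = time.monotonic() - start
            if not warned and waited >= warn_after:
                print(f"still waiting for lock file {path} after "
                      f"{waited:.1f} s; held by: {read_owner(path)!r}",
                      file=log)
                warned = True
            if timeout is not None and waited >= timeout:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll_interval)
            continue
        # Record who holds the lock so a stale file can be traced later.
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()} {socket.gethostname()}\n")
        return path


def release_lock(path):
    """Remove the lock file; a crash before this point leaves it stale."""
    os.remove(path)
```

With something like this, a process stuck behind a leftover file would print one line naming the file and the recorded PID and hostname, which should be enough to check whether that process still exists.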

Issue pointed out as part of this ng-dp issue.

dentalfloss1 commented 4 months ago

I'm working on this a bit since it is affecting some of my code.

dentalfloss1 commented 4 months ago

I should have looked at this more carefully; I see @league has already made some progress.

league commented 4 months ago

@dentalfloss1 Thanks, it's okay. I saw something from you on Slack involving chrono. Did you delete that branch?

I think my solution works and will help, but we can compare notes. The only question I have is whether this type of message on cerr would have been useful in figuring out that the "stuck" behavior is due to a leftover lock file. (We likely wouldn't have cerr redirected somewhere less visible, right?) I'll make a PR for visibility.

jaycedowell commented 4 months ago

Related: At yesterday's meeting I claimed that making the map cache directory's location depend on an environment variable was hard, if not impossible. That's not really true. I was looking at the map disk cache code, and it calls fileutils::get_home_directory(), which looks for the $HOME environment variable before trying getpwuid(getuid())->pw_dir. We could probably add a fileutils::get_bifrost_local_directory() function that does something similar: it would look for an environment variable before falling back to something get_home_directory()-based.
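A rough sketch of that lookup order, written in Python for brevity rather than as the C++ fileutils code. The environment variable name `BIFROST_LOCAL_DIR` and the `.bifrost` fallback subdirectory are assumptions made for illustration; neither is taken from the existing source.

```python
import os
import pwd


def get_home_directory(environ=os.environ):
    """Prefer $HOME, then fall back to the password database entry."""
    home = environ.get("HOME")
    if home:
        return home
    return pwd.getpwuid(os.getuid()).pw_dir


def get_bifrost_local_directory(environ=os.environ):
    """Prefer an override variable, then fall back to a home-based path.

    BIFROST_LOCAL_DIR and ".bifrost" are hypothetical names.
    """
    override = environ.get("BIFROST_LOCAL_DIR")
    if override:
        return override
    return os.path.join(get_home_directory(environ), ".bifrost")
```

Passing the environment as a parameter is only there to make the lookup easy to exercise without touching the real process environment.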