labscript-suite-temp-2 / blacs

BLACS, part of the labscript suite, provides an interface to hardware used to control a buffered experiment. It manages a queue of shots to be run as well as providing manual control over devices between shots.
Other
0 stars 0 forks source link

BLACS log file rollover not working #22

Closed philipstarkey closed 5 years ago

philipstarkey commented 7 years ago

Original report (archived issue) by David Meyer (Bitbucket: dihm, GitHub: dihm).


I know this used to work when first implemented a few months back, but now my log rollover is no longer working due to file locks. The error I get is:

#!python
Traceback (most recent call last):
    File "C:\Anaconda2\envs\labscript\lib\logging\handlers.py", line 77, in emit self.doRollover()
    File "C:\Anaconda2\envs\labscript\lib\logging\handlers.py", line 142, in emit self.doRollover()
    os.rename(self.baseFilename, dfn)
    WindowsError: [Error 32] The process cannot access the file because it is being used by another process
    Logged from file tab_base_classes.py, line 632 (or 636 or 727)

or

Logged from file __main__.py, line 584 or 603

or

Logged from file queue.py, line 828

depending on what BLACS is doing.

philipstarkey commented 7 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Unfortunately I think whatever we do here it always has a small chance of breaking due to the fact that we use the same log file from multiple processes, which is not supported by the logging module and obviously leaves us open to race conditions. What I'm guessing has happened is that one process is trying to do a rollover whilst another still has the file open. It may have worked in the past if you were merely fortunate enough for both processes to attempt rollover at almost the same time and hence close the file before either of them tried to delete it (though I'm not sure the child processes inherit the configuration - they might not be trying to roll over at all).

Ultimately we have to move to having some kind of logging server process that logging clients send their messages to, and which has sole dominion over the log files. The Python logging module documentation has some advice for how to do this, but in the interests of speed I wanted to use zmq for it so that any potentially blocking system calls (whether that's for writing to file or sending on a socket) occur in a different (non-GIL-requiring) thread. This was originally motivated by the fact that the zlock server sometimes hangs for a few seconds when writing to disk (which it only does for logging), just because the operating system is in the middle of a disk sync or something, or is otherwise preoccupied and blocks the file writing call. This makes zlock sometimes slow to respond to clients, causing the very occasional (every few months) takedown of the entirety of BLACS, which is pretty unacceptable. It also causes the much more frequent introduction of smaller delays on the order of milliseconds, which nonetheless add up since we make a lot of calls to zlock.

I have a half-finished implementation of a zmq logging server and client living in zprocess.logging, but it doesn't support rollover yet - I should probably change the server to leverage the standard library's logging module rather than writing to files manually, then I wouldn't have to reinvent the wheel there.

So that's the plan, and in the meantime I recommend you apply whichever workaround you prefer: disable logging, disable rollover and delete files manually, whatever is most convenient.

philipstarkey commented 6 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Issue #44 was marked as a duplicate of this issue.

philipstarkey commented 6 years ago

Original comment by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


After recent updates in our lab, this issue seems to have become worse for us. It's hard to tell what change has caused it (possibly either the addition of the terminal output in tabs or some of the transitions to python 3 compatibility?) but now when BLACS has been open for a few hours it will either stop responding completely, or start behaving strangely (this morning for example, shots would seem to think that they were finished as soon as they started). The onset of this appears to coincide with the onset of the access error messages related to logging.

philipstarkey commented 6 years ago

Original comment by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


philipstarkey commented 6 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Let's just disable log rolling until I finish the logging server in zprocess (I'll make this something of a priority).

philipstarkey commented 6 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Labscript utils pull request #50 disables log file rotation.

philipstarkey commented 6 years ago

Original comment by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


Thanks, I should know by tomorrow morning if this was the cause of my issues. I suspect that it was, possibly due to each device continually printing errors in their terminal output windows (though I did not have these open).

philipstarkey commented 6 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


It's not clear to me why terminal windows specifically would exacerbate the problem, since there are two logging handlers - one to terminal and one to file - and it's the terminal ones that have changed behaviour with the terminals, not the file. There is still the same amount of logging to file. But, since the issue is a race condition, any random change may affect the timing of it, so it's plausible that messing with the timing of logging via the terminal changes may be causing the issue to come up more often. I hope you're right!

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Resolved by labscript_utils pull request #56

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).