labscript-suite-temp-2 / lyse

lyse is an analysis framework. It coordinates the running of python analysis scripts on experiment data as it becomes availiable, updating plots in real time.
BSD 2-Clause "Simplified" License
0 stars 0 forks source link

Empty rows added to lyse dataframe, causing "Multiple items found" lookup error #13

Closed philipstarkey closed 9 years ago

philipstarkey commented 9 years ago

Original report (archived issue) by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).

The original report had attachments: lyse_issue13.log


Not 100% sure about the actions that lead to this, but I enabled a new multishot analysis routine and hit "run multishot analysis" at about the same time as a new shot came in.

There are 2 new rows at the bottom of the list of shots, each with the row number instead of the file path in the left column.

Here's the traceback:

#!python

Traceback (most recent call last):
  File "C:\Anaconda\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\labscript_suite\lyse\__main__.py", line 1503, in incoming_buffer_loop
    self.shots_model.add_files(filepaths, new_row_data)
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 105, in f
    return inmain(fn, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 68, in inmain
    return get_inmain_result(in_main_later(fn,False,*args,**kwargs))
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 47, in event
    result = event.fn(*event.args, **event.kwargs)
  File "C:\labscript_suite\lyse\__main__.py", line 1323, in add_files
    self.update_row(filepath, dataframe_already_updated=True)
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 105, in f
    return inmain(fn, *args, **kwargs)
  File "C:\Anaconda\lib\site-packages\qtutils\invoke_in_main.py", line 67, in inmain
    return fn(*args, **kwargs)
  File "C:\labscript_suite\lyse\__main__.py", line 1249, in update_row
    model_row_number = self.get_model_row_by_filepath(filepath)
  File "C:\labscript_suite\lyse\__main__.py", line 1131, in get_model_row_by_filepath
    raise LookupError('Multiple items found')
LookupError: Multiple items found
philipstarkey commented 9 years ago

Original comment by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


Also, while lyse was in this error state, BLACS still thought that it was successfully adding more shots, rather than holding off until lyse was fixed

philipstarkey commented 9 years ago

Original comment by Philip Starkey (Bitbucket: pstarkey, GitHub: philipstarkey).


Chris, is there a lock in lyse that prevents simultaneous access to the dataframe from multiple threads (not necessarily a dataframe specific one, but is access at least serialised through inmain or something similar)?

Pandas dataframes are not thread safe, so it is possible a race condition led to this behaviour.

philipstarkey commented 9 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


All modifications to the dataframe are done in the main thread via inmain() and friends. Most reading of the dataframe is also done with inmain(), though the server sending the dataframe to analysis subprocesses is not done in the main thread. I should investigate whether pickling is atomic, like, if it holds the GIL and thus the dataframe can't be modified mid-pickle.

But that's not the issue we've seen today, since writing to the dataframe was f'd up at some point. I'm not seeing any obvious causes of this bug so far. I'm starting to wonder if lyse was sent malformed data.

philipstarkey commented 9 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Do you have a logfile from the crash?

philipstarkey commented 9 years ago

Original comment by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


I have attached a subset of lyse.log from a few minutes before the bug through to a few minutes after. It looks as though BLACS told lyse to add the file twice. Unfortunately the BLACS log appears to only have the last half hour worth of data, so no longer contains anything from this event

philipstarkey commented 9 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Sweet, found the bug. Lyse checks for duplicates when adding shots, and ignores them, but when I added batch processing (to speed up adding shots), I forgot to check for duplicates within each batch of files.

BLACS submitted the same shot twice within a millisecond, so lyse went ahead and added it twice, not detecting that one was a dup'. But the adding didn't complete because the code for adding a shot hit upon the 'multiple items found' error that it gave a traceback about. So things were left in a dodgy state and the thread for adding files crashed.

I've made lyse check for duplicates within batches, and wrapped the 'incoming files' loop in a giant try: except: that raises the error so you can see it, but keeps running. This is to catch unexpected errors only, and just because the error is caught doesn't mean things are in a state where lyse can keep running. But it's still better for that thread not to crash out.

Pushed to monashkrb fork, could you test there and then accept the pull request after a few days of use?

philipstarkey commented 9 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


philipstarkey commented 9 years ago

Original comment by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


Fixed bug #13 where lyse did not check for duplicate shots when they arrived at the same time.

Also wrapped incoming files loop in try: except: so it keeps running in case of similar problems in the future.

→ \<\<cset 75e1a5b814f6f87b4428ade960ccfbc1a494f2d9>>