labscript-suite / labscript

The 𝗹𝗮𝗯𝘀𝗰𝗿𝗶𝗽𝘁 library provides a translation from expressive Python code to low-level hardware instructions.
http://labscriptsuite.org
BSD 2-Clause "Simplified" License
9 stars 48 forks source link

Memory Leak of Non-Static Output Classes #76

Closed zakv closed 3 years ago

zakv commented 3 years ago

The memory usage of runmanager's batch_compiler process grows continuously as more shots are compiled; it's hit ~10 GB of ram on my machine after compiling ~10,000 shots. I edited batch_compiler.py to use labscript_utils.memprof.MemoryProfiler and save its results right after a call to gc.collect(). The first several lines of the profiler's results are below:

                                              <class 'dict'>  2654089
                                              <class 'list'>  1975935
                                              <class 'cell'>  1006292
                                          <class 'function'>   504115
                                             <class 'tuple'>   503151
                     <class 'labscript.labscript.AnalogOut'>   120960
                    <class 'labscript.labscript.DigitalOut'>   114240
                       <class 'labscript.labscript.Shutter'>    40320
                     <class 'labscript.labscript.ClockLine'>    33600
<class 'labscript_devices.NI_DAQmx.models.NI_PCI_6713.NI_PCI_6713'>    20160
                              <class 'sip.methoddescriptor'>    13975
                                            <class 'method'>    13379
                   <class 'labscript.labscript.Pseudoclock'>     6720
 <class 'labscript_devices.PulseBlasterUSB.PulseBlasterUSB'>     6720
<class 'labscript_devices.PulseBlaster.PulseBlasterDirectOutputs'>     6720
<class 'labscript_devices.NI_DAQmx.models.NI_PCI_DIO_32HS.NI_PCI_DIO_32HS'>     6720
                       <class 'labscript.labscript.Trigger'>     6720
<class 'user_devices.RbLab.PCOCamera.labscript_devices.PCOCamera'>     6720
                                           <class 'weakref'>      450

It seems that some objects labscript objects aren't getting garbage collected after a shot's compilation is finished, as evidenced by the fact that there are 6720 instances of devices which we only have one of. Interestingly it seems that this doesn't occur for devices which only have static outputs. We have a custom static DDS device class for our Agilent 83650B and there is only one instance of its class according to MemoryProfiler's output:

<class 'user_devices.RbLab.agilent_83650b.labscript_devices.Agilent83650B'>        1
<class 'user_devices.RbLab.agilent_83650b.labscript_devices.Agilent83650BOutput'>        1

I've been playing around with gc.get_objects() and gc.get_referrers() trying to find what's holding on to a reference to those objects but haven't been able to figure it out yet. Any suggestions of where to look would be appreciated! Also, it is possible that this issue isn't a problem with the labscript module itself, in which case sorry for posting it in the wrong repo!

zakv commented 3 years ago

Well this is embarrassing. It turns out that the memory leak was caused by one of my modules, not labscript.

We have a module that uses a dictionary with instances of AnalogOut as keys, which we use to implement a "ramp from previous value" function. That dictionary would hold on to references to instances of AnalogOut from every shot compilation. Those AnalogOut instances had direct/indirect references to a bunch of other things from the connection table, which is why none of that stuff was getting garbage collected. Updating the module to delete references to old AnalogOut instances resolved the issue. I suppose maybe I should have just used import_or_reload() from the beginning.

Anyway, closing this issue since this wasn't actually a problem with labscript.

chrisjbillington commented 3 years ago

Glad you worked it out and that it didn't indicate anything super wrong with labscript itself!

"ramp from previous value" is an oft-requested feature, I'm not sure if you've discussed it here previously but if not, I'd be interested in how you've gone about implementing it. I've thought about it and it seems tricky to get it right in the general case, though hacking it on for specific kinds of ramps would be easy enough.

zakv commented 3 years ago

I don't recall being part of any ramp-from-previous-value-feature conversations but I can definitely believe that it's a popularly requested feature.

Our implementation isn't great. We have a sequence_utils.py module which has a module-level global _previous_values dict and the functions ramp_from_previous_value() and jump_from_previous_value(). Those functions are just thin wrappers around AnalogQuantity.ramp() and AnalogQuantity.constant() which then also record their last value in _previous_values(). For example, calling ramp_from_previous_value(some_output, start_time, duration, final_value, ...) would first create a ramp starting at whatever value is stored in _previous_values[some_output], then update that channel's entry with _previous_values[some_output] = final_value. I didn't bother wrapping all of the output waveform methods so we also have get_previous_value() and set_previous_value() methods which can be called manually before/after using other waveforms like AnalogQuantity.exp_ramp_t().

The biggest problem with that approach is that it assumes that the instructions are created in the labscript in the same order in which they'll actually run in the sequence. In other words the line in the labscript that calls ramp_from_previous_value() at t = 1 second better be before the line that calls ramp_from_previous_value() at t = 2 seconds, otherwise the output isn't going to do what you were hoping for.

I took a brief look through the mailing list and the open issues in this repo and didn't see any discussion about this possible feature, though admittedly I didn't look too hard. I would have thought that the most straightforward way to implement a previous value feature would be to accept a special value for the initial argument of the output methods. That could be None or some special constant imported from labscript. Then during compilation labscript would sort the instructions for a given output by their start time (maybe it does that already) and keep track of the final value from each instruction as it goes along. It would then use the most recent final value stored as the initial value for any ramps where the initial value is set to None or whatever the special value is.

I suspect that I'm oversimplifying things though given that you've put some thought into implementing a feature like this and think it would be tricky. The approach I'm suggesting also wouldn't make it possible to do something like "ramp from the previous value to twice the previous value", though I don't think there's a great way to support something like that. It would probably require using futures which probably wouldn't be particularly easy to implement, or even easy to use once implemented.