zchen088 opened this issue 8 years ago
Any basic diagnostic info from the OS?
The qubit server didn't appear to be hogging a lot of memory or CPU. It is one of the older computers we have.
I was just wondering whether the process was idle or spinning the CPU. I'm not familiar with Scala debugging techniques, but if you guys can do a bit of research, maybe you'll find a debugger that lets you see where the program is when it hangs.
If you want to look at the Scala process, try VisualVM: https://visualvm.java.net/
However, I would guess that it's the DAC boards timing out. Are you running the latest version of the ghz fpga server? Does it log any timeouts when the slowdown happens?
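As a lighter-weight alternative to attaching VisualVM, a thread dump shows where every thread is parked at the moment of the hang. Here's a minimal sketch using the standard JMX `ThreadMXBean` API (the class name is illustrative, not part of the qubit server); `jstack <pid>` from the command line produces roughly the same output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Illustrative helper: capture a thread dump of the current JVM, roughly what
// `jstack <pid>` or VisualVM's "Thread Dump" button would show for the server.
public class ThreadDump {
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        // true, true => include locked monitors and synchronizers where supported,
        // which is what reveals deadlocks and threads blocked on a lock
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

If the server hangs, the interesting part of the dump is whichever thread is sitting in a `WAITING` or `BLOCKED` state inside qubit-server code.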
Would restarting the qubit server make the problem go away if it's board timeouts? I guess we'll find out.
Restarting the qubit server causes the boards to get pinged for their build numbers. It's possible (or at least conceivable) that this could get the boards working a bit longer. But I agree it's unlikely.
Doesn't appear to be a boards problem - I can bring up the boards while the data taking is hung. When I try the echo setting on the qubit sequencer, it also hangs without any error messages. Anecdotally, it seems related to multiple people trying to take data at once.
The qubit server was on version 0.6.2, and I've now updated to 0.7.0. I now get errors like this:
Error: (0) java.lang.OutOfMemoryError: Java heap space [payload=None]
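For what it's worth, `OutOfMemoryError: Java heap space` means the JVM hit its configured heap cap (set with `-Xmx`), not necessarily that the machine is out of RAM. A quick generic way to check what cap the process is actually running with (this is a sketch, not qubit-server code):

```java
// Illustrative: report the JVM heap cap and current usage.
// maxMemory() reflects the -Xmx flag (or the JVM's default when unset).
public class HeapInfo {
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("heap: " + usedHeapMb() + " MB used of " + maxHeapMb() + " MB max");
    }
}
```

VisualVM's Monitor tab plots the same numbers over time, which would show whether usage climbs steadily (a leak) or spikes on a particular operation.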
o_O
Does that happen on startup or under some other condition?
More info: the qubit sequencer often fails when I switch to a new dataset folder/registry wrapper.
That's interesting. @maffoo does the qubit sequencer cache data related to each run's configuration? I'm surprised this would eat enough memory to matter either way though...
It does store the data, yes, because each experiment is a series of calls (initialize, upload SRAM, upload JT, etc). I think it should get cleared each time you re-initialize at the beginning of an experiment, but I suppose there might be a leak in there.
Also, this wouldn't really explain it because as far as the sequencer is concerned there's no difference between the first run of a new dataset and run n+1 where you just increment the delay time (for example). Uh, right?
On Tue, Jan 26, 2016 at 10:52 PM, Daniel Sank notifications@github.com wrote:
@DanielSank, @pomalley, yes, all data associated with a run of the qubit server is stored until the context gets expired, or, more often, reinitialized for a new run. This is not a lot of data (on the scale of JVM memory use), and typically we only use ~10 contexts per user and just cycle through them, so this would not use up increasing amounts of memory unless people are bypassing the context management in pyle, or opening up lots of new pyle sessions and keeping them all open, or something like that.
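To make that lifecycle concrete, here's a toy model of the pattern (all names hypothetical, not the actual qubit server code): per-context state accumulates across the calls of a run and is wiped on reinitialization, so memory only grows if contexts are abandoned without ever being expired or reused.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-context caching (hypothetical names, not the real server code).
public class RunContext {
    private final Map<String, byte[]> cache = new HashMap<>();

    // Each call in an experiment run (initialize, upload SRAM, upload JT, ...)
    // stores its payload in the context.
    public void store(String key, byte[] payload) {
        cache.put(key, payload);
    }

    public int entries() {
        return cache.size();
    }

    // Re-initializing at the start of the next run drops the previous run's data;
    // a leak would mean something survives this, or contexts never get here.
    public void initialize() {
        cache.clear();
    }
}
```

Under this model, ~10 recycled contexts per user stay bounded; the failure mode to look for is code paths that mint fresh contexts and never expire them.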
I have VisualVM running on hercules monitoring the qubit server, so if it hangs again we can hopefully get a better idea of why that happened. So far there haven't been any hangs since starting VisualVM (watched pots and all that).
On Hercules (which is running the gmon experiment), every once in a while the qubit server appears to hang. The symptom is that when we run a scan, nothing happens - no error, no data saved to the datavault, the session just appears to be running the scan method indefinitely. We think it's a problem with the qubit server because when we restart it, scans seem to work again. I haven't seen this behavior on any other computers yet.