jetperch / pyjoulescope_ui

Joulescope graphical user interface
https://www.joulescope.com
Apache License 2.0

UI freezes after collecting large amounts of data #281

Closed FerriteGiant closed 1 week ago

FerriteGiant commented 3 months ago

Joulescope model

JS110

UI version

other

What OS are you seeing the problem on?

Windows 10

What happened?

I'm running the newest UI version (v1.1.10). I've been trying to run overnight tests and now two nights in a row when I come back in the morning the UI is completely unresponsive. My only option is to kill it and open a new instance.

Sample rate was set to 20 kHz with a ~20 GB buffer, which according to the UI should mean I can record up to ~22 hrs of data. Though even if it runs out of space I would expect it to just drop old samples and not freeze.
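
As a rough sanity check on those figures, the sketch below works out how many bytes per sample they imply. It is not the UI's actual calculation: it assumes "GB" means 10^9 bytes, and the function names are hypothetical. The resulting bytes-per-sample value lumps together everything the buffer stores per sample period, which the thread does not break down.

```python
def implied_bytes_per_sample(buffer_bytes, sample_rate_hz, duration_hours):
    """Bytes consumed per sample period if the buffer lasts duration_hours."""
    return buffer_bytes / (sample_rate_hz * duration_hours * 3600)


def buffer_duration_hours(buffer_bytes, sample_rate_hz, bytes_per_sample):
    """Hours of data that fit in the buffer at a given storage cost per sample."""
    return buffer_bytes / (sample_rate_hz * bytes_per_sample) / 3600


GB = 1e9  # assumption: decimal gigabytes; the UI may use GiB instead

# Figures reported above: ~20 GB buffer, 20 kHz, ~22 hours.
bps = implied_bytes_per_sample(20 * GB, 20_000, 22)
print(f"implied storage cost: {bps:.1f} bytes per sample")
print(f"round trip: {buffer_duration_hours(20 * GB, 20_000, bps):.1f} hours")
```

Under these assumptions the buffer costs roughly 12 to 13 bytes per sample, so a change in sample rate or buffer size scales the available duration proportionally.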

In this file you can see I stopped interacting with the UI at 2024-07-22 20:07:28, then the next line seems to be when I closed it at 2024-07-23 13:59:12. (Closing required right clicking on the taskbar icon and clicking "close", the "X" button on the UI didn't respond.)

joulescope_20240723_025928_34488.log
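
A gap like this can be located automatically. The sketch below is a minimal example, assuming each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the real Joulescope log layout may differ, and `largest_gap` is a hypothetical helper. The two sample lines reuse the timestamps quoted above with made-up message text.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def largest_gap(lines):
    """Return (gap, start, end) for the longest pause between timestamped lines.

    Lines that do not begin with a parsable timestamp are skipped.
    Returns None if fewer than two timestamped lines are found.
    """
    previous = None
    best = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # continuation line, traceback, or other format
        if previous is not None:
            gap = stamp - previous
            if best is None or gap > best[0]:
                best = (gap, previous, stamp)
        previous = stamp
    return best


# Timestamps from the report above; the message text is hypothetical.
sample = [
    "2024-07-22 20:07:28 hypothetical last entry before the gap",
    "2024-07-23 13:59:12 hypothetical entry written on close",
]
gap, start, end = largest_gap(sample)
print(f"longest silence: {gap} (from {start} to {end})")
```

For a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.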

What was expected?

UI should not crash

How to reproduce?

  1. Start signal sample streaming
  2. Wait several hours (more than a couple, fewer than 17)
  3. Observe unresponsive UI

Extra information

Version Information:

| Item | Version |
| -- | -- |
| UI | 1.1.10 |
| driver | 1.5.3 |
| JLS | 0.9.5 |
| Python | 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] |
| Platform | Windows-10-10.0.19045-SP0 |
| Processor | AMD64 Family 23 Model 49 Stepping 0, AuthenticAMD |

mliberty1 commented 3 months ago

I can say we have not tested this configuration of 20 kHz sampling rate and 20 GB buffer. It should work, and we will set up a test to see if we can duplicate this. As you already observed, the log file does not give any hints as to what went wrong.

In general, here are some recommendations for long-term data collection:

  1. Ensure that your computer power management is disabled. Modern OSes aggressively throttle back USB and processing when they think you are not looking.
  2. Disable all updates (Windows, vendor such as Dell, antivirus), backups, & scans. I use wired network, and I physically unplug ethernet from the computer when performing long-term captures.
  3. Switch to the Multimeter view (no Waveform widget). The Waveform widget requires a fair amount of processing that can exacerbate other issues.
  4. Record to JLS rather than increasing the RAM buffer size. Windows has no great way to say "keep this buffer in RAM." If it decides to swap the buffer to disk, UI performance nosedives, which could explain this behavior. Also, the RAM buffer goes away if the application crashes, the computer crashes, the computer resets, etc. The JLS file persists.

FerriteGiant commented 3 months ago

All good suggestions. I forgot to note that I have successfully run similar setups with buffers of 12-20 GB quite a number of times on previous UI versions. Probably the last time I did was sometime last fall or last summer. (I may be a bit spoiled at work by having 128 GB of RAM 😆 )

mliberty1 commented 3 months ago

Well, I ran the Joulescope UI overnight on a Ryzen 7 5800U machine with 32 GB RAM. The UI was still very responsive this morning. The configuration was 20 kHz and 20 GB RAM buffer. However, I used a JS220, since that was what was already connected.

I just started a new test with a JS110 now...

FerriteGiant commented 3 months ago

Interesting. I can test it again tonight and I can borrow a JS220 from a coworker to try as well.

FerriteGiant commented 3 months ago

This time after 19 hours and about 15 GB it's still working fine. The computer did go through Windows updates and some reboots since the runs where it froze. So maybe just chalk it up to Windows being dumb.

mliberty1 commented 3 months ago

I just checked on my Ryzen 7 5800U computer, and it is still running with a JS110. It's been 24+ hours. I will keep it running over the weekend and see what happens.

Long-term stability is definitely a challenge on Windows. Let's keep this open and I'll continue to see if I come across anything. Likewise, post if you see this again, especially if the logs show something!

mliberty1 commented 1 week ago

While we do get occasional reports of the UI not performing on Windows, most are with version 1.0.x. Versions 1.1.11, 1.1.12, and especially the soon-to-be-released 1.2.0 picked up many Qt / PySide6 / PySide6-QtAds fixes that could explain some memory & performance problems.

I am going to close this issue. We continue to monitor performance and stability. If you see any issues, feel free to reopen this or create a new one.