jetperch / pyjoulescope

Joulescope driver and utilities
https://www.joulescope.com
Apache License 2.0
37 stars 11 forks source link

Joulescope UI freezes #17

Closed git-dimiz closed 4 months ago

git-dimiz commented 3 years ago

Hi,

I was trying to perform a 2 week recording session with joulescope unfortunately the UI stoped working after 2 days.

Here is the log of the joulescope application.

❯ ./joulescope_launcher
*****************************
* Joulescope User Interface *
* UI Version = 0.9.2        *
* Driver Version = 0.9.1    *
*****************************

info = {
  "joulescope": {
    "ui_version": "0.9.2",
    "driver_version": "0.9.1"
  },
  "platform": {
    "name": "darwin",
    "python_version": "3.8.5 (default, Jul 21 2020, 10:48:26) \n[Clang 11.0.3 (clang-1103.0.32.62)]",
    "platform": "macOS-10.15.7-x86_64-i386-64bit",
    "processor": "i386",
    "executable": "/Applications/joulescope.app/Contents/MacOS/joulescope_launcher",
    "frozen": "/Applications/joulescope.app/Contents/MacOS",
    "paths": {
      "dirs": {
        "app_path": "/Users/meow/Library/Application Support/joulescope",
        "config": "/Users/meow/Library/Application Support/joulescope/config",
        "log": "/Users/meow/Library/Application Support/joulescope/log",
        "firmware": "/Users/meow/Library/Application Support/joulescope/firmware",
        "themes": "/Users/meow/Library/Application Support/joulescope/themes",
        "data": "/Users/meow/Documents/joulescope"
      },
      "files": {
        "config": "/Users/meow/Library/Application Support/joulescope/config/joulescope_config.json"
      }
    }
  }
}

=====
[...]
ERROR:2020-10-01 11:21:28,569:device_thread.py:170:joulescope.usb.device_thread:device thread hung: status - FORCE CLOSE
WARNING:2020-10-01 11:21:28,569:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,570:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,570:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,570:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,618:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,618:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,619:view.py:229:joulescope.stream_buffer:_stats_get no samples
WARNING:2020-10-01 11:21:28,620:view.py:229:joulescope.stream_buffer:_stats_get no samples
ERROR:2020-10-01 11:21:28,949:main.py:637:joulescope_ui.main:statusUpdateTimer failed - assume device error
Traceback (most recent call last):
  File "joulescope_ui/main.py", line 631, in on_statusUpdateTimer
  File "joulescope/driver.py", line 1052, in status
  File "joulescope/usb/device_thread.py", line 227, in status
  File "joulescope/usb/device_thread.py", line 176, in _post_block
OSError
WARNING:2020-10-01 11:21:29,264:device_thread.py:158:joulescope.usb.device_thread:response queue not empty
ERROR:2020-10-01 11:21:29,264:driver.py:670:joulescope.driver:USB stop failed
Traceback (most recent call last):
  File "joulescope_ui/main.py", line 631, in on_statusUpdateTimer
  File "joulescope/driver.py", line 1052, in status
  File "joulescope/usb/device_thread.py", line 227, in status
  File "joulescope/usb/device_thread.py", line 176, in _post_block
OSError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "joulescope/driver.py", line 668, in close
  File "joulescope/driver.py", line 881, in stop
  File "joulescope/usb/device_thread.py", line 224, in read_stream_stop
  File "joulescope/usb/device_thread.py", line 165, in _post_block
OSError: DeviceThread not running
ERROR:2020-10-01 11:21:49,664:device_thread.py:170:joulescope.usb.device_thread:device thread hung: control_transfer_in - FORCE CLOSE
WARNING:2020-10-01 11:21:49,666:driver.py:558:joulescope.driver:could not fetch info record
WARNING:2020-10-01 11:21:49,669:device_thread.py:158:joulescope.usb.device_thread:response queue not empty
WARNING:2020-10-01 11:21:49,669:driver.py:562:joulescope.driver:could not fetch active calibration
ERROR:2020-10-01 11:21:49,669:main.py:821:joulescope_ui.main:firmware update failed
Traceback (most recent call last):
  File "joulescope_ui/main.py", line 819, in _device_open
  File "joulescope_ui/main.py", line 1079, in _firmware_update_on_open
  File "joulescope/driver.py", line 578, in info
  File "joulescope/usb/device_thread.py", line 218, in control_transfer_in
  File "joulescope/usb/device_thread.py", line 165, in _post_block
OSError: DeviceThread not running

Environment: Python 3.8.1 Mac OS Catalina Version 10.15.7 Joulescope is connected through a USB-Hub to a iMac

mliberty1 commented 3 years ago

Hi @llllVIIII, and thank you for reporting this issue. Would you also post the very top of the log that shows the versions and info sections? Feel free to exclude the "paths" if it contains sensitive information.

Are you using the official distribution? The reason I ask is that Python 3.8.1 is not what is in the latest official distribution.

git-dimiz commented 3 years ago

I have now extend the log with your request.

I also have realized that there was

ERROR:2020-10-01 11:21:28,569:device_thread.py:170:joulescope.usb.device_thread:device thread hung: status - FORCE CLOSE

in the log so I also added that.

I wonder why joulescope says it has python version 3.8.5? Does the MacOs version ship its own python version?

Are you using the official distribution? The reason I ask is that Python 3.8.1 is not what is in the latest official distribution.

I use homebrew to install python3

mliberty1 commented 3 years ago

Thanks for the update @llllVIIII . Based upon the log, it looks like you are running the official distribution. Unfortunately, the log only really says that something bad happened, and it does not have enough information to troubleshoot the underlying cause. Anything interesting that happened before that failure? If you want to email the full log to support at jetperch.com, I am happy to take a look.

Yes, the distribution comes with Python and all other required dependencies. When running the distribution, you do not need to install homebrew, python3, or any other dependencies, but you can if you want.

While we do Windows testing on a variety of machines, we only test on a single MacBook Pro, currently running Mac OS Catalina Version 10.15.6. We have performed long-term UI testing for over a week with no problems on this machine. I just updated to 10.15.17, and we will then perform another long-term test.

I have a few things for you to try in the meantime:

  1. Ensure that your iMac is not going to sleep. Long-term Joulescope logging requires that your computer stays awake.
  2. Connect Joulescope directly to your iMac. While most USB hubs should be fine, too many USB hubs in the wild have problems.

Another option to consider is downsample_logging.py rather than the UI. We developed this script to enable long-term logging and recovery. We have customers that collected months of data across multiple devices with this script. This script allows for error recovery on any issues including power loss, computer reboots, and any Joulescope device resets. With the latest firmware update that you installed in 0.9.2, you can also use on-instrument statistics which dramatically reduces the USB bandwidth. The command-line option is "--source sensor".

git-dimiz commented 3 years ago

I think this issue has nothing to do with the MacOs version. I already had this issue for a long time. Only now I was annoyed enough to report this issue.

My guess is that it may be related to USB-Hub. I will try to run the same test connected directly to the iMac. Is it possible to increase the log level, to provide you with maybe more information?

mliberty1 commented 3 years ago

Ah, I did not understand that this was an old issue that you had seen repeatedly before. We do extensive testing to ensure that Joulescopes are stable and reliable, so you should not expect to see this type of error. Try without the hub and see if the error is the same. If not, please send the full log by email. The log level is sufficient. However, any meaningful information would be before the first line you sent:

ERROR:2020-10-01 11:21:28,569:device_thread.py:170:joulescope.usb.device_thread:device thread hung: status - FORCE CLOSE

Everything after is a consequence of that line.

git-dimiz commented 3 years ago

I will do another test run and will come back to you once its done. (May take 2-3 weeks if it succeeds)

git-dimiz commented 3 years ago

I still was unlucky :/. The UI again freezed after 4d and 19h. I've send the full log to the email you mentioned.

mliberty1 commented 3 years ago

Hi @llllVIIII, and thanks for sending along the log. I took a look, and the log contains lots of messages like:

WARNING:2020-10-02 15:50:46,683:view.py:229:joulescope.stream_buffer:_stats_get no samples

I will be doing more testing to be sure, but I suspect that your Mac is going to sleep mode on you. Can you check your "Energy Saver" settings:

energy_saver_settings

The end shows that computer and the Joulescope instrument stop communicating over USB. Did you happen to notice if both LEDs were still illuminated on your Joulescope? Your Joulescope has a status LED next to the USB connector and a sensor LED between IN+ and IN-, at least for the included banana jack front panel.

git-dimiz commented 3 years ago

Hi @mliberty1 here is my Energy Saver Settings

Screen Shot 2020-10-07 at 16 19 16

As you can see the "Prevent computer from sleeping" check box was set.

The Sensor LED was green. But how the USB connector LED was I unfortunately don't know :( .

Is there any way I can help you to track down the issue further?

mliberty1 commented 3 years ago

I have a totally different possibility that has affected a few other people. Does your target emit RF, or is your Joulescope located close to an RF transmitter, such as your computer? One potential cause for unusual behavior is RF noise pickup through the cables. Here’s something easy to try. Twist your IN cables together from the power supply to the Joulescope, and then twist your OUT cables together. This does two things. First, it minimizes the loop area, which will reduce RF noise pickup. Second, it will increase the inductance slightly which may also help.

If the cables pick up excessive RF signal, then it could couple into the Joulescope ground and power planes. USB’s small common mode differential range on D+/D- is particularly susceptible to this type of noise pickup.

I am also (re)starting some long-term testing on the Mac. Hopefully, with patience and time, we can duplicate what you are seeing.

git-dimiz commented 3 years ago

Yes the device under test is a BLE device. There is every 3 Min a BLE communication between the device and the iMac. The distance between the device under test and the Joulescope is 5cm. The distance to the iMac is 30cm.

mliberty1 commented 3 years ago

The wavelength at 2.4 GHz is 12 cm. I normally recommend keeping the device under test as close as possible to your Joulescope to minimize noise pickup and impedance, but that <1/2 wavelength separation means that your Joulescope is effectively part of the antenna. I am not sure if this is the problem, but it could be.

Could you try moving the device under test further away, at least 2 wavelengths. If you twist the OUT+ and OUT- wires together, noise pickup should not be too bad. Try for 30 to 50 cm if you can.

Thinking back, we did have this issue: https://forum.joulescope.com/t/frequent-crashes-in-0-6-8/169 and this one, too: https://forum.joulescope.com/t/joulescope-frozen-red-usb-text/273/5 I never found the root cause and the problems just "went away". It could be related.

git-dimiz commented 3 years ago

So I've tried yesterday another test run. And again it failed. The error is still the same. This time I can tell that the USB-LED is still green but the In-LED is turned off.

As you suggested I've twisted the cables and also increased the distance. The device was about 50cm away from the Joulescope.

mliberty1 commented 3 years ago

Thank you for the update @llllVIIII regarding the In-LED being off. I also understand that increasing the distance between your Joulescope and the device under test had no effect.

The IN-LED is controlled by our sensor-side microcontroller. In the existing firmware, it turns on when:

  1. The sensor is powered under the control of the host-side microcontroller.
  2. The sensor microcontroller starts streaming data.

I have a question. When your Joulescope gets into this state, does restarting the UI fix the problem? Do you also have to unplug your Joulescope from USB and reconnect it?

Thank you again for helping us troubleshoot this issue. I have not seen this issue before, and I apologize that it is occurring while you are trying to do your work.

git-dimiz commented 3 years ago

Hehe I'm glad to help. Yes restarting the UI fixes the problem. Then the Joulescope starts again working.

mliberty1 commented 3 years ago

Summary

My current hypothesis is that your Joulescope's sensor microcontroller reset (cause still unknown), and then does not resume communication with the host microcontroller (cause still unknown). We have some investigation to do, which will take some time.

I am happy to send you a replacement Joulescope instrument to see if it makes any difference. If you would like a replacement, please send your shipping address, email, and phone number to support at jetperch.com, and we will ship it out today.

Details

I took another look at the log given this additional detail, and I found line 23861 particularly interesting:

INFO:2020-10-06 17:04:17,273:main.py:1669:joulescope_ui.main:Parameter set sensor_power failed, value=on

On line 23790 at 17:03:00, your Joulescope stopped communicating with your computer. I still don't know why. However, the software has very robust recovery mechanisms for cases like this. It automatically disconnected from the device to start the recovery process.

On line 23837 at 17:03:10, your host found that the Joulescope was connected, and attempted to reconnect.

On line 23848 at 17:03:43 (long delay, strange), it talked to the device and got the firmware information. The interesting part is that it knows the sensor firmware version. If the host controller rebooted, then it should not yet know the sensor firmware version since the sensor should be powered down. Therefore, we know that the host controller stayed alive, no watchdog reset or brownout reset.

On line 23851 at 17:04:17 (long delay, again strange), your computer attempted to tell your Joulescope host controller to turn on the sensor. This command failed, which means that the Joulescope host microcontroller did not reply correctly. This tells me that either the sensor microcontroller is in a weird state or the sensor communication stack on the host microcontroller is in a weird state. Neither should be able to happen. The sensor microcontroller has a watchdog reset and brownout reset. The communication stack should recover from all these conditions and bad messages. However, my suspicion is that the sensor microcontroller reset (actual cause unknown), and something went wrong with the communication stack.

Next step: Intentionally trigger a sensor side microcontroller reset during different operating states and ensure that the system recovers. I know that we did this testing, but it was quite a while ago.

git-dimiz commented 3 years ago

I am happy to send you a replacement Joulescope instrument to see if it makes any difference. If you would like a replacement, please send your shipping address, email, and phone number to support at jetperch.com, and we will ship it out today.

Thanks for the offer but I think I will just keep the device. We have another device in our office and it showed the same behavior with my iMac.

I will now use the downsample_logging.py script to see if this maybe behaves differently. And if not I will just use this script on a Raspberry Pi maybe there I will get some more stable results.

git-dimiz commented 3 years ago

Just for completness sake here also the backtrace of running the downsample_logging.py:

2020-10-09T13:03:01 ERROR    endpoint halt 1: transfer callback with status 1
2020-10-09T13:03:01 WARNING  libusb_submit_transfer [control] => -4
2020-10-09T13:03:01 WARNING  device_removed
2020-10-09T13:03:01 WARNING  Device.stop() while attempting _stream_settings_send
2020-10-09T13:03:01 WARNING  Missing one Joulescope
2020-10-09T13:03:01 WARNING  libusb_claim_interface 0 failed [-5]
2020-10-09T13:03:01 ERROR    open failed: 16d0/0e88/001067
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/usb/libusb/device.py", line 792, in open
    may_raise_ioerror(rv, 'libusb_claim_interface 0 failed')
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/usb/libusb/device.py", line 732, in may_raise_ioerror
    raise IOError(s)
OSError: libusb_claim_interface 0 failed [-5]
2020-10-09T13:03:01 ERROR    While running command
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/usb/device_thread.py", line 64, in _cmd_process
    rv = self._device.open(event_callback_fn)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/usb/libusb/device.py", line 792, in open
    may_raise_ioerror(rv, 'libusb_claim_interface 0 failed')
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/usb/libusb/device.py", line 732, in may_raise_ioerror
    raise IOError(s)
OSError: libusb_claim_interface 0 failed [-5]
2020-10-09T13:03:01 WARNING  usb control transfer failed 5
2020-10-09T13:03:01 WARNING  calibration_read transfer failed 5
2020-10-09T13:03:01 ERROR    while capturing data
Traceback (most recent call last):
  File "downsample_logging.py", line 369, in run
    closed_count = self._open_devices()
  File "downsample_logging.py", line 357, in _open_devices
    closed_device.open(all_device)
  File "downsample_logging.py", line 464, in open
    device.parameter_set('i_range', 'auto')
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/driver.py", line 443, in parameter_set
    self._stream_settings_send()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/driver.py", line 721, in _stream_settings_send
    _ioerror_on_bad_result(rv)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/joulescope/driver.py", line 48, in _ioerror_on_bad_result
    raise IOError('usb.Device %s' % (rv,))
OSError: usb.Device ControlTransferResponse(result=5)

And here is the log content create by the script:

20201008_090247.208839 PARAM : downsample=1
20201008_090247.208865 PARAM : frequency=10
20201008_090247.208891 PARAM : jls_sampling_frequency=None
20201008_090247.208913 PARAM : source=stream_buffer
20201008_090247.247105 DEVICES : Joulescope:001067
20201008_090247.247227 LOGGER : RUN
20201008_090247.250158 DEVICE : OPEN Joulescope:001067
20201008_090247.402533 DEVICE_INFO : {"type": "info", "ver": 1, "ctl": {"mfg": {"country": "USA", "location": "MD_01", "lot": "201942_00"}, "hw": {"rev": "H", "sn_mcu": "32043011801B31BA9BB4FBB51091005F", "sn_mfg": "001067"}, "fw": {"ver": "1.3.2"}, "fpga": {"ver": "0.2.0", "prod_id": "0x9314acf2"}}, "sensor": {"fw": {"ver": "1.3.2"}, "fpga": {"ver": "1.2.1"}}}
20201008_090247.831625 PARAM : columns=time,current,voltage,power,charge,energy,current_min,current_max
20201008_090247.831685 PARAM : units=s,A,V,W,C,J,A,A
20201009_110301.598586 SIGNAL : STOP Joulescope:001067
20201009_110301.641565 EVENT : 1 transfer callback with status 1
20201009_110301.641648 EVENT : DEVICE_CLOSE Joulescope:001067
20201009_110301.728450 DEVICE : OPEN Joulescope:001067
20201009_110301.892919 DEVICE_INFO : null
20201009_110301.998244 FAIL : usb.Device ControlTransferResponse(result=5)
20201009_110301.998315 LOGGER : CLOSE
20201009_110301.998353 LOGGER : DONE
mliberty1 commented 3 years ago

Interestingly, downsample_logging seems to have failed due to the same error but with a completely different error message.

  1. I wonder. If you provide the "--source sensor" option to downsample logging, do you still see a problem? With the "--source sensor" option, the Joulescope instrument computes the statistics without sending the streaming sample data to the host. This mode dramatically reduces the amount of host communication.

  2. You mentioned, "We have another device in our office, and it showed the same behavior with my iMac." Do you see this same issue with the other Joulescope on other computers, or is it only your iMac?

Testing progress

I spent some time intentionally injecting sensor microcontroller faults and host microcontroller faults (hardfaults and watchdog resets), and the system always recovered under both Mac and Windows. Since I tested this before, I am not surprised, but it was the easiest potential cause to test.

Unfortunately, given what I know so far about this issue, I suspect that it will take some time to duplicate the problem, identify the root cause, and fix it. I will set aside some time next week to investigate further.

git-dimiz commented 3 years ago

I wonder. If you provide the "--source sensor" option to downsample logging, do you still see a problem? With the "--source sensor" option, the Joulescope instrument computes the statistics without sending the streaming sample data to the host. This mode dramatically reduces the amount of host communication.

No I just have used "python3 pyjoulescope_examples/bin/downsample_logging.py -f 10" to start the tool.

You mentioned, "We have another device in our office, and it showed the same behavior with my iMac." Do you see this same issue with the other Joulescope on other computers, or is it only your iMac?

My colleague had also some issues. But I don't know if these where the same issue that I experience.

mliberty1 commented 3 years ago

Not much of an update, but my macBook Pro has been running since Oct 9 (10 days). We are still investigating...

mliberty1 commented 4 months ago

Closing as no longer relevant. We could not duplicate this 4 years ago. The UI and underlying driver have change drastically over that time.

If you continue to see problems, please use HelpReport Issue from within the latest 1.x Joulescope UI.