nLabs-nScope / nScope-releases

Occasional crashes -- OS X and Ubuntu (possibly firmware crash?) #4

Closed gruvin closed 3 years ago

gruvin commented 8 years ago

Howdy

Loving my new nScope! This thing is so cool! :-D

Here's a bug though ...

EDIT: The conditions below still appear to make the crashes more common, but it turns out not to be that simple. Please see my later comments. Thanks.

Version:
nScope API: v0.6
nScope Firmware: v0.6

When running all four channels, using A1/A2 and P1/P2 as signal sources (though I doubt that matters), with inputs at the higher end of the frequencies available from those sources (see attached screenshot), nScope crashes (on my Mac, OS X 10.10.5) when I change the trigger input channel -- apparently from any channel to any other.

This doesn't seem to happen with low frequency inputs. [Ed. Do we issue prizes for longest sentences? :p]

A screenshot of pre-crash conditions and OS Crash Report, attached. The fault is 100% predictable and replicable, on my system at least. The crash occurs every time under the given conditions and is thus not at all intermittent -- except that it doesn't seem to happen at lower input frequencies.

[Screenshot: 2016-02-03 at 1:13 pm]

In this case, I clicked trigger channel 4.

nScope Crash Summary.txt

Let me know if there's anything else I can do to help. (I have coding skills in most languages -- self-taught over some 30 years or so. IE, I muddle by and get there eventually. hehe)

gruvin commented 8 years ago

I lied. The problem is not 100% replicable, after all. It was for the first ten or so tests, which crashed immediately upon clicking a new trigger channel radio. Now though, I'm finding I have to change trigger channels several times to produce the crash. So unfortunately, it's starting to look like some kind of race condition. D'oh! Hope it's not too hard to track down.

gruvin commented 8 years ago

The trigger relationship may be a red herring ...

I've now witnessed seemingly the same crash, when not making any changes to the trigger settings.

I believe this is the same bug/cause, because in all cases thus far it has been Thread 5 that crashes and the error reports from OS X look very similar -- if not identical -- to me anyway. I have attached the latest crash report, which relates to this specific comment/crash.

nScope Crash 2 - no trigger changes.txt
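
The "same thread, similar-looking report" comparison can be done mechanically rather than by eye. The following is only a minimal Python sketch: it assumes the crash reports contain header lines such as `Crashed Thread:` and `Exception Type:`, which is the usual layout of OS X crash reports but is not confirmed by the attachments themselves. The function names are hypothetical.

```python
# Header fields to compare; assumed to be present in the OS X crash reports.
FIELDS = ("Crashed Thread", "Exception Type")


def crash_summary(report_text):
    """Extract the selected header fields from a crash report's text."""
    summary = {}
    for line in report_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence of each field of interest.
        if sep and key in FIELDS and key not in summary:
            summary[key] = value.strip()
    return summary


def same_signature(report_a, report_b):
    """True if both reports agree on every selected header field."""
    return crash_summary(report_a) == crash_summary(report_b)


# Hypothetical usage, assuming the two attachments are saved locally:
# with open("nScope Crash Summary.txt") as a, \
#         open("nScope Crash 2 - no trigger changes.txt") as b:
#     print(same_signature(a.read(), b.read()))
```

Matching fields would only suggest a common cause; comparing the stack frames of the crashed thread would still be needed to confirm it.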

In this present case, I was merely adjusting CH1's V/div setting -- while CH1 and CH2 were displaying sine waves, each at mid-range frequencies, from sources A1 and A2. CH3 and CH4 were switched off.

Directly previously, I had been clicking between CH1 and CH2 trigger selections, to see if I could elicit a crash with only two channels in use -- in hopes of narrowing things down some more. No crash was forthcoming though, after some 20+ clicks. Then, I started playing with the V/div settings, merely so I could move the trigger point further from centre. This crash developed unexpectedly, when making a change to CH2's V/div slider.

Also in all cases thus far, the scope was actively running in continuous mode.

I'll keep trying to narrow it down. I haven't gone anywhere near the source code, yet. But I guess that need is looming. {gulp}

gruvin commented 8 years ago

Same again, this time using the exact same settings as the screenshot in my first post, no trigger channel changes but sliding the trigger level up a couple notches.

Now I'll try to elicit the crash with triggering turned off entirely ...

Yup. I can get crashing without trigger involvement. Seems to be just about any change that causes the sweep display to reset or something of that nature. In this case, I just started clicking the Continuous radio control over and over (with all four channels active). After about 8 clicks, boom. Crash. Same Thread 5. Same looking report.

I think that's about as narrowed down as I'm going to be able to go. Hope it's enough.

Oh, actually ... I can run tests under Windows and Ubuntu Linux. So I'll do that next.

gruvin commented 8 years ago

So I'm now on Ubuntu Linux 14.04 Desktop ...

This is nScope v0.6 running on
Linux 3.16.0-60-generic x86_64.

nScope API: v0.6
nScope Firmware: v0.6

... running on a physical Intel Atom dual-core 1.6 GHz board (not in a virtual machine on my Mac, as I might have done).

Again, the nScope app crashes (I clicked the Continuous radio control repeatedly to cause it, as before on the Mac). But this time there is at least one "lost communication" event shortly before the actual crash. See the caveat note below ...

/var/log/syslog snippet -- beginning just before the nScope was plugged in, which in turn was before loading nScope software ...

Feb  3 18:10:03 atomic kernel: [38270.996039] usb 3-1: new full-speed USB device number 3 using uhci_hcd
Feb  3 18:10:03 atomic kernel: [38271.173823] usb 3-1: New USB device found, idVendor=04d8, idProduct=f3f6
Feb  3 18:10:03 atomic kernel: [38271.173833] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb  3 18:10:03 atomic kernel: [38271.173839] usb 3-1: Product: nScope
Feb  3 18:10:03 atomic kernel: [38271.173844] usb 3-1: Manufacturer: nLabs
Feb  3 18:10:04 atomic mtp-probe: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1"
Feb  3 18:10:04 atomic mtp-probe: bus: 3, device: 3 was not an MTP device
...

Here is the unexpected disconnect ...

Feb  3 18:12:21 atomic kernel: [38408.900231] usb 3-1: USB disconnect, device number 3
Feb  3 18:12:22 atomic kernel: [38409.896069] usb 3-1: new full-speed USB device number 4 using uhci_hcd
Feb  3 18:12:22 atomic kernel: [38410.075952] usb 3-1: New USB device found, idVendor=04d8, idProduct=f3f6
Feb  3 18:12:22 atomic kernel: [38410.075965] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb  3 18:12:22 atomic kernel: [38410.075972] usb 3-1: Product: nScope
Feb  3 18:12:22 atomic kernel: [38410.075979] usb 3-1: Manufacturer: nLabs
Feb  3 18:12:22 atomic kernel: [38410.608387] hid-generic 0003:04D8:F3F6.0004: hiddev0,hidraw2: USB HID v1.11 Device [nLabs nScope] on usb-0000:00:1d.1-1/input0
Feb  3 18:12:22 atomic mtp-probe: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1"
Feb  3 18:12:22 atomic mtp-probe: bus: 3, device: 4 was not an MTP device

The following is me un/replugging the nScope's USB lead, after the crash ...

Feb  3 18:15:05 atomic kernel: [38573.572188] usb 3-1: USB disconnect, device number 4
Feb  3 18:15:14 atomic kernel: [38581.996040] usb 3-1: new full-speed USB device number 5 using uhci_hcd
Feb  3 18:15:14 atomic kernel: [38582.176030] usb 3-1: New USB device found, idVendor=04d8, idProduct=f3f6
Feb  3 18:15:14 atomic kernel: [38582.176040] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb  3 18:15:14 atomic kernel: [38582.176046] usb 3-1: Product: nScope
Feb  3 18:15:14 atomic kernel: [38582.176051] usb 3-1: Manufacturer: nLabs
Feb  3 18:15:15 atomic kernel: [38582.708534] hid-generic 0003:04D8:F3F6.0005: hiddev0,hidraw2: USB HID v1.11 Device [nLabs nScope] on usb-0000:00:1d.1-1/input0
Feb  3 18:15:15 atomic mtp-probe: checking bus 3, device 5: "/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1"
Feb  3 18:15:15 atomic mtp-probe: bus: 3, device: 5 was not an MTP device
**END**

Here is the related screenshot for this crash event, under Ubuntu ...

[Screenshot: 2016-02-03 18:15:36]

CAVEAT: Before the actual crash, nScope itself displayed a message in its graph area, to the effect of, "Lost communication. Waiting for reconnect ...". (This has not occurred in the Mac case.) A second or three later, things were automagically working again. I continued clicking the Continuous radio control for about 7 more clicks, before the crash. Thus, the crash may not be explicitly related to the USB disconnect -- though such an unexpected disconnect surely wouldn't help and, at a guess, probably indicates a firmware crash.
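
The kernel timestamps in the syslog excerpt can be used to tell an unexpected device reset apart from a manual replug. Below is a minimal Python sketch, assuming log lines in the same format as the excerpt; the function names and the `SAMPLE` list (lines copied from the excerpt) are illustrative only.

```python
import re

# Matches kernel USB lines such as:
# "Feb  3 18:12:21 atomic kernel: [38408.900231] usb 3-1: USB disconnect, device number 3"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s+\S+\s+kernel:\s+\[\s*(?P<uptime>[\d.]+)\]\s+"
    r"usb\s+(?P<port>[\d.-]+):\s+(?P<msg>.*)$"
)


def usb_events(lines, port="3-1"):
    """Yield (uptime_seconds, kind, device_number) for connect/disconnect lines."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("port") != port:
            continue
        msg = m.group("msg")
        gone = re.match(r"USB disconnect, device number (\d+)", msg)
        new = re.match(r"new \S+ USB device number (\d+)", msg)
        if gone:
            yield float(m.group("uptime")), "disconnect", int(gone.group(1))
        elif new:
            yield float(m.group("uptime")), "connect", int(new.group(1))


def reconnect_gaps(events):
    """Pair each disconnect with the next connect; return the gaps in seconds."""
    gaps = []
    last_disconnect = None
    for uptime, kind, _ in events:
        if kind == "disconnect":
            last_disconnect = uptime
        elif kind == "connect" and last_disconnect is not None:
            gaps.append(round(uptime - last_disconnect, 3))
            last_disconnect = None
    return gaps


# Lines copied from the syslog excerpt in this comment.
SAMPLE = [
    "Feb  3 18:12:21 atomic kernel: [38408.900231] usb 3-1: USB disconnect, device number 3",
    "Feb  3 18:12:22 atomic kernel: [38409.896069] usb 3-1: new full-speed USB device number 4 using uhci_hcd",
    "Feb  3 18:12:22 atomic kernel: [38410.608387] hid-generic 0003:04D8:F3F6.0004: hiddev0,hidraw2: USB HID v1.11 Device [nLabs nScope] on usb-0000:00:1d.1-1/input0",
    "Feb  3 18:15:05 atomic kernel: [38573.572188] usb 3-1: USB disconnect, device number 4",
    "Feb  3 18:15:14 atomic kernel: [38581.996040] usb 3-1: new full-speed USB device number 5 using uhci_hcd",
]

print(reconnect_gaps(usb_events(SAMPLE)))
```

On these lines the unexpected disconnect is followed by re-enumeration after roughly 1 s, while the manual un/replug takes about 8.4 s. A sub-second to one-second gap with nobody touching the cable would be consistent with the device resetting itself, though the log alone does not prove a firmware crash.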

I'll return to the Mac and see if these USB disconnects occur there, too.

gruvin commented 8 years ago

Back on the Mac, I now note that the nScope, "Lost communication. Waiting for reconnect ..." message works as expected, as it does on Ubuntu, when manually un/replugging the USB cable. No crash results from doing that even with the same conditions as before, with all four channels running, etc.

I did provoke another crash for this purpose; however, there don't appear to be any USB connect/disconnect messages for the nScope in Apple's syslog -- strange and annoying :-/ (It does appear as "nScope" in the system report screens, though.)

So, I cannot (yet?) report on whether such an unexpected disconnect occurred around the time of the crash.
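
Since the device does show up in the system report, one possible workaround is to poll that report and log when the nScope disappears or reappears. This is only a sketch: it assumes the `system_profiler SPUSBDataType` command is available and lists the device under the name "nScope"; the function names are hypothetical.

```python
import subprocess
import time


def read_usb_report():
    """Return macOS's USB device report as text (macOS only)."""
    result = subprocess.run(
        ["system_profiler", "SPUSBDataType"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def device_listed(report_text, product="nScope"):
    """True if the product name appears anywhere in the report text."""
    return product in report_text


def watch(read_report=read_usb_report, product="nScope", interval=0.5, max_polls=None):
    """Poll the report; yield (timestamp, 'connected' or 'disconnected') on each change."""
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        present = device_listed(read_report(), product)
        if previous is not None and present != previous:
            yield time.time(), "connected" if present else "disconnected"
        previous = present
        polls += 1
        time.sleep(interval)


# Hypothetical usage: run alongside nScope and note the times printed.
# for stamp, kind in watch():
#     print(time.strftime("%H:%M:%S", time.localtime(stamp)), kind)
```

Polling is coarse, and each `system_profiler` call takes some time to run, so a reset that re-enumerates within about a second (as in the Ubuntu log) may be missed. A clean run of this watcher would therefore not rule out a brief disconnect.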

I will not test under Windows, since I think it's pretty clear now that the fault lies at least in part in the nScope's firmware -- and is probably some hard-to-find race condition. Ouch.

Ergo, I guess I've really gone as far as I can, now. Just hope my long "book" above helps in some meaningful way. Good luck and thanks again for a great product.

davidjmeyer commented 8 years ago

Wow, this is an amazing writeup. If indeed the USB is disconnecting, then some firmware crash/reset is almost certainly happening.

One question: is a "crash" a program exit or a hang? I have been noticing some hangs in the software that I cannot track down either. Sometimes they happen when pressing controls very quickly.

gruvin commented 8 years ago

Hi David

On the Mac, it's a full crash (3 second hang, before sudden program exit) with Apple's crash reporter being invoked.

On Ubuntu, the program just hangs, turns grey and stays that way. IE, there's no core dump.

Hope I wasn't too verbose. I wanted you to see the full debug process, rather than just my own conclusions. (It all took place over several hours at my end, of course.)

Bryan.

gruvin commented 3 years ago

Five-year-old issue. No idea if ever resolved. Closing to clear out old junk.