ngscopeclient / scopehal

Test and measurement hardware abstraction library and protocol decodes. This is the library only. Most users should use scopehal-apps.
BSD 3-Clause "New" or "Revised" License
215 stars 96 forks source link

Rigol: Floating point exception #803

Open penguin42 opened 1 year ago

penguin42 commented 1 year ago

Connecting to a DS1102Z-E Rigol via LAN, clicked on the single shot button and got: Warning: Ran out of data after 0 points Warning: Attempted to bind an empty buffer Floating point exception (core dumped)

0 0x00000000008cc374 in WaveformArea::RasterizeAnalogOrDigitalWaveform(std::shared_ptr, vk::raii::CommandBuffer&, bool)

(this=0x33729f0, channel=std::shared_ptr<DisplayedChannel> (use count 3, weak count 0) = {...}, cmdbuf=..., clearPersistence=false)
at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformArea.cpp:1603

0 0x00000000008cc374 in WaveformArea::RasterizeAnalogOrDigitalWaveform (this=0x33729f0, channel=..., cmdbuf=..., clearPersistence=false)

at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformArea.cpp:1603

1603 int64_t offset_samples = (offset - data->m_triggerPhase) / data->m_timescale;

(gdb) p *data $2 = {_vptr.WaveformBase = 0x9af840 <vtable for UniformWaveform+16>, m_timescale = 0, m_startTimestamp = 16951 70282, m_startFemtoseconds = 23995876312255, m_triggerPhase = 0, m_flags = 0 '\000', m_revision = 0}

glscopeclient/scopehal-apps#1 0x00000000008cd42f in WaveformArea::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&, bool) (this=0x33729f0, cmdbuf=..., chans=std::vector of length 1, capacity 1 = {...}, clearPersistence=clearPersistence@entry=false) at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformArea.cpp:1545 glscopeclient/scopehal-apps#2 0x00000000008db504 in WaveformGroup::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&, bool) (this=, cmdbuf=..., channels=std::vector of length 1, capacity 1 = {...}, clearPersistence=clearPersistence@entry=false) at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformGroup.cpp:164 glscopeclient/scopehal-apps#3 0x0000000000801436 in MainWindow::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&) (this=, cmdbuf=, channels=) at /discs/more/git/scopehal-apps/src/ngscopeclient/MainWindow.cpp:434 glscopeclient/scopehal-apps#4 0x00000000008831cc in Session::RenderWaveformTextures(vk::raii::CommandBuffer&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >&) (this=, cmdbuf=, channels=) at /discs/more/git/scopehal-apps/src/ngscopeclient/Session.cpp:2166 glscopeclient/scopehal-apps#5 0x00000000008e7edc in RenderAllWaveforms(vk::raii::CommandBuffer&, Session, std::shared_ptr) (cmdbuf=..., session=0x3714770, queue=std::shared_ptr (use count 5, weak count 0) = {...}) at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformThread.cpp:167 glscopeclient/scopehal-apps#6 0x00000000008e8a34 in WaveformThread(Session, std::atomic) (session=0x3714770, shuttingDown=0x37148a0) at /discs/more/git/scopehal-apps/src/ngscopeclient/WaveformThread.cpp:144 glscopeclient/scopehal-apps#7 0x00007fb1950e31f3 in std::execute_native_thread_routine(void) (__p=0x3d778e0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104 glscopeclient/scopehal-apps#8 0x00007fb194dcb897 in start_thread (arg=) at pthread_create.c:444 glscopeclient/scopehal-apps#9 0x00007fb194e5261c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

azonenberg commented 1 year ago

Looks like what's happening here is that the scope is doing something the driver doesn't like during acquisition (header_blocksize = 0 in RigolOscilloscope.cpp:848).

The driver fails in a rather unusual way (creates a waveform with zero points, rather than the expected way of sending "no data" which is to set the waveform pointer to null) which then causes a bunch of the GUI logic to choke.

Can you re-run with --debug --trace SCPISocketTransport (assuming you're using the LAN transport) on the command line and upload the output? Probably best to make it an attachment as the dump will likely be large.

penguin42 commented 1 year ago

Hmm, please find attached a log generated by:

./src/ngscopeclient/ngscopeclient --debug --trace SCPISocketTransport > log 2>&1

although it ends in: [SCPISocketTransport::ReadRawData] Got 250001 bytes [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STAR 22250001 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STOP 22500000 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:DATA? [SCPISocketTransport::ReadRawData] Got 2 bytes [SCPISocketTransport::ReadRaw

which makes me wonder whether it's complete.

I restarted ngscopeclient and did it again, but it failed much faster, and this time I left out the redirects, so here is where it ended on my terminal:

Connecting to SCPI device at ronnie:5555 [SCPISocketTransport::SendCommand] [ronnie] Sending IDN? [SCPISocketTransport::ReadReply] [ronnie] Got RIGOL TECHNOLOGIES,DS1102Z-E,DS1ZE231706403,00.06.02 [SCPISocketTransport::SendCommand] [ronnie] Sending :WAV:FORM BYTE [SCPISocketTransport::SendCommand] [ronnie] Sending :WAV:MODE RAW [SCPISocketTransport::SendCommand] [ronnie] Sending :TIM:VERN ON [SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:DISP? [SCPISocketTransport::ReadReply] [ronnie] Got 1 [SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN2:DISP? [SCPISocketTransport::ReadReply] [ronnie] Got 0 [SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:DISP ON [SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:MODE? [SCPISocketTransport::ReadReply] [ronnie] Got EDGE [SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:EDGE:SOUR? [SCPISocketTransport::ReadReply] [ronnie] Got CHAN1 [SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:EDGE:LEV? [SCPISocketTransport::ReadReply] [ronnie] Got 0.000000e+00 [SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:EDGE:SLOPE? [SCPISocketTransport::ReadReply] [ronnie] Got POS [SCPISocketTransport::SendCommand] [ronnie] Sending :TIM:MAIN:OFFS? [SCPISocketTransport::ReadReply] [ronnie] Got 0.0000000e+00 [SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:OFFS? [SCPISocketTransport::ReadReply] [ronnie] Got 0.000000e+00 [SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:RANGE? [SCPISocketTransport::ReadReply] [ronnie] Got 4.000000e-02 [SCPISocketTransport::SendCommand] [ronnie] Sending :SING [SCPISocketTransport::SendCommand] [ronnie] Sending WAI [SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:STAT? [SCPISocketTransport::ReadReply] [ronnie] Got STOP [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:SOUR CHAN1 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:PRE? [SCPISocketTransport::ReadReply] [ronnie] Got 0,2,1200,1,5.000000e-11,-3.000000e-08,0,2.000000e-04,0,127 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STAR 1 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:STOP 1200 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:DATA? [SCPISocketTransport::ReadRawData] Got 2 bytes [SCPISocketTransport::ReadRawData] Got 9 bytes Warning: Ran out of data after 0 points Warning: Attempted to bind an empty buffer Segmentation fault (core dumped)

which makes me wonder if it left the scope in an odd state.

penguin42 commented 1 year ago

log.txt

penguin42 commented 1 year ago

I get the feeling there's something timing going on here; I broke out tcpdump and got a slightly different failure:


[SCPISocketTransport::SendCommand] [ronnie] Sending :CHAN1:RANGE?
[SCPISocketTransport::ReadReply] [ronnie] Got 4.000000e-02
[SCPISocketTransport::SendCommand]         [ronnie] Sending :SING
[SCPISocketTransport::SendCommand]         [ronnie] Sending *WAI
[SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:STAT?
[SCPISocketTransport::ReadReply] [ronnie] Got STOP
[SCPISocketTransport::SendCommand]     [ronnie] Sending WAV:SOUR CHAN1
[SCPISocketTransport::SendCommand]     [ronnie] Sending WAV:PRE?
[SCPISocketTransport::ReadReply]     [ronnie] Got 0,2,0,1,0.000000,0.000000,0,0.000000,0,0
Floating point exception (core dumped)
penguin42 commented 1 year ago

Oh, I think this is as simple that it hasn't actually triggered yet - there's no data; and depending on the timing it might be in the middle of the capture.

penguin42 commented 1 year ago

Manually triggering on the scope panel and then throwing ngscopeclient at it and it's working.

azonenberg commented 1 year ago

Very interesting, because it's supposed to be polling the trigger status and not trying to download data until it's actually triggered.

As a guess, this is a bug where it can't tell the difference between "trigger stopped because the scope just booted and has no data" and "trigger stopped because we armed the trigger and then it triggered".

azonenberg commented 1 year ago

Moving to scopehal as this is a driver bug, zero point waveforms aren't well defined so it's not surprising the GUI chokes if given one.

azonenberg commented 1 year ago

So it looks like there's logic in the driver (https://github.com/glscopeclient/scopehal/blob/master/scopehal/RigolOscilloscope.cpp#L705) to work around this problem.

Can you poke around here a bit and figure out if the check isn't triggering or something else is going on?

penguin42 commented 1 year ago

So it looks like there's logic in the driver (https://github.com/glscopeclient/scopehal/blob/master/scopehal/RigolOscilloscope.cpp#L705) to work around this problem.

Can you poke around here a bit and figure out if the check isn't triggering or something else is going on?

I think the problem is that (sometimes, timing) that the :TRIG:STAT? sent immediately after the *WAI hasn't yet got into the wait state;

[SCPISocketTransport::SendCommand] [ronnie] Sending :SING [SCPISocketTransport::SendCommand] [ronnie] Sending *WAI [SCPISocketTransport::SendCommand] [ronnie] Sending :TRIG:STAT? [SCPISocketTransport::ReadReply] [ronnie] Got STOP <<> PollTrigger, got: STOP PollTrigger, else case for: STOP Armed=1 protocol=1 Live=0 PollTrigger: Unsticky Rigol special [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:SOUR CHAN1 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:PRE? [SCPISocketTransport::ReadReply] [ronnie] Got 0,2,0,1,0.000000,0.000000,0,0.000000,0,0 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:SOUR CHAN2 [SCPISocketTransport::SendCommand] [ronnie] Sending WAV:PRE? [SCPISocketTransport::ReadReply] [ronnie] Got 0,2,0,1,0.000000,0.000000,0,0.000000,0,0 Warning: 0 timescale (Hack to avoid the divide 0 crash) Warning: 0 timescale

Typed on the SCPI console: [SCPISocketTransport::SendCommand] [ronnie] Sending TRIG:STAT? [SCPISocketTransport::ReadReply] [ronnie] Got WAIT

so when I got around to typing a TRIG:STAT? by hand in the SCPI console it had got to WAIT (I'd turned off the trigger input connected to the probes so it was just sitting waiting; perhaps it's blatantly ignoring your *WAI ?

azonenberg commented 1 year ago

Interesting, might be a firmware race condition.

How about adding a small time delay after the *WAI before sending TRIG:STAT? Does that make it work reliably? It'll hurt performance but we have hacks like this in the Siglent driver for similar reasons. (Even LeCroy has one or two annoying races but they only come into play for multi-scope setups)

penguin42 commented 1 year ago

I've got back to prodding this a bit.

a) I think actually in single-shot mode, as long as you had a trigger level set that actually did trigger it might have worked. b) But in repeated mode, it normally hits the problem I hit above.

c) I had a poke at the firmware, I'm not convinced the WAI really does anything, but I may be wrong. d) The CLS below seems mostly to fix the repeated mode, and the *OPC? adds a forced roundtrip that seems to make it more stable. e) I still occasionally see the 0 timescale warning I added; normally when I'm in a state when I didn't actually get a trigger.

I'm currently running with the following patches

commit 06e2b40f6dfb3389c15a2a7ceeb792014366bdd3
Author: Dr. David Alan Gilbert <dave@treblig.org>
Date:   Mon Nov 20 18:20:47 2023 +0000

    Rigol: Refactor triggering to share code

    Factor out the 3 copies of hte trigger setup into a helper.

diff --git a/scopehal/RigolOscilloscope.cpp b/scopehal/RigolOscilloscope.cpp
index fa51c15a..14954dde 100644
--- a/scopehal/RigolOscilloscope.cpp
+++ b/scopehal/RigolOscilloscope.cpp
@@ -714,6 +714,21 @@ Oscilloscope::TriggerMode RigolOscilloscope::PollTrigger()
        }
 }

+void RigolOscilloscope::SetTriggerHelper()
+{
+       if(m_protocol == DS_OLD)
+       {
+               m_transport->SendCommand(":TRIG:EDGE:SWE SING");
+               m_transport->SendCommand(":RUN");
+       }
+       else
+       {
+               m_transport->SendCommand(":SING");
+               m_transport->SendCommand("*WAI");
+       }
+       m_triggerArmed = true;
+}
+
 bool RigolOscilloscope::AcquireData()
 {
        //LogDebug("Acquiring data\n");
@@ -905,15 +920,8 @@ bool RigolOscilloscope::AcquireData()
                if(m_protocol == DS_OLD)
                {
                        m_transport->SendCommand(":STOP");
-                       m_transport->SendCommand(":TRIG:EDGE:SWE SING");
-                       m_transport->SendCommand(":RUN");
-               }
-               else
-               {
-                       m_transport->SendCommand(":SING");
-                       m_transport->SendCommand("*WAI");
                }
-               m_triggerArmed = true;
+               SetTriggerHelper();
        }

        //LogDebug("Acquisition done\n");
@@ -925,34 +933,14 @@ void RigolOscilloscope::Start()
 {
        //LogDebug("Start single trigger\n");
        lock_guard<recursive_mutex> lock(m_mutex);
-       if(m_protocol == DS_OLD)
-       {
-               m_transport->SendCommand(":TRIG:EDGE:SWE SING");
-               m_transport->SendCommand(":RUN");
-       }
-       else
-       {
-               m_transport->SendCommand(":SING");
-               m_transport->SendCommand("*WAI");
-       }
-       m_triggerArmed = true;
+       SetTriggerHelper();
        m_triggerOneShot = false;
 }

 void RigolOscilloscope::StartSingleTrigger()
 {
        lock_guard<recursive_mutex> lock(m_mutex);
-       if(m_protocol == DS_OLD)
-       {
-               m_transport->SendCommand(":TRIG:EDGE:SWE SING");
-               m_transport->SendCommand(":RUN");
-       }
-       else
-       {
-               m_transport->SendCommand(":SING");
-               m_transport->SendCommand("*WAI");
-       }
-       m_triggerArmed = true;
+       SetTriggerHelper();
        m_triggerOneShot = true;
 }

diff --git a/scopehal/RigolOscilloscope.h b/scopehal/RigolOscilloscope.h
index 2433f57e..54f778f2 100644
--- a/scopehal/RigolOscilloscope.h
+++ b/scopehal/RigolOscilloscope.h
@@ -138,6 +138,9 @@ protected:
 public:
        static std::string GetDriverNameInternal();
        OSCILLOSCOPE_INITPROC(RigolOscilloscope)
+
+private:
+       void SetTriggerHelper();
 };

 #endif
commit f774db9ad3659998780bfdf5025e6a1119c84023 (HEAD)
Author: Dr. David Alan Gilbert <dave@treblig.org>
Date:   Tue Nov 21 00:59:50 2023 +0000

    Rigol: Trigger timing

    On my DS1102Z-E, I often get triggering problems.
    The *CLS seems to help repeated triggers work.
    The *OPC? enforces a round trip that also seems to make
    it more solid.

diff --git a/scopehal/RigolOscilloscope.cpp b/scopehal/RigolOscilloscope.cpp
index 14954dde..d67674e2 100644
--- a/scopehal/RigolOscilloscope.cpp
+++ b/scopehal/RigolOscilloscope.cpp
@@ -723,8 +723,10 @@ void RigolOscilloscope::SetTriggerHelper()
        }
        else
        {
+               m_transport->SendCommand("*CLS");
                m_transport->SendCommand(":SING");
                m_transport->SendCommand("*WAI");
+               auto opcResponsea = m_transport->SendCommandImmediateWithReply("*OPC?");
        }
        m_triggerArmed = true;
 }
commit a6ae346be64d9f915b78df2e6c5deffa87cef9b6 (HEAD -> master)
Author: Dr. David Alan Gilbert <dave@treblig.org>
Date:   Mon Nov 20 16:34:58 2023 +0000

    WaveformArea: Survive empty capture

    Avoid a divide-by-0 crash on a broken capture with
    a 0 timescale.

diff --git a/src/ngscopeclient/WaveformArea.cpp b/src/ngscopeclient/WaveformArea.cpp
index b8105f1..2fc2132 100644
--- a/src/ngscopeclient/WaveformArea.cpp
+++ b/src/ngscopeclient/WaveformArea.cpp
@@ -1616,6 +1616,12 @@ void WaveformArea::RasterizeAnalogOrDigitalWaveform(
                channel->PrepareToRasterize(0, 0);
                return;
        }
+       if (data->m_timescale == 0.0)
+       {
+               LogWarning("0 timescale\n");
+               return;
+       }
+
        size_t w = m_width;
        size_t h = m_height;
        if(channel->GetStream().GetType() == Stream::STREAM_TYPE_DIGITAL)
azonenberg commented 1 year ago

I think this might be related to #817 ?

Also, please test with latest (10899c0f58e6111d8c8ea528a8a1ea0863bcb43c) which includes a bunch of refactoring and bugfixes in the Rigol driver plus converts it to use the new queued command API. If this does not fix the bug, please rebase your patches on that and let me know how it works.

azonenberg commented 1 year ago

Also, as a general rule we don't use the "normal" trigger mode in scopehal drivers because of the potential for race conditions (getting channel 1 from one trigger event and channel 2 from another). We normally run single trigger all the time, and "normal" trigger in ngscopeclient is simply a flag that tells the driver to automatically re-arm the trigger once AcquireData() completes.

penguin42 commented 1 year ago

Yeh #817 is probably similar; well on head it doesn't get the div/0 any more, but it's not happy - still lots of blank captures in the "normal" mode. Adding my *CLS helps that, but I am seeing something (in both modes?) which is I'm often only seeing the channel 2 capture with channel 1 blank; I don't think I've seen that before.

azonenberg commented 11 months ago

@penguin42 We fixed a bunch more bugs (https://github.com/ngscopeclient/scopehal/commit/2e74c6e30369b28afb05fa4df8c5e45f5ff2196e, https://github.com/ngscopeclient/scopehal/commit/0e26e3163f9c0bf2731e39e98cce0d7632213479, and a few others) that might also have been causing your issue.

Working any better w/ latest?

penguin42 commented 11 months ago

at 02b300df83bcd2ed9ed34c573dbedf4e9511121f main and 9564b05a2586b6ffeafcce3917225ce6735879cb it's better but still giving me some captures where nothing is showing in ch.1: bad-capture-no-ch1

(Now with matching capture, still doing it) Note, being a Rigol, the captures take a good few seconds each; it's always the Ch.1 that's AWOL.

penguin42 commented 11 months ago

Observation: This pair of channels is a USB pair; sometimes it captures a quite section, and sometimes it captures a burst of activity. Most (all?) of the cases where I'm missing ch.1 is when there's a burst of activity rather than the background tick

azonenberg commented 3 weeks ago

Is this still happening with the latest driver?