Decode fails on final transaction in sequence

adamgreig commented 4 years ago

Hi! I'm not sure if this is the appropriate place to report issues with this analyzer, so please redirect me if not.

When decoding an SWD transaction, the current analyzer needs to see a '1' bit after the current transaction, to mark it as a valid operation. Specifically for each bit parsed the worker thread calls IsOperation which runs through all trailing zeros until it sees a 1 bit.

However, the final transaction in a sequence won't have any subsequent 1 bit, because no more bits are clocked out at all. As a result, the final transaction doesn't get analyzed at all. This is especially annoying when analysing a failure situation because the final transaction is the one you're most interested in!

I prepared an example capture, the first is a normal SWD transaction where the analyzer doesn't process the final transaction, the second I artificially add some extra 1 bits at the end which triggers the analysis.

Here's a zip of the two traces:

swd_analyzer_trailing_1s.zip

I don't understand how the loop in https://github.com/saleae/swd-analyzer/blob/master/src/SWDTypes.cpp#L407 ever terminates but perhaps a fix could be as simple as having the loop break if there are no bits left to analyser as well as if we see a 1.

frankleonrose commented 3 years ago

Hi, @timreyes - I've been looking at this issue and the code.

One thing that's not clear to me is how the end of samples is handled generally. The (old) docs say that AdvanceToNextEdge will block waiting for the next sample in "real time" situation. But what about when sampling is done? It goes to the issue here - ParseBit should know when it hits the end of available data, but how?

timreyes commented 3 years ago

@frankleonrose I'll review this with the team (I've scheduled something this Wednesday). I don't quite remember off the top of my head. I'll need to spend some time on this and might need to run some tests as a refresher, but I'll keep you updated on an explanation!

Marcus10110 commented 3 years ago

@frankleonrose,

When an analyzer runs on an existing capture, instead of when recording, all the functions which normally block for new data won't block until theAnalyzerChannelData reaches the end of the capture. At that point, the blocking functions will never return. Unfortunately there is no API function to indicate that the end of the capture has been reached, this was originally designed like this to ensure that all protocol decoders could run on streaming data and not rely on the capture being finished to decode. Unfortunately, this has the side effect of frequently making it difficult to process the last frame in a capture, because the decoder might block before the last frame has been decoded.

This end of capture behavior is the same when running on both streaming data and on completed captures - the moment a blocking function attempts to pass the end of the AnalyzerChannelData object's data, the function will not return.

Marcus10110 commented 3 years ago

ParseBit calls mSWCLK->AdvanceToNextEdge(); twice, then mSWCLK->GetSampleOfNextEdge() once. This means that id the SWCLK signal does not have at least 3 more transitions available, then ParseBit will never return.

An alternative solution to the problem is to limit how far the AnalyzerChannelData object should looks for the next transition using WouldAdvancingCauseTransition. That way, the function will only block once the position of mSWCLK is extremely close to the end of the capture. However, if there is no practical lower limit on the clock rate, then the analyzer might not be able to distinguish between the end of a packet and just a really slow bit that was cut off. If the protocol has any spec for longest acceptable gap between words, that could use used as the upward limit too.

frankleonrose commented 3 years ago

@Marcus10110 - Thanks very much for the detailed explanation.

Maybe a way around it is to use the rerun feature - first pass the analyzer figures out how much data there is and the second pass it knows how close to the end it is at all times. Is there any way to tell whether you're working with live vs finite AnalyzerChannelData?

Just curious - when the function just doesn't return - how do you do that? Summarily terminate the thread? longjump back up into the app code? Either way, as an extension writer, I can't have any resources held that expect to be freed in destructors, right?

Marcus10110 commented 3 years ago

@frankleonrose rerun is actually a holdover from the 1.x software. Presently none of our protocol analyzers use it, and support for it wasn't implemented in Logic 2. This actually fell of my radar until you mentioned it now, we need to get this documented.

As for how the function does not return, good question!

We call the analyzer's WorkerThread function from a new thread.

Your implementation of WorkerThread calls a function like AdvanceToNextEdge, Our implementation of AdvanceToNextEdge Checks to see if there are any more transitions in the existing data. If there are none, then the thread will sleep, waiting on a C++ condition variable that will notify once more data has been recorded. This is what blocks the thread until more data is available.

If no more data is ever recorded, the function stays blocked - until we decide to join the thread. There are an number of triggers for this, but the implementation is the same.

Basically, we set a boolean flag called "thread must exit". Then we notify the "more data available" condition variable. This unblocks the worker thread. In each of our functions like AdvanceToNextEdge, after we're done waiting for the "more data available" condition variable, we always check to see if the "thread must exit" flag has been set.

If it has, we throw a special exception, something like "thread must exit exception".

At the very top of the thread, when we called your implementation of WorkerThread, we did this inside of a try/catch statement that catches this specific type of exception. That causes the stack to unwind all the way back to that top point. After that the function returns and elsewhere the thread join completes.

timreyes commented 2 years ago

Another user reported the bug in ticket #69924

timreyes commented 10 months ago

Another report of the issue here: #85860

saleae / swd-analyzer

Decode fails on final transaction in sequence #1