jetperch / pyjoulescope

Joulescope driver and utilities
https://www.joulescope.com
Apache License 2.0

Block size computation precedence question in winusb/device.py #13

Closed petertorelli closed 4 years ago

petertorelli commented 4 years ago

Small observation: is this line trying to round up to the nearest multiple of 512 bytes? As written, it just adds zero due to operator precedence:

self._transfer_size = (block_size + 511 // 512)

mliberty1 commented 4 years ago

Oh man. Yes, that should be:

self._transfer_size = ((block_size + 511) // 512) * 512

It would be clearer if it was:

self._transfer_size = ((block_size + BULK_IN_LENGTH - 1) // BULK_IN_LENGTH) * BULK_IN_LENGTH

Fortunately, we currently pass in block_size as multiples of 512, so it is a latent bug that currently has no ill-effects.
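For concreteness, here is a quick sketch of the precedence issue and the fix (the function names are illustrative, not the driver's):

```python
BULK_IN_LENGTH = 512  # assumed USB bulk endpoint size


def transfer_size_buggy(block_size):
    # 511 // 512 == 0, so this just returns block_size unchanged
    return block_size + 511 // 512


def transfer_size_fixed(block_size):
    # round block_size up to the next multiple of BULK_IN_LENGTH
    return ((block_size + BULK_IN_LENGTH - 1) // BULK_IN_LENGTH) * BULK_IN_LENGTH


print(transfer_size_buggy(100))   # 100 (no rounding at all)
print(transfer_size_fixed(100))   # 512
print(transfer_size_fixed(512))   # 512 (exact multiples are unchanged)
print(transfer_size_fixed(513))   # 1024
```

Since block_size is always a multiple of 512 in the current callers, both versions happen to agree there, which is why the bug is latent.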

Thanks for the bug report!

petertorelli commented 4 years ago

I hit that bug because I wrote a Win32 driver by converting winusb/device.py to C++ Win32 app and passed in different values. I'm now beating on the code to see where it breaks, and I have a few more /* ASKMATT */ comments. I'll batch them up into one issue rather than trickling them in...

mliberty1 commented 4 years ago

Sounds like you are making progress! Send the comments my way when you are ready, and I'll take a look.

petertorelli commented 4 years ago

Yeah, it works really well. I tried using auto-py-to-exe but had issues with process management on spawn and with the size of the binary (30 MB!). Spawning a python interpreter worked perfectly, but I didn't want the user to depend on a python install. So I guess I've now re-written the driver twice: once for JavaScript (using node usb) and once for Win32. Here's my list o' questions...

It turns out two of the big issues I was writing about resolved themselves as I wrote the question, so these should be simple. I had overlooked your thread command queue (the thing you call in between calls to process() in DeviceThread::run()), and I had a problem with growing/shrinking the overlapped numpy buffer (ptr and b) when handling read-pipe and control-transfer calls to the kernel... /smacks forehead/.

Still a few things I wanted to verify with you:

  1. Blocking ControlTransfers and Races

ControlTransferAsync::control_transfer_out/_in both utilize a nifty object called RunUntilDone to make a sync call out of an async one. The return type changes based on invocation: if the user passes a callback function, the boolean status of _pend() is returned; if the user passes no callback (None, nullptr), the ControlTransferResult object is returned. I moved the synchronous mode into its own function so I always know whether I'm getting a bool or a ControlTransferResult object. But there's no thread safety in either case: this can lead to a race if there is a loop on process() and control transfers aren't issued in between process calls. (But maybe this is obvious in hindsight because why wouldn't someone use your code... oh, wait.. ;)

  2. stop_code overload

This value can be None, an enumeration, or a DWORD from GetLastError(). While it is unlikely that GetLastError() will collide with the cases you enumerate and cause a bad code path, it isn't entirely clear how stop_code is intended to be used in all cases. I've traced down the non-logging usage of it and I think I implemented it identically, but I'm not 100% sure, since some of the failure cases are hard to force. Just looking for a little guidance here.

  3. ControlTransferAsync deferred close_event

Why are you deferring the CloseHandle even though you know the command hung? There's a comment about not being able to abort a control transfer in WinUSB; is this how that case is handled? Just want to make sure I get it right.

  4. WinUsbDevice Event Handle

I think WinUsbDevice doesn't need an event. A minor point, but I don't see this being used by any kernel calls.

  5. EndpointIn::process_signal()?

What is the purpose of EndpointIn::process_signal()? It looks like it is called after a certain number of transfers, like a heartbeat or ping?

I'm really impressed with your Endpoint buffer shuffling and control structure. I learned a lot from it.

Thanks, Peter

mliberty1 commented 4 years ago

Wow. You have definitely been diving into the code. Let's see if I can address your points:

  1. The "official" USB backend API says nothing about the synchronous mode; I made it for my own debug and test purposes. In all cases, the backend is NOT thread-safe, and it is presumed to run in a single thread: one and only one thread must call process(). So, thread-safety is a non-issue :wink:. As you found out, device_thread.py is the thread-safe wrapper around the backend USB API. Now, I do not see any thread-safety warning in api.py. Doh! I will add one.

  2. Regarding stop_code, a value of None indicates no stop yet. A value of 0 indicates success. A value of -1 indicates that a stop was requested by external code (through the streaming callback). All other values are errors. The enumerations all have integer values.
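That convention could be sketched like this (the dispatch function is mine; only the value meanings come from the description above):

```python
def describe_stop_code(stop_code):
    # stop_code convention as described:
    #   None -> no stop yet
    #   0    -> success
    #   -1   -> stop requested by external code (streaming callback)
    #   else -> error, possibly a Windows GetLastError() value
    if stop_code is None:
        return 'running'
    if stop_code == 0:
        return 'success'
    if stop_code == -1:
        return 'stopped by request'
    return 'error %d' % stop_code


print(describe_stop_code(None))  # running
print(describe_stop_code(0))     # success
print(describe_stop_code(31))    # error 31
```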

  3. I pulled my hair out over this one. How can you create an async API where you cannot cleanly abort something that failed? That's exactly what you get with WinUSB API and the control endpoint. So, the existing code just waits until the transfer times out. I just did a search, and found this recent answer to a 10 year old (!!!!) question:

  4. Yes, the purpose of WinUsbDevice._event is to provide external signaling through DeviceDriverApi.signal(). The device_thread.py implementation calls signal() when it places a new command message in the python queue. This event allows the USB thread to wait on both OS events and a python queue.
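The wake-up pattern could be sketched like this, substituting a Python threading.Event for the Windows event handle (so the dual wait on OS overlapped-I/O events that the real driver does is not shown here):

```python
import queue
import threading


class DeviceThreadSketch:
    """Sketch of the signal()/queue pattern: a caller posts a command
    to a queue, then signals; the USB thread wakes, drains the queue,
    and continues processing."""

    def __init__(self):
        self._queue = queue.Queue()
        self._event = threading.Event()
        self.handled = []

    def post(self, command):
        self._queue.put(command)
        self.signal()               # wake the worker thread

    def signal(self):
        self._event.set()

    def process_once(self, timeout=0.1):
        # In the real driver, this single wait also covers OS events
        # from the overlapped transfers; here it only covers commands.
        self._event.wait(timeout)
        self._event.clear()
        while not self._queue.empty():
            self.handled.append(self._queue.get())


d = DeviceThreadSketch()
d.post('open')
d.process_once()
print(d.handled)   # ['open']
```

The key design point is that there is exactly one wait object, so the worker thread never has to poll the Python queue.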

  5. Ah, EndpointIn::process_signal() is an optimization trick. When new data arrives, it immediately calls the _data_fn callback. Since the overlapped transaction is not reposted until this callback completes, _data_fn must be very fast. In the normal implementation, this just copies the data to the stream_buffer. However, the stream_buffer still needs to process the samples. We defer this until all overlapped transactions have been processed, and then call the _process_fn callback through process_signal. At one point, I did all data processing in pure Python, which required this optimization. Now that stream_buffer is Cython, it is probably fast enough to not need this split. If your code can keep up with only _data_fn, you can remove process_signal() and _process_fn.
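A toy sketch of that fast-copy / deferred-process split (the class and its buffering are illustrative; only the callback names follow the discussion):

```python
class StreamBufferSketch:
    def __init__(self):
        self._raw = []       # filled by the fast path
        self.samples = []    # filled by the deferred path

    def data_fn(self, chunk):
        # Fast path: just copy. Called once per completed transfer,
        # before the overlapped transaction is reposted, so it must
        # return quickly.
        self._raw.append(chunk)

    def process_fn(self):
        # Deferred path: the expensive sample processing, invoked once
        # via process_signal() after all pending transfers are handled.
        for chunk in self._raw:
            self.samples.extend(chunk)
        self._raw.clear()


buf = StreamBufferSketch()
for chunk in ([1, 2], [3, 4]):
    buf.data_fn(chunk)      # per-transfer fast copies
buf.process_fn()            # one deferred processing pass
print(buf.samples)          # [1, 2, 3, 4]
```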

"I'm really impressed with your Endpoint buffer shuffling and control structure. I learned a lot from it." Hopefully it was good things :wink:. At the core, it's quite simple: a linked list of pending transactions and a linked list of free/completed transactions. The goal is to keep the pending transactions as full as possible so the host keeps pulling USB data. The error handling is what makes it complicated, and some of the error handling implementation was definitely a trial & error process.
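The two-list scheme can be sketched with deques (stand-in integers in place of transfer objects, and no actual USB posting):

```python
from collections import deque


class TransferPoolSketch:
    """Sketch of the two-list transaction scheme: a queue of pending
    (posted) transfers and a pool of free/completed ones. Keeping
    `pending` full is what keeps the host pulling USB data."""

    def __init__(self, n_transfers=4):
        self.free = deque(range(n_transfers))
        self.pending = deque()

    def post_all(self):
        # Repost every free transfer so the pending list stays full.
        while self.free:
            self.pending.append(self.free.popleft())

    def complete_one(self):
        # A transfer finished: recycle it, then immediately repost.
        done = self.pending.popleft()
        self.free.append(done)
        self.post_all()
        return done


pool = TransferPoolSketch()
pool.post_all()
pool.complete_one()
print(len(pool.pending))   # 4: the pool stays fully posted
```

The error handling the comment mentions (draining `pending` on a failure without reposting) is the part this sketch leaves out.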

I would be happy to review your code if you want. Just send it along!

petertorelli commented 4 years ago

Thanks for taking the time to answer all of these. I have encountered similar issues resolving async code (especially Promise races) when the in-flight code leaves the hardware in an indeterminate state, and not just with USB. Without some foolproof tabula-rasa s/w mechanism, there comes a point where you have to tell the user to replug the device. (I'm fairly certain this problem is older than me!) process_signal now makes sense: I typically run RawProcessor -> Downsample -> NaN Filter -> Disk Write in _data_fn, which can't get any slimmer, but viewed in the context of StreamBuffer it's clear.