Question: alternate protocol support

symdeb commented 2 years ago

Could LUNA be extended with other protocol analysis (SPI, I2C, MIPI, etc.) by using the 16 user-defined I/O ports as generic I/O and bringing those pins outside the enclosure on a header? This would make LUNA a very attractive alternative to other $1000+ USB 2.0 and multiprotocol analyzers. Not being familiar with LUNA yet, I suppose this would mean grabbing a bunch of samples into the buffer and sending them to host software, such as Sigrok/PulseView or other open source software, for analysis?

martinling commented 2 years ago

We've actually already made the change to bring those I/Os to the outside of the enclosure in the form of two PMOD ports. See this update on CrowdSupply for more info.

It would certainly be possible to enable using those I/Os as a simple logic analyzer in exactly the way you describe. Writing the software and gateware to do that isn't on our priority list at the moment, as we need to focus on the USB features we've committed to, but it would be a fairly straightforward project.

symdeb commented 2 years ago

Thank you. That is really great. I have a few other questions, and since I'm having difficulty getting through the Discord phone verification process (it seems to be locked up somehow), I'll post them here.

I could not find "technical specifications" for LUNA. What is the internal sample rate? Other products use 1 to 2GHz (DSLogic/Zeroplus). What would the rate be for the user GPIO, and what speed of in/out clocks could be created?
I read that memory is 8MB. Other products support 64~128MB. For USB HS, how many seconds of data could be stored in the buffer? Could memory be expanded, and would streamed transfer be a viable option through a USB 3.0 host?

martinling commented 2 years ago

What is the internal sample rate? Other products use 1 to 2GHz (DSLogic/Zeroplus). What would the rate be for the user GPIO, and what speed of in/out clocks could be created?

LUNA doesn't work by sampling the USB data lines directly at high speed. Instead, there is a USB 2.0 PHY for each port, and these PHYs serve as the high speed interface.

The FPGA is connected to the ULPI interface of each PHY, which includes an 8-bit parallel data bus. Because of that 8-bit wide bus, the logic for capturing USB packets, buffering them and delivering them to the host only needs to run at 60MHz (8 x 60 = 480Mbit/s).

The simplest way to integrate GPIO capture would be to sample the PMOD I/Os at the 60MHz clock, and add the resulting data to the capture buffer. With a bit more work, they could potentially also be sampled/driven from another clock domain at a higher rate. The ECP5 I/O buffers are limited to 200MHz in / 150MHz out for LVCMOS, but can go up to 400MHz for LVDS. Buffer space and USB throughput for the logic data will be the main limiting factors though, especially if trying to capture USB at the same time.
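To put rough numbers on the buffer-space concern, here is a back-of-envelope sketch. It assumes all 16 I/Os are sampled on every 60MHz clock and stored uncompressed, with nothing streamed to the host; the function names are hypothetical and not part of LUNA.

```python
def raw_sample_rate_bytes(pins: int, sample_clock_hz: float) -> float:
    """Bytes per second produced by sampling `pins` inputs on every clock."""
    return pins * sample_clock_hz / 8


def seconds_until_full(buffer_bytes: float, fill_rate: float,
                       drain_rate: float = 0.0) -> float:
    """Time until a buffer overflows; infinite if it drains at least as fast as it fills."""
    net_rate = fill_rate - drain_rate
    return float("inf") if net_rate <= 0 else buffer_bytes / net_rate


# Assumption: 16 pins, 60MHz sample clock, 8MB buffer, no streaming.
rate = raw_sample_rate_bytes(pins=16, sample_clock_hz=60e6)
print(f"raw logic data rate: {rate / 1e6:.0f} MB/s")
print(f"8MB buffer lasts:    {seconds_until_full(8e6, rate) * 1e3:.1f} ms")
```

Under those assumptions the raw logic data alone is about twice the 60MB/s that a saturated HS bus produces, which is why some form of compression, triggering, or reduced pin count would likely be needed in practice.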

I read that memory is 8MB. Other products support 64~128MB. For USB HS, how many seconds of data could be stored in the buffer? Could memory be expanded, and would streamed transfer be a viable option through a USB 3.0 host?

Since we identify and capture USB packets on the FPGA, rather than just raw samples of the data lines, the duration we can store in memory depends on the USB traffic involved. In the most demanding scenario, where the bus is fully utilized at HS, an 8MB buffer provides roughly 150ms of capture.

We can stream buffered packets to the host at around 40MB/s, so if the target bus utilization is moderate, or limited to short bursts of high traffic, it is possible to capture indefinitely. Due to the overhead involved in streaming a HS capture over HS, however, it's not possible to maintain continuous capture if the target bus is more heavily utilized.

It may be possible to achieve unlimited HS capture by a channel bonding approach using both the 'host' and 'sideband' ports for streaming, but this would require two cables to the host which would have to be attached to independent buses. Implementing this is not currently on our roadmap.

The current hardware does not have any support for USB 3.0. There is partial support for USB 3.0 in the LUNA gateware library, but utilising it will require a future product with different hardware.

symdeb commented 2 years ago

Thank you very much for the explanation. Ideally that user I/O could be used to test MII/RMII/GMII for Ethernet, but those run at 25, 50 and 125MHz respectively.

For USB, the use case I'm facing is as follows:

1. Using a libusb-based tool to send a control request from a host PC to retrieve an interface descriptor from a device does not show anything going out in Wireshark (retrieving device descriptors works fine and shows up in Wireshark).
2. Adding breakpoint/debug code in the USB device does not show receipt of such a packet.
3. The creator of the USB device firmware library would like to see "proof" that the request did go over USB before putting the library into question.

Thus the idea was to use an analyzer to capture whether there really was a transfer on the bus:

At full speed, the buffer might be large enough for several seconds of USB data, which can be transferred to a USB 2.0 host.
At high speed, longer capture requires streaming, and requires a USB 3.0 link to the host to keep up with the data transfer.

Approaches:
A. Put the device in full speed and use a low-cost USB analyzer; the result would probably be the same as at high speed.
B. At high speed, use a high-cost capture device and a USB 3.0 PC host with sufficient speed to keep up with the USB 3.0 data.

So this is where LUNA came in as a low-cost option for HS. The only question now is whether the USB PC host is fast enough. That's why a buffer of, say, 256MB would come in handy for several seconds of buffered data (perhaps compressed). Please correct me if this line of thought is incorrect or if there is a better approach.

martinling commented 2 years ago

The other way around the USB 2.0 bottleneck is to do some packet filtering on the FPGA, to exclude things that aren't of interest. E.g. you could discard all traffic that's not on endpoint 0, which would keep the data rate down whilst still ensuring that you see the transfer you're looking for if it's there.

At the moment we don't have frontend features for doing that, but with our Amaranth workflow, an ad-hoc hack to add a filter on the FPGA side can be as simple as editing luna/gateware/usb/analyzer.py and re-running things.
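As a rough illustration of the filtering decision, here is a host-side Python model. It is not the Amaranth gateware in analyzer.py, and every name in it is hypothetical. It assumes each captured packet is a `bytes` object starting with the PID byte, and it ignores split transactions. Because DATA and handshake packets carry no endpoint number, the filter remembers whether the most recent token targeted the wanted endpoint.

```python
# PID byte values from the USB 2.0 specification (low nibble = PID, high nibble = complement).
TOKEN_PIDS = {0xE1: "OUT", 0x69: "IN", 0x2D: "SETUP", 0xB4: "PING"}
SOF_PID = 0xA5


def token_endpoint(packet: bytes) -> int:
    """Extract the 4-bit endpoint number from a token packet.

    After the PID, a token carries 16 bits sent LSB first:
    7-bit address, 4-bit endpoint, 5-bit CRC.
    """
    word = packet[1] | (packet[2] << 8)
    return (word >> 7) & 0xF


def make_token(pid: int, address: int, endpoint: int) -> bytes:
    """Build a token packet for testing (CRC5 field left as zero, not validated)."""
    word = (address & 0x7F) | ((endpoint & 0xF) << 7)
    return bytes([pid, word & 0xFF, word >> 8])


def filter_endpoint(packets, endpoint: int = 0):
    """Keep only transactions whose token addresses `endpoint`; drop SOFs."""
    keep = False
    kept = []
    for packet in packets:
        pid = packet[0]
        if pid == SOF_PID:
            continue
        if pid in TOKEN_PIDS:
            # A new transaction starts here; decide once for the packets that follow.
            keep = token_endpoint(packet) == endpoint
        if keep:
            kept.append(packet)
    return kept


capture = [
    make_token(0x69, 5, 1), b"\xC3" + bytes(6), b"\xD2",   # IN on endpoint 1
    make_token(0x2D, 5, 0), b"\xC3" + bytes(10), b"\xD2",  # SETUP on endpoint 0
    b"\xA5\x00\x00",                                       # SOF
]
print(len(filter_endpoint(capture)), "packets kept")
```

A gateware version would make the same per-packet decision before writing to the capture buffer, so that filtered traffic never consumes buffer space or upstream bandwidth.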

symdeb commented 2 years ago

That is a great feature for custom code for data manipulation and triggering. Looking at the schematic, the upstream USB is high speed. Even though the ULPI runs at 60MHz, all the data has to be moved upstream again to the host (PC). If that is uncompressed, the data amount would be about 1:1 (if not more, given overhead such as metadata), correct? Since the FPGA needs to cope with the "slow" upstream USB 2.0 ULPI interface, would that not cause congestion in the FPGA?
In other words, was there any consideration of using a USB 3.0 transceiver instead, which could reduce the risk of such bottlenecks when streaming data to the host (PC)? The FPGA would need added functionality to interface to such a USB 3.0 device controller. Key question: have any experiments been done to see whether USB HS streaming to the PC host can be sustained over longer periods, say 5 to 10 seconds?

zyp commented 2 years ago

We're using the LUNA stack in Orbtrace, and I can share some performance numbers from our testing.

First of all, the theoretical max HS USB bulk capacity is 13 packets per microframe, and at 512B per packet and 8000 microframes per second, that comes out to 425.984 Mb/s (53.248 MB/s). We have however not been able to reach this in our testing, probably due to practical limitations with the host side scheduling and reserved capacity for other devices on the bus.

We have, however, been able to reach 12 packets per microframe, which comes out to around 393 Mb/s (49 MB/s), and can sustain this as long as there's not a lot of other traffic on the bus fighting for capacity. We're doing this with only 8kB+1kB of buffering on the device side.
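The throughput figures above can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation, with hypothetical names:

```python
MICROFRAMES_PER_SECOND = 8000
BULK_PACKET_BYTES = 512  # maximum HS bulk packet size


def bulk_throughput_bytes(packets_per_microframe: int) -> int:
    """Payload bytes per second for a given number of bulk packets per microframe."""
    return packets_per_microframe * BULK_PACKET_BYTES * MICROFRAMES_PER_SECOND


for packets in (13, 12):
    rate = bulk_throughput_bytes(packets)
    print(f"{packets} packets/microframe: "
          f"{rate / 1e6:.3f} MB/s = {rate * 8 / 1e6:.3f} Mb/s")
```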

I figure more buffering is mainly useful if you've got bursty traffic with a reasonably large difference between peak and average data rates.

martinling commented 2 years ago

Overhead depends somewhat on the nature of the traffic but let's work through a simple high-throughput case - capturing a single target device making one continuous flat-out bulk IN transfer with full 512 byte packets. Assume for the sake of simplicity that we're filtering out SOF packets and other traffic, and that nothing gets NAKed.

In that scenario, for each 512 bytes of payload data sent by the target device, LUNA will currently put 525 bytes into the capture buffer for the IN transaction:

2 bytes length header + 3 bytes IN packet
2 bytes length header + 3 bytes DATA0/1 packet fields + 512 bytes payload
2 bytes length header + 1 byte ACK packet

As @zyp notes, the realistic limit for one device is 12 of those transactions per microframe and there are 8000 microframes per second, so LUNA has to buffer 12 x 8000 x 525 B/s for a total of 50.4MB/s, and could stream that buffer to the host at 12 x 8000 x 512 B/s or 49.152MB/s. So the buffer would fill up at (50.4 - 49.152) = 1.248MB/s. With an 8MiB buffer that would correspond to 6.72s of capture time. But if we can compress the buffered data on the FPGA side by even just 3%, then it becomes possible to capture indefinitely.
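The same calculation as a short script, using only the figures from this comment (variable names are hypothetical). It also works out the minimum compression ratio at which the buffer stops filling:

```python
LENGTH_HEADER = 2  # bytes of length header stored per captured packet

# Bytes stored in the capture buffer for one bulk IN transaction.
in_token = LENGTH_HEADER + 3
data_packet = LENGTH_HEADER + 3 + 512
ack_packet = LENGTH_HEADER + 1
captured_per_transaction = in_token + data_packet + ack_packet

TRANSACTIONS_PER_SECOND = 12 * 8000  # 12 per microframe, 8000 microframes/s

fill_rate = TRANSACTIONS_PER_SECOND * captured_per_transaction  # into the buffer
drain_rate = TRANSACTIONS_PER_SECOND * 512                      # out to the host
net_rate = fill_rate - drain_rate

buffer_bytes = 8 * 1024**2  # 8MiB
capture_seconds = buffer_bytes / net_rate

# Fraction by which the captured data must shrink for fill_rate <= drain_rate.
min_compression = 1 - drain_rate / fill_rate

print(f"captured per transaction: {captured_per_transaction} bytes")
print(f"net fill rate: {net_rate / 1e6:.3f} MB/s")
print(f"capture time: {capture_seconds:.2f} s")
print(f"compression needed for indefinite capture: {min_compression:.2%}")
```

The break-even compression comes out a little under 2.5%, which is consistent with the "even just 3%" figure above.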

In most practical scenarios, HS throughput to the host should be sufficient to get things done, especially when combined with all the filtering and triggering logic that's possible through customising the analyzer gateware. Adding a USB 3.0 transceiver would have significantly increased the cost of the device. For scenarios where you really want to capture every detail of a completely saturated bus, there's always the possibility of using the second USB 2.0 port to increase throughput.

symdeb commented 2 years ago

6 seconds would be more than enough to gather data. Looking at the schematic, the USB3343 is used. Does LUNA support USB low/full speed as well? That would require getting the USB data some way other than via ULPI, correct?

martinling commented 2 years ago

Yes, low and full speed capture are supported. At the moment you have to select which speed you want to capture, but on-the-fly speed detection is on the todo list.

greatscottgadgets / luna

Question: alternate protocol support #175