quantumhub commented 3 years ago

Dear Authors,

Thank you for your great work! I obtained CSI at a rate of about 50 frames per second. Is there any way to get a faster CSI copy channel?

A. Do these two SPRs allow copying a larger piece of CSI (e.g. >16 subcarriers)?

[#define SPR_RXE_Copy_Offset spr001
#define SPR_RXE_Copy_Length spr002](https://github.com/seemoo-lab/nexmon/blob/57a2fb0c3e4d774f32c96160dab9048b888a1d28/buildtools/b43-v2/debug/include/spr.inc#L9)

[These copy RX data to shared memory. What is needed here is a copy from shared memory to the FIFO.]

I supposed the following could work, but it fails. SPARE1 is the target SHM memory (filled with CSI) to be copied.

    mov SPARE1, SPR_RXE_RXHDR_OFFSET
    mov RXE_RXHDR_LEN, SPR_RXE_RXHDR_LEN
    orx 0, 0, 0x1, SPR_RXE_CTL, SPR_RXE_CTL
    L903:
    jext COND_RX_FIFOFULL, L809
    xjne SPR_RXE_RXHDR_LEN, 0x0, L903

B. By setting a large value in [0xaee], (sk_buff *p->len) shows a larger buffer size:

    mov 0x200, SPARE1
    mov SPARE1, [0xaee]

(https://github.com/seemoo-lab/nexmon_csi/blob/ba99ce12a6a42d7e4ec75e6f8ace8f610ed2eb60/src/csi.ucode.bcm4366c0.10_10_122_20.patch#L188)

[A small p->len = 2 allows a more efficient copy, since only the rxhdr is of interest to us.]

But both A and B just fill the tail of p->data with unknown data rather than the intended data.

Could anyone provide a faster CSI copy channel implementation (to reach >150 CSI snapshots per second)? Thank you!

mzakharo commented 3 years ago

With the firmware as is, I was able to achieve 1000 CSI/sec by sniffing beacons of a select AP (limited by the 802.11 minimum beacon period of about 1 ms). I've seen rates top out around 2-3 thousand samples/sec by just sniffing the surroundings without filters. I don't think those SPR registers are your bottleneck.

quantumhub commented 3 years ago

Dear @mzakharo,

Thank you for sharing your sampling rate! Beacons are sent at a low data rate, but they do not back off as much as normal (low-priority) data packets.

For ac86u, the firmware provides 4x4 80MHz CSI in 16 UDP packets (each packet has the coefficients for a SISO channel).

Is the 1000 in units of CSI/sec, or is it the number of UDP packets per second? The number of complete 4x4 CSI snapshots recovered per second is the UDP packet rate divided by 16.
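To make the conversion concrete, here is a minimal sketch that turns a UDP packet rate into a rate of complete CSI snapshots. It assumes, as described in this thread, one UDP packet per RX-core/spatial-stream pair; the function name, parameter names, and the input numbers are hypothetical and only for illustration.

```python
def full_csi_rate(udp_packets_per_sec: float, n_rx: int, n_streams: int) -> float:
    """Complete CSI snapshots per second.

    Assumption: the firmware sends one UDP packet per (RX core, spatial stream)
    pair, so one complete snapshot needs n_rx * n_streams packets.
    """
    packets_per_snapshot = n_rx * n_streams
    if packets_per_snapshot <= 0:
        raise ValueError("n_rx and n_streams must be positive")
    return udp_packets_per_sec / packets_per_snapshot


# 4x4 case: 16 UDP packets per snapshot.
print(full_csi_rate(1600, n_rx=4, n_streams=4))  # 100.0
# 2 antennas, 1 stream: 2 UDP packets per snapshot.
print(full_csi_rate(1000, n_rx=2, n_streams=1))  # 500.0
```

When comparing numbers between setups, it helps to state which of the two rates (UDP packets or complete snapshots) is being quoted.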

mzakharo commented 3 years ago

Testing was done on a Nexus 6P, with 2 antennas, no spatial streams for beacons. 2 UDP packets per CSI.

quantumhub commented 3 years ago

A note: changing RXE_RXHDR_LEN to a larger number won't help. The actual length wraps around: e.g. setting it to 158 only gives p->len - RxFrameSize = 158 % 64 = 30 words = 60 bytes, which is even smaller than with RXE_RXHDR_LEN = 62 (62 words = 124 bytes).
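The wrap-around can be reproduced with a few lines of arithmetic. This is only a sketch of the behaviour reported in this note: the modulus of 64 and the 2 bytes per word are taken from the numbers above, not from any hardware documentation, and the function name is hypothetical.

```python
from typing import Tuple

RXHDR_LEN_MODULUS = 64  # wrap-around observed in this thread (assumption)
BYTES_PER_WORD = 2      # implied by "30 words = 60 bytes" above


def effective_rxhdr_len(requested_words: int) -> Tuple[int, int]:
    """Return (words, bytes) actually delivered for a requested RXE_RXHDR_LEN."""
    words = requested_words % RXHDR_LEN_MODULUS
    return words, words * BYTES_PER_WORD


for requested in (62, 158):
    words, nbytes = effective_rxhdr_len(requested)
    print(f"requested {requested:3d} words -> {words} words = {nbytes} bytes")
# requested  62 words -> 62 words = 124 bytes
# requested 158 words -> 30 words = 60 bytes
```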

yujianyuanhaha commented 3 years ago

@quantumhub the maximum rate I get is around 1600 packets per second (approximately 100 packets per second for each spatial stream). My approach is to adjust the -d option of makecsiparams.

zeroby0 commented 3 years ago

If I don't save the results into pcap, I seem to get a higher rate. If I remember correctly, I've seen upwards of 6000 samples per second without recording. That makes me wonder whether the bottleneck is in the processor or the storage media rather than the firmware.

Can anyone create a ramdisk and check if saving the pcap to the ramdisk improves speed? @quantumhub also forwarded the packets to their PC instead of saving them locally; I hope that made the CSI collection faster.

Edit

I tried using a ramdisk and a good sd card (Samsung Evo). I seem to get about the same number of maximum packets per second in both of them. I captured about 8600 packets in 5 seconds on both SD and ramdisk with 80 MHz recording. Without writing to pcap, tcpdump was able to see about 10600 packets in 5 seconds, with and without -vv.

So I don't think a ramdisk would hugely improve your throughput if you have a good SD card. I will try packet forwarding and xdpcap next. Sidenote: writing pcap files directly to SD may not be a good idea. Those cards have a limited number of writes.
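For anyone who wants to compare captures the same way (for example 8600 packets in 5 seconds is about 1720 packets per second), the following sketch computes the packet rate of a capture file from its timestamps. It assumes the classic pcap format with microsecond timestamps and uses only the Python standard library; the function name and the file name in the usage note are hypothetical.

```python
import struct
from typing import BinaryIO, Tuple

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic pcap magic, microsecond timestamps


def pcap_packet_rate(stream: BinaryIO) -> Tuple[int, float, float]:
    """Return (packet_count, duration_s, packets_per_second) for a classic pcap stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number tells us the byte order the file was written in.
    if struct.unpack("<I", global_header[:4])[0] == PCAP_MAGIC_USEC:
        endian = "<"
    elif struct.unpack(">I", global_header[:4])[0] == PCAP_MAGIC_USEC:
        endian = ">"
    else:
        raise ValueError("unsupported capture format (expected classic pcap)")

    count = 0
    first_us = last_us = 0
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break  # end of file (or a truncated final record)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record_header)
        stream.read(incl_len)  # skip the packet bytes; only timestamps matter here
        ts_us = ts_sec * 1_000_000 + ts_usec
        if count == 0:
            first_us = ts_us
        last_us = ts_us
        count += 1

    duration_s = (last_us - first_us) / 1e6
    rate = count / duration_s if duration_s > 0 else 0.0
    return count, duration_s, rate
```

A possible use is `with open("capture.pcap", "rb") as f: print(pcap_packet_rate(f))`. The rate is the packet count divided by the time between the first and last packet, so very short captures will give noisy values.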

quantumhub commented 3 years ago

Thank you for your helpful suggestions and discussion! @yujianyuanhaha @zeroby0

Which delay value is optimal for sampling rate?

The asm code executes the wait (delay) immediately after calling a FIFO (RXHDR delivery). Then some C code uses xmit to forward the content as UDP packets. If a smaller delay value increases the sampling rate, I believe the bottleneck is in the asm code.

yujianyuanhaha commented 3 years ago

@zeroby0 @quantumhub a reference value is that of the Intel 5300 / Atheros CSI laptop NICs: they save CSI into a .dat file and reach up to 4000 Hz (3x3 spatial streams, 20 MHz bandwidth). I assume that for ASUS nexmon (4x4 spatial streams, 80 MHz bandwidth) it could reach about 562 Hz (= 4000*9/16/4), although I can only get around 100 Hz so far.
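The estimate above can be written out as a small calculation. This is a sketch of the scaling assumption made in this comment (rate inversely proportional to the number of antenna pairs and to the bandwidth), not a measured result or a property of the firmware; the function name is hypothetical.

```python
def scaled_csi_rate(ref_rate_hz: float, ref_pairs: int, ref_bw_mhz: float,
                    pairs: int, bw_mhz: float) -> float:
    """Scale a reference CSI rate by the relative amount of CSI data per snapshot.

    Assumption: the achievable rate is inversely proportional to
    (antenna pairs) * (bandwidth).
    """
    return ref_rate_hz * (ref_pairs / pairs) * (ref_bw_mhz / bw_mhz)


# Reference: 3x3 (9 pairs) at 20 MHz, 4000 Hz. Target: 4x4 (16 pairs) at 80 MHz.
print(scaled_csi_rate(4000, ref_pairs=9, ref_bw_mhz=20, pairs=16, bw_mhz=80))  # 562.5
```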

zeroby0 commented 3 years ago

Hey @yujianyuanhaha

I was able to get to about 1600 packets per second on a Pi 3B+. The SD card I was writing to had a write speed of 190 MBps, and the ramdisk I created had a speed of 350 MBps. I reached a nearly identical packets-per-second rate on both of them, and tcpdump saw 2100 packets per second without writing to pcap.

You might be able to increase your packet count by ping flooding from your Pi, like I did. I ran two instances of ping <laptop's ip> -i 0.0001 while capturing. With more packets in the air and a better processor (Pi 4), I think we might be able to go higher than 2000 pps.
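If ping is not convenient, traffic can also be generated from a script. The sketch below sends small UDP datagrams at a fixed interval, which is a substitute for the ICMP ping flood described above and not the same thing (there are no echo replies). The function name, payload size, and the destination in the usage note are assumptions for illustration.

```python
import socket
import time


def send_udp_burst(dest_ip: str, dest_port: int, count: int,
                   interval_s: float = 0.0001, payload: bytes = b"x" * 32) -> int:
    """Send `count` UDP datagrams to (dest_ip, dest_port); return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sock.sendto(payload, (dest_ip, dest_port))
            sent += 1
            if interval_s > 0:
                # Sleep granularity is OS dependent, so the real rate will be
                # lower than 1 / interval_s for very small intervals.
                time.sleep(interval_s)
    return sent
```

For example, `send_udp_burst("<laptop's ip>", 9, 10000)` would send 10000 datagrams to the discard port of the laptop. Whether these frames trigger CSI extraction depends on the filter configured on the capturing device.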

Linked comment with ramdisk details: https://github.com/seemoo-lab/nexmon_csi/issues/200#issuecomment-817274199

matthiasseemoo commented 3 years ago

The transfer of the CSI to the host is a bit hacky in our CSI extractor. The D11 core controls the Wi-Fi PHY during frame receptions. Each Wi-Fi frame ends up in some FIFO memory, and the D11 core can instruct the DMA controller to take the frame from the FIFO and copy it to either the host memory or the internal ARM core’s RAM. Besides the data stored in the FIFO, the DMA controller can first transfer additional data from the D11 core’s memory to the host as a kind of frame header containing additional information about the received data. In our CSI extractor, we abuse this area for transferring additional data to transfer CSI information instead. As the size of the additional data is limited, we repeatedly request DMA copy operations based on different data stored in the shared memory. So here we are already slower than if we could directly instruct the DMA to copy the whole CSI block from shared memory to the host.

What is worse, however, is that the registers to read CSI are mapped into neither the D11 core’s nor the ARM core’s address range. Instead, they need to be read indirectly through another register interface, which makes copying CSI data into the D11 core’s memory quite slow. If you want to get the highest frame rate out of the CSI extractor, I advise you to inject frames that will trigger the CSI extraction.


quantumhub commented 3 years ago

Dear @matthiasseemoo, thanks for your great work and the detailed explanation of how the CSI information flow works. Could you provide more instructions on how to "inject frames that will trigger the CSI extraction" when 4x4 MIMO CSI is extracted for the channel between two ac86u?

matthiasseemoo commented 3 years ago

In the end, it is like on all the other Broadcom Wi-Fi chips. You create a packet buffer on the Wi-Fi chip and fill it with your Wi-Fi frame, prepended with a TX header that tells the hardware with which settings the frame shall be sent. There, you can explicitly decide on the modulation and coding scheme to use.


Mweme commented 3 years ago

Hello matthiasseemoo, thank you for the great job you have done regarding CSI. I'm using a Raspberry Pi 3, but every time I try to clone the git repository I get the following error:

fatal: repository 'https://github.com/seemoo-lab/nexmon.git./' not found
root@raspberrypi:/home/pi#

Kindly assist.

matthiasseemoo commented 3 years ago

Remove the trailing ./ from the git URL, i.e. use https://github.com/seemoo-lab/nexmon.git.


seemoo-lab / nexmon_csi

Faster CSI copy channel? #200

define SPR_RXE_Copy_Length spr002](https://github.com/seemoo-lab/nexmon/blob/57a2fb0c3e4d774f32c96160dab9048b888a1d28/buildtools/b43-v2/debug/include/spr.inc#L9)
