reproduction of CSI in same location but different time session

yujianyuanhaha commented 3 years ago

Greetings,

Is it possible to see nearly the same CSI waveform from a receiver in the same location (with the environment, i.e. the stuff in the house, almost unchanged) but in a different time session? If so, the CSI could be used for fingerprint-based applications such as localization.

A toy example: I collect CSI from the nexmon ASUS router at 4 different locations (I marked the locations in my house, so the placement error should be pretty small), and repeat the collection in 3 different time sessions (illustrated in the figure below).

This gives 4 x 3 = 12 pcap files and their corresponding CSI waveforms. I plot all of them (illustrated below).

Well, we see some good points: (1) CSI is pretty stable in a fixed location. (2) CSI differs between locations, so we can use CSI for localization. (3) Is CSI the same in the same location but a different time session? The figures do not support that assumption well: I don't see much similarity among sessA-Loc1, sessB-Loc1 and sessC-Loc1, nor among sessA-Loc3, sessB-Loc3 and sessC-Loc3. However, sessA-Loc2, sessB-Loc2 and sessC-Loc2 are pretty much the same, which makes sense to me.

Has anyone met the same issue, and are there any suggestions to make it right? @DanielAW @schmittner @matthiasseemoo @zeroby0 @mzakharo

Thanks in advance.

Some additional notes:

I am using a 1x1 spatial stream, channel 100 / bandwidth 80 MHz, a single MAC address, and a single frame starting byte. The command is like ./makecsiparams -c 100/80 -C 0x1 -N 0x1 -m 04:D9:F5:75:AF:14 -b 0x88
I checked the surrounding Wi-Fi networks, and only my own Wi-Fi TX is on channel 100.

yujianyuanhaha commented 3 years ago

Unfortunately, although CSI (especially from the Intel 5300 and Atheros) has been studied so densely and many papers are available, the first assumption (CSI doesn't change over time if the environment doesn't change) is rarely checked.

zeroby0 commented 3 years ago

I worked on this, and have developed methods to make CSI stable and less noisy. The data I collected only spans 24 hours, though. After processing, data collected in the first 30 seconds still seems similar enough to data from throughout the day that a random forest classifier can classify it correctly more than 90% of the time. But the raw CSI values obtained from Nexmon are not always very similar across time.

Of course, it depends on your environment and how dynamic it is. I had at most 4 people in the same room simultaneously. And the accuracy seems to differ between Line of Sight and NLoS, with higher accuracy in NLoS.

I think this is all the info I can reveal before the paper is published, which should happen at the end of March. I'll update this in April if someone needs it then.

All the best :)

yujianyuanhaha commented 3 years ago

@zeroby0 thanks a lot. I'd appreciate it if you can share it on arXiv soon. For anyone interested, two methods I found useful: (1) Mahalanobis distance and amplitude calibration, from the paper "CRISLoc: Reconstructable CSI Fingerprinting for Indoor Smartphone Localization", which can filter out a small fraction of abnormal packets; (2) multi-variable temporal smoothing, from the paper "CSI based indoor localization using Ensemble Neural Networks", which can suppress the noise a bit. But neither of them can drastically fix the issue I mentioned.

Such a mismatch also shows up in the Atheros CSI tool issue "Unstable CSI amplitude" (see attached below). I guess it is due to channel fading.

yujianyuanhaha commented 3 years ago

@zeroby0 do you mind sharing a bit about your way of pre-processing?

Best.

zeroby0 commented 3 years ago

Draft post

Hey, @yujianyuanhaha ! Sorry, I forgot to give the details.

It's actually really really simple. I don't know if it would work for all localization techniques, but I've found it to work great with Fingerprinting.

The first step is to remove the Null and Pilot Sub-carriers. I found Null sub-carriers to contain pretty much arbitrary values, so they were of no use for Fingerprinting. I removed Pilot sub-carriers just to err on the side of caution.

With Null and Pilot	Null and Pilots marked
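For illustration, the sub-carrier removal could be sketched as below. This is not the code used for the plots in this post. It assumes an 80 MHz (256-point FFT) capture whose columns have already been reordered to run from sub-carrier -128 to +127, and it assumes the usual 802.11ac 80 MHz layout for the null and pilot indices; please verify both against your own capture before relying on them. The function and constant names are made up for this example.

```python
import numpy as np

# Assumed 802.11ac 80 MHz layout (256-point FFT), indices from -128 to +127.
# Nulls: guard bands at both edges plus the three sub-carriers around DC.
NULLS_80 = np.array(list(range(-128, -122)) + [-1, 0, 1] + list(range(123, 128)))
# Pilots: eight sub-carriers, symmetric around DC.
PILOTS_80 = np.array([-103, -75, -39, -11, 11, 39, 75, 103])


def drop_null_and_pilot(csi, nulls=NULLS_80, pilots=PILOTS_80):
    """Return csi without null and pilot columns.

    csi: array of shape (n_samples, n_subcarriers), with columns ordered
    from the most negative sub-carrier index to the most positive one.
    """
    n_sc = csi.shape[1]
    indices = np.arange(-(n_sc // 2), n_sc // 2)
    keep = ~np.isin(indices, np.concatenate([nulls, pilots]))
    return csi[:, keep]
```

With the index sets above, a 256-column input is reduced to 234 data sub-carriers.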

The next step is to remove the noisiest of the samples. I used Cartesian Distance and removed 1/3rd of the samples geometrically farthest from the Centroid/Mean of the samples. It worked, but it's kinda naive, and I didn't know much about CSI or processing back then. I suggest exploring IsolationForests and other unsupervised ML techniques for noise removal.

Without Null and Pilot Subcarriers	After removing outliers
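A minimal sketch of that distance-based filter, assuming the input is an array of CSI amplitudes with one row per sample, and reading "Cartesian distance" as the ordinary Euclidean distance. The function name and the `drop_fraction` parameter are invented for this example.

```python
import numpy as np


def remove_farthest(amplitudes, drop_fraction=1 / 3):
    """Drop the samples that lie farthest from the mean fingerprint.

    amplitudes: real array of shape (n_samples, n_subcarriers).
    Returns the kept rows in their original order.
    """
    centroid = amplitudes.mean(axis=0)
    # Euclidean distance of every sample to the centroid.
    dist = np.linalg.norm(amplitudes - centroid, axis=1)
    n_drop = int(round(len(amplitudes) * drop_fraction))
    n_keep = len(amplitudes) - n_drop
    # Indices of the n_keep closest samples, restored to time order.
    keep_idx = np.sort(np.argsort(dist)[:n_keep])
    return amplitudes[keep_idx]
```

Because the centroid is computed from all samples, heavy outliers pull it toward themselves. That is one reason a more robust detector may be worth exploring.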

Even now, the CSI fingerprint is very noisy. There are artifacts from people moving around, and from the AGC. I was also doing Astrophotography when I was working on this, and I felt there are parallels. The atmosphere distorts different frequencies of light differently, and clouds obstruct the light similarly. One of the solutions to deal with that is to take multiple photos of faint objects and combine them to create a better photo.

So my solution to stabilizing the CSI signal is to simply take 4 consecutive CSI samples and then average them 😂.

CSI after stabilising
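The averaging step could look roughly like this. The sketch assumes non-overlapping groups of 4 and amplitude input; the post does not say whether the groups overlapped, so treat that as an assumption. The function name is hypothetical.

```python
import numpy as np


def average_groups(amplitudes, group_size=4):
    """Average consecutive samples in non-overlapping groups.

    amplitudes: array of shape (n_samples, n_subcarriers).
    Trailing samples that do not fill a whole group are discarded.
    """
    n_groups = len(amplitudes) // group_size
    trimmed = amplitudes[: n_groups * group_size]
    # Reshape to (group, member, sub-carrier) and average over the members.
    return trimmed.reshape(n_groups, group_size, -1).mean(axis=1)
```

The trade-off is the sample rate: every 4 input samples produce one output sample.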

Of note is that the samples shown in these graphs come from multiple Devices/MacIDs. Ideally you should separate the samples by MacID, and then do the outlier removal and stabilisation per Mac id, but I forgot to do that when making these slides. You may also further separate them by Frame Type, and by the actual bandwidth of these samples. You can see that some of the initial samples had only 40 MHz bandwidth, and were removed in outlier removal, but they're not really outliers, just a different type of sample.

I have also considered using RSSI to do the outlier detection, and to counteract AGC, but never really got around to doing it. For example, samples with very weak RSSI are probably very noisy, and can be ignored.

I found the CSI samples so derived to be highly stable relative to the raw samples. Both across time, and when there is noise in the channel because of people.

When I was doing this, I also found that different wifi devices had different 'quality' of CSI. My phone, for example, emitted samples with very high noise, but samples from my router were much more stable.

Okay, that is a long wall of text, haha. This is a draft post. I need to simplify it and remove errors later, but I'm making it available right now so that I don't procrastinate and forget making the post altogether. Please point out if you find any errors, and feel free to ask if you need any other details :)

yujianyuanhaha commented 3 years ago

@zeroby0 thanks a lot for sharing. I agree with most of the points in your reply: (1) remove the null subcarriers; (2) apply a distance-based outlier remover, either Cartesian or Mahalanobis, though it requires a high sampling rate; (3) average over consecutive CSI samples. Also, the router is much more stable than the Nexus phone. I work with a single MAC address and a single frame type, and within a few seconds they are pretty stable.

Additionally, the AGC issue can be easily fixed by the calibration CSI_calibrate = CSI / sum(abs(CSI))
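As a sketch, the calibration formula can be applied per packet like this, assuming the sum runs over the sub-carriers of one CSI sample. The function name and the guard for all-zero rows are additions for this example.

```python
import numpy as np


def calibrate(csi):
    """Scale each CSI sample so its sub-carrier amplitudes sum to 1.

    csi: complex or real array of shape (n_samples, n_subcarriers),
    or a single sample of shape (n_subcarriers,).
    """
    csi = np.atleast_2d(csi)
    scale = np.abs(csi).sum(axis=1, keepdims=True)
    # Leave all-zero rows unchanged instead of dividing by zero.
    scale = np.where(scale == 0, 1.0, scale)
    return csi / scale
```

A common gain factor applied to a whole packet cancels out, so a packet and the same packet multiplied by 2 map to the same calibrated values. The absolute amplitude level is lost in the process.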

What is the approximate time interval between sessions in your case? I mean the time gap, for example, between sessA-loc1 and sessB-loc1 in my post.

zeroby0 commented 3 years ago

Mine was 24 hours. The graphs shown in the previous post were about 5 minutes, but I collected CSI for 24 hours, processed it, and then drew a histogram of distances from the mean. I should have the plot somewhere. If I don't find it, I'll collect data again and make a new plot.
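In case it helps in the meantime, a histogram of distances from the mean can be computed along these lines. This is only a sketch of the idea, not the original analysis; the function name, the Euclidean distance, and the bin count are assumptions.

```python
import numpy as np


def distance_histogram(amplitudes, bins=20):
    """Histogram of each sample's distance to the mean fingerprint.

    amplitudes: array of shape (n_samples, n_subcarriers).
    Returns (distances, counts, bin_edges).
    """
    mean_fp = amplitudes.mean(axis=0)
    dist = np.linalg.norm(amplitudes - mean_fp, axis=1)
    counts, edges = np.histogram(dist, bins=bins)
    return dist, counts, edges
```

A narrow histogram concentrated near zero would indicate that the fingerprint stayed stable over the collection period.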

yujianyuanhaha commented 3 years ago

@zeroby0 the plots on my side (also collected on the 5 GHz band, with a Raspberry Pi) show that cross-session matching fails. I guess this is a universal problem, whether with the 5300, Atheros or nexmon. Below is the same location in 4 different sessions, with a gap of 5 min. I filter out some null subcarriers, so 211 of the 256 subcarriers remain. There are not many noisy or odd waveforms.

Since the subcarriers are in 4 groups, one for each 20 MHz band, I re-plot the waveforms below for better visualization.

yujianyuanhaha commented 3 years ago

Another work we can refer to is Tewes, Simon, and Aydin Sezgin. "WS-WiFi: Wired Synchronization for CSI Extraction on COTS-WiFi-Transceivers." IEEE Internet of Things Journal 8.11 (2021): 9099-9108. It mentions temperature and antenna imperfections as sources of the offset.

yangtat commented 3 years ago

@yujianyuanhaha Thank you for your work. Just like in #183, I found that the amplitude of the CSI I collected also suddenly multiplied by 2. Have you solved this problem by modifying shft? In the AGC calibration CSI_calibrate = CSI / sum(abs(CSI)), is sum(abs(CSI)) the sum of the amplitudes of the subcarriers? After this treatment, can the amplitude reflect the transmission distance?

yujianyuanhaha commented 3 years ago

@yangtat (1) My way of calibration should fix the sudden multiplication by 2 you mentioned. (2) Well, after this treatment the amplitude CANNOT reflect the transmission distance, but since CSI can reflect environment changes, I believe the waveform can reflect location changes.

seemoo-lab / nexmon_csi

reproduction of CSI in same location but different time session #158