mlsquared / fling

Fling 2.
0 stars 0 forks source link

Study a few different beacon algorithms. #31

Open dosirrah opened 2 weeks ago

dosirrah commented 2 weeks ago
  1. Simple
  2. Window
  3. Pairs
  4. k-parity bits

Measure precision, recall, accuracy, percentage of listening times exceeding 1 second (or 2 seconds), percentage of times exceeding 10 seconds, and overhead for each beacon algorithm. Overhead is the average number of bytes needed to communicate a 32-bit short code. We should try to achieve an error rate that is quite small $\epsilon$ or equivalently an accuracy that is near 1, i.e., $1-\epsilon$. $\epsilon$ can be set to somewhere in the range of $10^{-6}$ or $10^{-8}$. Note that a "transmission" includes the set of transmissions that are generated for a given code. We can increase accuracy by increasing redundancy in the encoding or by using error detection (e.g., parity bits).

We measure percentage of listening times exceeding 1 second, because users are unlikely to differentiate times below 1 second. People are likely to feel the system is sluggish if we start taking tens of seconds or more. People may interpret the algorithm as unreliable if it starts taking more than 10 seconds.

Let B = block length. All frames must are assumed to have lengths that are multiples B beyond some minimum header size. B is typically 16 bytes.

Simple

Use uncommon packet lengths. The beacon detector breaks up packets based on which WiFi address they are sent from and only considers frame from non-basestation nodes. It should Ignore all packets that are not of a valid length for encoding side channel bits using frame length modulation, but otherwise the "Simple" algorithm is always looking for beacons and imposes no error correction or detection. I expect that simple will frequently generate incorrect short codes.

Window

Only try to decode a short code if at least $n$ packets of valid lengths for frame length modulation of short codes have been received within a time window spanning the last $T$ seconds. The presence of any $n$ packets of valid lengths is analogous to the presence sequence proposed in https://github.com/mlsquared/fling/blob/main/analysis/frame_length_mod.ipynb . However, the uncommon packets are also data packets.

Look backwards across the time window trying to decode a short code starting from each packet that is within the valid lengths for frame length modulation. Send each short code as a. burst of $n$ back-to-back packets.

Window + Start Frame

We could look for $n$ frames with valid lengths including presence frames within any time interval. Start with using a 1 value to denote presence. Once we have determined that a window likely contains a short code, we look at the window of frame lengths more carefully. We find the first frame in the window with length $H$. The frame of length $H$ denotes the start frame for the short code. The start frame code is a reserved packet length used only to denote the beginning of the short code. We then look at next. Let H be the minimum length frame we use for our side channel. We reserve the frame of length H to act as a start frame denoted S.

For example, assume we communicate 4 bits per frame and thus have 16 distinct frame lengths + 1 frame length to act as a start frame.

D2 D3 D4 XXXXXX D7 D8 S D1 D2 D3 D4 D5 D6

The S allows us to know where the sequence of frames begins. We can look backwards and forwards from S to try to assemble the appropriate 8 frames to recreate the short code.

Pairs

Implement the algorithm described in Section 6 of https://github.com/mlsquared/fling/blob/main/analysis/frame_length_mod.ipynb . This algorithm may be overkill. With this algorithm, the initiator beacon sends packet pairs. The packet in each pair is a presence packet. The presence sequence starts from a minimum length H then H+B then H+2B up to H+(k-1)B. With each packet pair, the first is a presence packet and the second is a data frame.

For example, we use 16 lengths for each packet. Assume that WiFi is padding frames to the next largest 16 byte multiple above some base header length. This allows us to communicate 4 bits per packet. We are communicating a 32-bit short code, so this means we must send n=8 data packets. So if k = n = 8, the length of the presence sequence matches the number of data frames needed to send a short code. Let P denote a presence frame and D a data frame. P1 denotes the first presence frame in the presence sequence. P1 is a frame of length H. P2 is a frame of length P1 + 16. Pk is a frame of length H + (k-1)16. Data packets have a variable length because they encode the short code. We thus see

P1 D1 P2 D2 P3 D3 P4 D4 P5 D5 ... P8 D8

We know the short code starts with the packets after P1. If we are missing a packet in the presence sequence then we can wait for the sequence to repeat to pick up the missing ones. We stop when we have received all n data packets. For example, let XXXX represent a period of lost communications.

P4 D4 P5 D5 XXXXX P8 D8 XXXX P2 D2 P3 D3 P4 D4 P5 D5 P6 D6 P7 D7 P8 D8 P1 D1

In the above scenario, within our window the first frame we see with a valid side channel packet length has length H+(4-1)16, which we interpret as presence packet P4. We interpret the next frame with a valid side channel length as being D4. For example, if D4 has length H + (k-1)16 + 316 = H + (8-1)16 + 3 * 16 = H + 160 bytes then we interpret this as having represent a data value of 0x2, i.e., 0010. We receive data packets D4, D5, D8, D2, D3, D4 (repeat), D5 (repeat), D6, D7, D8, D1. In example above e when we receive D1, we have now received all 8 data frames, but they are out of order. We can reorder them to D1, D2, D3, D4, D5, D6, D7, D8. The length of D1 contributes the first 4 bits to the short code. The length of D2 contributes the second 4 bits. The length of D3 contributes the third 4 bits.

This method will have more overhead than the "window" method, but it may improve accuracy in the presence of substantial cross-traffic. If we wait a moment between each pair we may decorrelate error due to bursty cross-traffic.

Start frame + m-bit check sum

For each short-code encode an additional number of parity bits in the last packet in the set communicating a single short code. This can be combined with any of the above Simple, Window, or Pairs.

For example,

D2 D3 D4 XXXXXX D7 D8 CS S D1 D2 D3 D4 D5 D6

We reorder this as

D1 D2 D3 D4 D5 D6 D7 D8 CS

We communicate a minimum of a start frame plus $n$ data frames. We assume that the frame appearing before the start frame S contains the m parity bits. If we use 4 parity bits and we are using 32 data bit short codes then we could use $n=8$ data frames, 1 parity frame, and 1 start frame. This would take 10 frames to communicate a 32-bit short code. We sum lengths of D1 through D8 modulo by 16 and see if it equals the. checksum. If it doesn't then we could continue receiving side channel frames using a consensus mechanism. Let i denote the data frame in the short code. If the same length for Di appears we assume that it is the correct value. If there is disagreement for Di then we try each presented value to see which matches the checksum. We keep receiving datagrams until we find a combination that matches the checksum.

Start frame plus CRC-8

We use CRC-8, an 8 bit Cyclic Redundancy Check. To communicate 32-bits with 4 bits per frame, we would thus need to communicate a minimum of 10 packets.

D2 D3 D4 XXXXXX D7 D8 C1 C2 S D1 D2 D3 D4 D5 D6

C1 and C2 each communicate 4 bits of the 8 bit CRC-8.

Start frame plus CRC-16

Same as "Start frame plus CRC-8" but we use a 16-bit CRC. This should dramatically reduce the probability of an erroneous short code being communicated.

Other hybrids

If packets are frequently reordered then packet pairs allows us to more reliably recover packet order. We could combine using packet pairs with checksums or CRC. If there is still too much noise to achieve 10^-6 or 10^-7 error rates then we could investigate using Reed Solomon Coding.

Burst errors are likely to be handled by simply repeating the short code a number of times. However, if the algorithm begins incurring too much overhead, we could try pausing a short but random amount of time after communicating the short code before sending it again.

I doubt that packets are frequently reordered or that noise is so high that we need Reed-Solomon coding. My intuition tells me that a start frame plus CRC-8 or CRC-16 will provide the shortest average listening time within the constraint of an error rate of 10^-6 or 10^-7.

dosirrah commented 1 week ago

See https://github.com/mlsquared/fling/blob/main/analysis/frame_length_mod.ipynb However, check it out to your local system and run jupyter notebooks. GitHub doesn't display all of the embedded images.

With packet length encoding, longer packets are more expensive. We achieve optimal bandwidth in bits per second with packets varying between M and M + 256 where M in the upper-edge of the distribution mode for short packets > 116 bytes. Given that WiFi pads frames up to a multiple of 16 bytes, we can only use packet lengths at a granularity of 16 or greater bytes. If 116 bytes means 0, 116 + 16 = 1, 116 + 32 = 2, and so on.

According to the analysis in frame_length_mod.ipynb:

Number of different lengths at maximum efficiency: 10 Max efficiency in side channel bits / bits transmitted is 0.002208728786494257 Most efficient maximum frame length is 260.0 bytes Most efficient average frame length is 188.0 bytes Bits communicated in a single frame at most efficient length is 3.321928094887362 If we use 10 levels then the maximum length is 116 + 10*16 =276 bytes.

Using 10 levels each packet does not communicate an integral number of bits. However, it does communicate one decimal number per packet if we have no additional encoding overhead.