sz3 / libcimbar

Optimized implementation for color-icon-matrix barcodes
https://cimbar.org
Mozilla Public License 2.0
481 stars 27 forks source link

Towards 6-bit tiles? #69

Open mihaigalos opened 1 year ago

mihaigalos commented 1 year ago

Hi,

Awesome stuff!

In your TODOs, you mention it would probably be possible to go from 4-bit tiles to 5-bit ones (32 title hashes).

Since color seems to have a sweet-spot at around 2 bits, do you think 6-bit (64 value) tiles are possible?

It would mean additional effort in designing the tiles to avoid hash collisions, while still maintaining at least 16 bits Hamming distances.

mihaigalos commented 1 year ago

I went through the code again.

It would mean a bit of a refactoring to add 6-bit support - but not impossible. Would love to help, if you think it's worthwhile.

It would mean a 4x increase in the datarate naively assuming linear scalability (2-bit colors, 104kB/s initial 4 = 416kB/s, mileage might vary), with ~30kB of data encoded (7500 bytes initial 4 /1024 = 29.29 kB).

sz3 commented 1 year ago

That scalability math isn't quite right, unfortunately. One way to conceptualize it is to consider the grid size the base unit, e.g. 8x8 with 1px buffer == 9*9 == 81 pixels. With 6 bit (4 symbol bits + 2 color bits) we're effectively "spending" 13.5 pixels per bit. If we bump that up to 8 (however we get there -- 5 symbol bits + 3 color bits, 6 symbol bits + 2 color bits...), that's 10.125 pixels per bit.

Another way to ballpark it is to just assume we have a 1000x1000 image (since that's close enough to the cimbar dimensions) and divide by the grid size (9x9 in this case). That gives us 112(ish) rows and columns, which at 6 bits per (/ 8 bits per byte) comes out to 9408 (the real number is 9300), and 12544 for 8-bit.

Anyway, it's something like a 33% increase in capacity. Would be pretty nice, but there ways to do better (I think!) that I've been working on...

To answer the initial question,

Since color seems to have a sweet-spot at around 2 bits, do you think 6-bit (64 value) tiles are possible?

I think maybe, but possible and practical are going to be different things. Since cimbar is targeted at real time decoding (and on potato cell phone CPUs, at that), there are two considerations for 4-bit/5-bit/... tiles:

  1. hamming distance between 1 tile and the other N-1.
  2. the computation cost of N*5 compares per tile.

(2) is currently a problem due to everything being CPU-bound on mobile, but as more GPU technologies catch on it seems plausible that 320 concurrent popcnts wouldn't be the deal breaker it probably is today. So we can be optimistic and say this is a problem, but one that could potentially go away.

(1) is more troublesome, IMO. I'm not sure how one would go about finding a viable 64-tile set that satisfied the underlying constraints. I'm not really sure how to even go about it for a 32-tile set. It's not good enough for the tiles to be equally "far away" from each other when conditions are perfect -- the hamming distance calculation needs to survive the sort of nasty shrinking and blurring and skewing and chromatic nonsense that the screen+camera interaction does to it.

sz3 commented 1 year ago

In regards to adding additional configurations -- I'm in favor if you or anyone wants to try some experiments! :slightly_smiling_face:

This branch may be useful for playing with different grid sizes/etc: https://github.com/sz3/libcimbar/tree/5x5-and-decode-exp

The holy grail is probably a 7x7 tileset with 16 distinct tiles. It seems possible (and the math works out really well -- as good as any plausible 8x8 configuration), but also really hard.

The other direction -- the one I've been looking into -- is smaller grid sizes. Specifically 5x5 and 6x6 (vs the current 8x8). The branch is here if you're interested/bored, but it's still a work in progress... I haven't checked any tilesets in, although the 5x5 is pretty straightforward. I'm going to get it merged to master fairly soon, though I don't think I'll be enabling 5x5 until I have more things figured out. (6x6 I've become somewhat pessimistic about)

But my current plan is something like: (A) figure out what performance regressions I've introduced in that branch and fix them, (B) merge to master, (C) continue to improving the decoding algorithm in the hopes of making 5x5 viable -- specifically, the color decoding algorithm. (I want to try some kind of amortized k-means, or similar)

My current 5x5 configuration comes out to ~13k per frame with 4 symbols + 4 colors (2+2 == 4 bits per tile), and ~16k per when I push it to 8 colors. Reliability is still pretty bad though...

mihaigalos commented 1 year ago

Yes, everything makes more sense now. Thanks for the clarification.

(2) is currently a problem due to everything being CPU-bound on mobile, but as more GPU technologies catch on it seems plausible that 320 concurrent popcnts wouldn't be the deal breaker it probably is today.

I believe this will be possible in the coming years, but might be a bit biased. I understand you are targeting real-time but if it takes 5s for a decode, I'm willing to opt-in via a runtime flag, if possible.

(1) is more troublesome, IMO. I'm not sure how one would go about finding a viable 64-tile set that satisfied the underlying constraints.

I'm willing to invest a bit more time here. Would go about writing a genetic algorithm to vary the NxN (8x8, 7x7, etc), compute the maximum Hamming distance to all the other tiles and pick the best generated tiles as a PoC.

I think this is initially doable with Unicode symbols for pixel set/unset (1 or 0).

A Gaussian filter could then be applied, not fully sure if needed, provided the Hamming distance is sufficiently high.

mihaigalos commented 1 year ago

Can you please have a look at this? These are the min and max Hamming distances I get from the default 4-bit set:

Min Hamming distance: 18
Max Hamming distance: 60

I have trouble understanding this phrase (bits to pixels correlation unclear):

Each symbol is around 20 bits (by imagehash hamming distance)

You mean 20 pixels Hamming distance?

sz3 commented 1 year ago

You mean 20 pixels Hamming distance?

Yes. Imagehash "bits" in this context being the 64 that we get from the averagehash (or adaptiveThreshold).

The minimum distance is the one we care about -- I haven't looked at the 8x8 tileset for a while, but 18 sounds right.

mihaigalos commented 1 year ago

I'm done with implementing the genetic algorithm in Python and can now generate tiles. Will evaluate how well they behave at a later point. Do you still want to keep this issue open?

sz3 commented 1 year ago

I think it's fine to keep it open for now. If nothing else, it will remind me that I need to update the docs and merge the 5x5 branch. :slightly_smiling_face:

mihaigalos commented 1 year ago

My current 5x5 configuration comes out to ~13k per frame with 4 symbols + 4 colors (2+2 == 4 bits per tile), and ~16k per when I push it to 8 colors. Reliability is still pretty bad though...

Intrigued. Why only 2 bits for symbols? - would the camera lens resolution not be able to decode it reliably at 4 bits?

I'm pessimistic about 3 bit colors to be honest, colors are heavily influenced by lightning. Also, think about printing a cimbar code on paper for encrypted+encoded storage. The color in the print might degrade over time, the shape less so.

mihaigalos commented 1 year ago

After reevaluating, seems 4-bit tiles is a bit too optimistic. What about 3-bit tiles for 5x5? These have H(min, max)=12,16:

🔵🔵🔵🔵⭕  🔵🔵⭕⭕⭕  ⭕🔵🔵🔵🔵  ⭕⭕⭕🔵🔵
🔵🔵🔵⭕⭕  🔵🔵⭕⭕🔵  ⭕⭕🔵🔵🔵  🔵⭕⭕🔵🔵
🔵🔵⭕⭕⭕  ⭕⭕⭕🔵🔵  ⭕⭕⭕🔵🔵  🔵🔵⭕⭕⭕
🔵⭕⭕🔵🔵  ⭕⭕🔵🔵🔵  🔵🔵⭕⭕🔵  🔵🔵🔵⭕⭕
⭕⭕⭕🔵🔵  ⭕🔵🔵🔵🔵  🔵🔵⭕⭕⭕  🔵🔵🔵🔵⭕

⭕⭕🔵⭕⭕  🔵🔵🔵🔵🔵  🔵⭕⭕⭕🔵  ⭕⭕⭕⭕⭕
🔵🔵🔵🔵🔵  ⭕⭕🔵⭕⭕  ⭕🔵⭕🔵⭕  ⭕⭕🔵⭕⭕
⭕🔵⭕🔵⭕  ⭕⭕🔵⭕⭕  ⭕⭕🔵⭕⭕  🔵🔵🔵🔵🔵
🔵🔵🔵🔵🔵  ⭕🔵🔵🔵⭕  ⭕🔵⭕🔵⭕  ⭕⭕🔵⭕⭕
⭕⭕🔵⭕⭕  ⭕🔵🔵🔵⭕  🔵⭕⭕⭕🔵  ⭕⭕⭕⭕⭕

Another set wtih H(min,max)=11,20:

🔵🔵🔵🔵🔵  🔵⭕⭕⭕🔵  ⭕⭕⭕🔵🔵  🔵🔵⭕⭕⭕
🔵🔵🔵🔵🔵  🔵⭕⭕⭕🔵  ⭕⭕⭕🔵🔵  🔵🔵⭕⭕⭕
🔵⭕⭕⭕🔵  🔵⭕⭕⭕🔵  ⭕⭕⭕🔵🔵  🔵🔵⭕⭕⭕
🔵⭕⭕⭕🔵  🔵🔵🔵🔵🔵  ⭕⭕⭕🔵🔵  🔵🔵⭕⭕⭕
🔵⭕⭕⭕🔵  🔵🔵🔵🔵🔵  ⭕⭕⭕🔵🔵  🔵🔵⭕⭕⭕

⭕⭕🔵🔵🔵  🔵🔵🔵⭕⭕  🔵⭕⭕⭕🔵  ⭕⭕⭕⭕⭕
⭕⭕🔵🔵🔵  🔵🔵🔵⭕⭕  ⭕🔵⭕🔵⭕  🔵🔵⭕⭕⭕
⭕⭕🔵⭕⭕  ⭕⭕🔵⭕⭕  ⭕⭕🔵⭕⭕  🔵🔵🔵🔵🔵
🔵🔵🔵⭕⭕  ⭕⭕🔵🔵🔵  ⭕🔵⭕🔵⭕  ⭕⭕⭕🔵🔵
🔵🔵🔵⭕⭕  ⭕⭕🔵🔵🔵  🔵⭕⭕⭕🔵  ⭕⭕⭕⭕⭕
sz3 commented 1 year ago

Ok, word dump time.

Those single pixel regions in the example tilesets are probably not going to survive the analog journey... But I'll say it's tough to articulate what makes a good vs bad tileset -- I don't have a formal definition, it's more like a collection of rules of thumb that inform an intuition.

The rules of thumb might be good to list out, though -- in particular if you're going hunting for functional tilesets. But first, some observations:

  1. errors tend to be systemic, not random. That is -- at least at the sizes we're talking about (8x8) -- the failure modes are pretty predictable.
  2. one system error source: resolution downscaling + upscaling.
  3. another: blurring due to camera focus issues. Autofocus is in play for the android app (maybe a future app will be cleverer), which means that many frames will tend to be blurry. In practice, this tends to act a lot like reducing the image resolution.
  4. colors bleeding from one tile to the next. This tends to be a result of camera issues (see above), but it's worth singling out because it's hard to test for in isolation... I don't really have a good heuristic for how much we can tolerate before bad things happen, but in any case it tends to act as a sadness multiplier (in that if makes the decode more likely to fail, and us sad)

I factor down the above into two basic rules for tiles: A. when evaluating the hamming distance among prospective tiles, we probably ought to shrink each tile to 70/80% of its initial size, then scale back up (blurring in favor of set pixels, since we're optimizing for the computer screen "dark mode" case). The hamming distances of our pristine tileset are not what will be tested in the real world -- the symbols we will be decoding will have all sorts of foul things happen to them by the time we see them.

sz3 commented 1 year ago

Ok, so those are the rules of thumb -- now, here's what I think I know about viable tileset configurations. First, some numbers:

configuration capacity w/ 4 colors in 1000x1000 frame (estimate) pixels per symbol bit (lower is denser)
8x8 * 16 tiles 9408 20.25
6x6 * 4 tiles 10082 24.5
7x7 * 8 tiles 9765 21.333...
5x5 * 4 tiles 13778 18
6x6 * 8 tiles 12602 16.333...
8x8 * 32 tiles 10976 16.2
7x7 * 16 tiles 11718 16
8x8 * 64 tiles 12544 13.5

I've listed the configurations I consider plausible. (obviously 8x8*4bit is better than 'plausible")

It's sorted by the "pixels per symbol" metric, and coincidentally I would consider that mostly to be in order of easiest to hardest. The difficulty isn't all locked into how tightly the symbols are packed -- there are hidden negatives for the smaller configurations in that they have many more cells, and thus many more places to suffer a catastrophic decoding failure. Also, their bigger capacity numbers mean they rely more heavily on the color decode working.

Anyway, in regards to 5x5 and 3 symbol bits... The short version is I don't consider it worth spending time on (in any case I won't be spending time on it). I can't prove it's futility, but I feel pretty confident about it. The ones on the list I think can work -- except maybe 8x8*64This might be worth a separate discussion, but it's contingent on finding the right tileset -- which seems hard.

sz3 commented 1 year ago

Final thought of the day:

The "printed cimbar code" note is probably worth its own discussion. That use case breaks (in a useful/good way) some key assumptions that constrain a lot of the project design choices. For one, decode performance is probably less of a concern. For two, the screen-induced resolution nightmares might make tileset selection simpler (i.e. maybe "clean" hamming distance is good enough?). But also, it's not clear to me how a hypothetical 64-symbol or 128-symbol or 256-symbol "paper only" cimbar variant would stand up to, say, a bunch of base85 text in a tiny font that uses OCR for decoding.

But if it's competitive, it sounds interesting...

mihaigalos commented 1 year ago

First, some numbers

Ok. If your intuition about these figures is correct, seems 5x5 is the best choice here. Let's see how good they perform with 2 bits as a start.

But also, it's not clear to me how a hypothetical 64-symbol or 128-symbol or 256-symbol "paper only" cimbar variant would stand up to, say, a bunch of base85 text in a tiny font that uses OCR for decoding.

For me, what would make it clear is the Hamming distance + your empirical observations pertaining to shapes above^. I haven't measured the baseXX tiles but I do suspect they are not optimized for this metric, as writing came about before OCR (and cimbar is a form of OCR on a restricted tile set). :sweat_smile: Would it work? Potentially. Would custom tiles, Hamming-optimized, produce less error rates? Only the measurements can say.

mihaigalos commented 1 year ago

One more thing: do you have an intuition for the choice of colors in the 3-bit color space? 🤔

sz3 commented 1 year ago

Not really -- at least, nothing I'm confident about. I'm hoping to take a more rigorous approach to improving the 4 color decode, and perhaps in the process some rules will fall out.