mikepound / opencubes

A community improved version of the polycubes project!
MIT License
44 stars 24 forks source link

Rotation invariant representation for 2D shapes #25

Open Spacha opened 11 months ago

Spacha commented 11 months ago

I have been tinkering around to find a rotation invariant representation for the shapes, and so far it seems that I have found one for at least 2D. I am now trying to extend this to 3D.

My idea was inspired by Local Binary Patterns (LBP). It is used in machine vision to extract features from an image.

The idea

To represent a shape in a rotation invariant way, the encoding should be based on relating all the cubes to all the other cubes in the shape, instead of some reference frame around the shape (because we have no way of fixing that reference frame). This makes it so that it doesn't matter in which order we compute and we still get the same result.

We first figure out a way to represent all distinct Moore neighborhoods of a cell. In 2D, there are 2^9 = 512 possible distinct Moore neighborhoods (the central cell plus all the 8-neighbors).

Then we calculate the distinct cell values for all the cells in the shape (including the empty cells) to get a feature matrix. That matrix should be unique for all shapes, but it is the same for rotated variants of that same shape.

Example in 2D

Say, we have a shape like this:

image

1. Add zero padding for calculation

We start by adding 2 levels of zero padding around the shape. One level is needed because the feature matrix is always larger than the shape (since all neighbors affect the cell value), and one level is needed to make sure we don't try reading outside the matrix when calculating the cell values.

image

2. Calculate the cell values using modified LBP

Then we can start calculating the feature matrix. We start from cell (1, 1), the second cell of the second row (so that there are not edges next to it).

To understand how the cell value is calculated, see the figure. The cell (1, 1) is highlighted in orange and its Moore neighborhood is enumerated from 1 to 9 (starting from top-left, spiraling into the center):

image

Since there are 512 distinct Moore neighborhoods, 9 bits is sufficient to represent all of them. We start setting the bits one by one, starting from the MSB:

  1. Start from the top-left neighbor. If there is a block (=green), set the first bit (MSB) of the cell value to 1, otherwise 0.
  2. Then proceed to the top-mid neighbor. Again, if there is a block, set the second bit of the cell value to 1.
  3. Repeat for the top-right neighbor and set bit position 3 accordingly.
  4. Repeat for the mid-right neighbor and set bit position 4 accordingly.
  5. Repeat for the bottom-right neighbor and set bit position 5 accordingly.
  6. Repeat for the bottom-mid neighbor and set bit position 6 accordingly.
  7. Repeat for the bottom-left neighbor and set bit position 7 accordingly.
  8. Repeat for the mid-left neighbor and set bit position 8 accordingly.
  9. Repeat for the center cell (the cell itself) and set bit position 9 (LSB) accordingly.

Now we have a 9-bit cell value for the first cell: 00001_0000, which is 16 in decimal. We can then move to the next cell. This is repeated for all cells except cells that are on the edge (so skipping the first and last columns as well as the first and last rows; they would be zeros anyway).

Here are all the cell values for the example shape:

image

Then, the feature vector of this shape is:

[[ 16  24  28  12   4   0]
 [ 32  33  51  27  14   4]
 [ 64 192 496 425 263   2]
 [  0   0 112 201 390 256]
 [  0   0  96 129 258   0]
 [  0   0  64 128 256   0]]

3. Calculate the sum of the matrix values

Finally, we sum all the cell values in the feature matrix to get the fingerprint of that shape. This fingerprint is the same for all rotations of the same shape.

Comparing the variants

Here are all the variants of the example shape, along with their feature matrices and their sums (the fingerprints).

image

Matrix:
[[ 16  24  28  12   4   0]
 [ 32  33  51  27  14   4]
 [ 64 192 496 425 263   2]
 [  0   0 112 201 390 256]
 [  0   0  96 129 258   0]
 [  0   0  64 128 256   0]]

Sum: 3577

image

Matrix:
[[  0  16   8   4   0   0]
 [ 16  56  29  30  12   4]
 [ 48 105 167 291   3   2]
 [112 201 454 448 384 256]
 [ 96 129 258   0   0   0]
 [ 64 128 256   0   0   0]]

Sum: 3577

image

Matrix:
[[  0  16   8   4   0   0]
 [  0  48   9   6   0   0]
 [ 16 120 141 262   0   0]
 [ 32 113 155 286  12   4]
 [ 64 224 417 291   3   2]
 [  0  64 192 448 384 256]]

Sum: 3577

image

Matrix:
[[  0   0   0  16   8   4]
 [  0   0   0  48   9   6]
 [ 16  24  28 124 141 262]
 [ 32  33  51 107 135 258]
 [ 64 192 480 449 386 256]
 [  0   0  64 128 256   0]]

Sum: 3577

As you see, all variants have the same fingerprint 3577!

Final words

As of yet, I have not been able to prove or disprove this method, but so far it seems very promising. I have tested it on tens of various shapes in 2D, as you can see in my repository.

I want to prove that this method actually works and never gives the same fingerprint to 2 different shapes, then proceed to applying this to 3D, if possible.

NailLegProcessorDivide commented 11 months ago

Unless Im miss understanding something does this not always respond with 511 (0b1_1111_1111) times the number of squares?

Because each set square contributes each bit once to another square as you do the convolution and then you sum all the numbers at the end recombining the bits?

Spacha commented 11 months ago

Ah, you are right, damn! The sum is basically counting the set blocks in the shape... I was a bit suspicious about using the sum as the fingerprint, but it seemed to work 🙃

I wonder if we could use some other property to match the feature matrices.

NailLegProcessorDivide commented 11 months ago

it would certainly be useful. there might even be a lot of value in just a rotation invariant hash but im not 100% clear on how to use one if we had one to save wasting time performing a bunch of rotations when we can be quickly sure theyre different.

Maybe taking the mask of 3x3s from above or 2x2x2 for 3d as 27bits is quite a big key, converting each to a canonical rotation through a table, then counting how many of what patterns there are and generating a hash from that but I havent put that much thought into it