mikepound / opencubes

A community-improved version of the polycubes project!
MIT License

Optimizations from the guy that did n=16 #5

Open joulebit opened 1 year ago

joulebit commented 1 year ago

Here is the paper from the guy who calculated up to n=16 in 2005, in 11 days: http://kevingong.com/Polyominoes/ParallelPoly.html

He wrote something very interesting about how to keep the hash tables small and easier to partition:

> 4.1.2 Hashing
>
> One of the most fundamental tasks of the polyomino program is to support uniqueness. We have to ensure that a polyomino has not already been counted. To do this, we store all of the unique polyominoes in a hash tree/table structure, and compare the candidate polyomino with only a small subset of the unique polyominoes. To access the hash structure, we first branch on some criterion to find the correct hash table, then index into the hash table. If we compare the candidate polyomino only with polyominoes of the same height and width, this narrows down the search considerably. We can also branch on the y-coordinate of the last square. This seems a bit strange, but it works well. (See Figure 4.1.2-1) To narrow it down even further, we use a hash function. This takes the XOR of every n bits, where n = height-1. (See Figure 4.1.2-2) This is not necessarily the best hash function, but it is simple to implement, and gives fair results, as we will see.
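Roughly, I read that scheme as something like the sketch below (my own Python sketch, not code from the paper or from this repo; I'm assuming a polyomino is stored as a tuple of row bitmasks, and the names `rows`, `width`, `last_y` are all made up):

```python
from collections import defaultdict

def branch_key(rows, width, last_y):
    # branch on height, width and the y-coordinate of the last square,
    # so each hash table only ever holds "similar" polyominoes
    return (len(rows), width, last_y)

def xor_hash(rows, width):
    # the paper's simple hash: XOR the concatenated bit string together
    # n bits at a time, where n = height - 1
    n = max(len(rows) - 1, 1)
    bits = 0
    for row in rows:
        bits = (bits << width) | row
    h = 0
    while bits:
        h ^= bits & ((1 << n) - 1)
        bits >>= n
    return h

# (branch key, hash) -> small set of stored polyominoes
buckets = defaultdict(set)

def is_new(rows, width, last_y):
    bucket = buckets[branch_key(rows, width, last_y), xor_hash(rows, width)]
    if rows in bucket:
        return False
    bucket.add(rows)
    return True
```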

ClaasF commented 1 year ago

While watching the video, I wondered if there isn't a simpler way to canonicalize a representation, which would result in almost trivial hashing.

Each cube is essentially a string of 1s and 0s, just packed a bit weirdly (I don't know how efficient numpy is here). So you could simply interpret each cube as a number.

Example in 2D:

Given the map

    0 1 0
    1 1 0
    0 1 0

you might interpret that as the binary "010110010". Each of the rotations should have a different binary representation (unless it's symmetrical, in which case it doesn't matter).

What I would call the "canonical" representation of a polycube is simply the rotation with the smallest binary value.
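In 2D the whole idea could look roughly like this (just an illustrative numpy sketch, not code from this repo):

```python
import numpy as np

def as_number(grid):
    # read the flattened grid as a binary number
    return int("".join(str(b) for b in grid.flatten()), 2)

def canonical(grid):
    # smallest value over the four 2D rotations = canonical form
    return min(as_number(np.rot90(grid, k)) for k in range(4))

grid = np.array([[0, 1, 0],
                 [1, 1, 0],
                 [0, 1, 0]])
print(bin(as_number(grid)))  # 0b10110010 (the leading 0 is dropped)
print(canonical(grid))
```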

mikepound commented 1 year ago

These are really interesting ideas - I didn't read around much when I implemented mine in front of the TV! Actually, I had in mind a video that would spark some discussion, so in some sense my "do it badly" approach has been a success!

The speed of hashing and the speed of lookup are a concern. I certainly think the encoding/hashing could be improved. I'm also not sure what the maximum capacity of the Python set is; if it's small, there'll be an increasing number of collisions that will massively slow down lookups. In that case, splitting the problem on some trivial condition and having multiple hash tables may be a lot more efficient.

A binary representation would also speed up hashing a lot. What was on my mind when I didn't do this was the number of bits required to hold the biggest possible polycube for a given n - for example, at what n do we get polycubes that need more than 32 bits? Python actually uses arbitrary-precision integers, but I'd imagine these may take a performance hit.

ClaasF commented 1 year ago

> A binary representation would also speed up hashing a lot. What was on my mind when I didn't do this was the number of bits required to hold the biggest possible polycube for a given n - for example, at what n do we get polycubes that need more than 32 bits? Python actually uses arbitrary-precision integers, but I'd imagine these may take a performance hit.

Since it's just plain and simple hashing anyway, that shouldn't be too bad. One could even just use the pure String (capital S) and hash that. If the Bitcoin hype taught us anything, it's that modern hardware is really good at hashing.
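For example (just a throwaway sketch, the array here is made up):

```python
import numpy as np

cube = np.array([[[0, 1]],
                 [[1, 1]]], dtype=np.uint8)

# either the raw bytes or the printed string are hashable as-is,
# so they can go straight into a Python set with no integer conversion
seen = set()
seen.add(cube.tobytes())
seen.add(str(cube))
print(cube.tobytes() in seen)  # True
```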

ClaasF commented 1 year ago

I just played around with the binary representation, and it turns out it's not that simple:

cube 1:

    [[[0] [1]]
     [[1] [1]]]

cube 2:

    [[[0 1]]
     [[1 1]]]

And flattened:

    [0 1 1 1]    [0 1 1 1]

So two very different arrays end up with the same flattened sequence, and therefore the same signature - if one uses that simple approach, at least.
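For reference, the two arrays written out so the collision is easy to reproduce (illustration only):

```python
import numpy as np

cube1 = np.array([[[0], [1]],
                  [[1], [1]]], dtype=np.uint8)   # shape (2, 2, 1)
cube2 = np.array([[[0, 1]],
                  [[1, 1]]], dtype=np.uint8)     # shape (2, 1, 2)

# flattening discards the bounding-box shape, so both give the same bits
print(cube1.flatten())                            # [0 1 1 1]
print(cube2.flatten())                            # [0 1 1 1]
print(np.packbits(cube1) == np.packbits(cube2))   # [ True]
```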

ClaasF commented 1 year ago

Update number 2: I found a relatively simple but not exactly optimal solution: padding the array with zeros:

import numpy as np

def sig(polycube):
    # pad with a border of zeros so the bounding-box shape is encoded in the bits
    polycube = np.pad(polycube, 1, 'constant', constant_values=0)
    packed_bytes = np.packbits(polycube)
    return int.from_bytes(packed_bytes.tobytes(), 'big')

def canonical_sig(polycube):
    # canonical form = the rotation with the smallest signature
    sigs = [(sig(cube_rotation), cube_rotation) for cube_rotation in all_rotations(polycube)]
    return min(sigs, key=lambda tup: tup[0])

This does yield the correct result, but padding is apparently really expensive, so it takes almost twice as long.

Not exactly the result I hoped for...
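If the np.pad call really is the bottleneck, one untested alternative might be to fold the bounding-box shape into the signature instead of padding, since the shape tuple carries the same information as the zero border (this reuses numpy and the all_rotations helper from the snippet above; purely hypothetical otherwise):

```python
def sig_no_pad(polycube):
    # hypothetical variant: the shape tuple plus the packed bits uniquely
    # identify the array, so the zero-padding pass can be skipped
    packed = np.packbits(polycube)
    return polycube.shape + (int.from_bytes(packed.tobytes(), 'big'),)

def canonical_sig_no_pad(polycube):
    # tuples compare element-wise, so min() still picks one well-defined rotation
    return min(sig_no_pad(rotation) for rotation in all_rotations(polycube))
```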

prophaner commented 11 months ago

I don't know where else to submit my entry. Check it out and see if it helps: https://github.com/prophaner/polycubes/tree/main