mikepound / cubes

This code calculates all the variations of 3D polycubes for any size (time permitting!)
MIT License

Make Hashing Algorithm Faster (but dumber) #4

Open bertie2 opened 1 year ago

bertie2 commented 1 year ago

Hi, just wanted to contribute the first of a few optimizations I have made.

Please let me know if you actually want this kind of contribution before I raise more of these. I will be focusing on runtime optimizations and parallelism rather than on the pure maths question of how to sort out the cube rotations.

As such this isn't a pure maths optimization; in fact it technically creates more data, but it does more of the work in numpy (i.e. in C) and so is much quicker.

Instead of hashing with your RLE, we convert the bits straight into unsigned ints with np.packbits, then concatenate these ints into a single Python int object for the hash.

Finally, we add the dimensions of the array to the lowest part of the integer so we don't lose that information.
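The steps above can be sketched as follows (a minimal illustration of the idea, not the exact PR code; the function name is hypothetical):

```python
import numpy as np

def fast_hash(polycube: np.ndarray) -> int:
    """Hash a 3D voxel array by packing bits in C rather than Python."""
    # Pack the flattened 0/1 array into uint8 bytes (done in C by numpy).
    packed = np.packbits(polycube.flatten())
    # Concatenate the bytes into a single arbitrary-precision Python int.
    value = int.from_bytes(packed.tobytes(), byteorder="big")
    # Append the array dimensions to the low bits so that, e.g., a 1x2x4
    # and a 2x2x2 array with identical bit patterns hash differently.
    for dim in polycube.shape:
        value = (value << 8) | dim
    return value

cube = np.zeros((2, 2, 2), dtype=np.uint8)
cube[0, 0, 0] = 1
h = fast_hash(cube)
```

The resulting int is hashable, so it can go straight into a Python set for deduplication.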

Benchmark attached for n=10 (absolute time is irrelevant as it's hardware-dependent, but there is a 2x speedup over the old code):

PS C:\Users\georg\orig_cubes> python.exe .\cubes.py --no-cache 10
Generating polycubes n=3: 100%
Generating polycubes n=4: 100%
Generating polycubes n=5: 100%
Generating polycubes n=6: 100%
Generating polycubes n=7: 100%
Generating polycubes n=8: 100%
Generating polycubes n=9: 100%
Generating polycubes n=10: 100%
Found 346543 unique polycubes
Elapsed time: 450.932s

PS C:\Users\georg\orig_cubes> python.exe .\improved_hash.py --no-cache 10
Generating polycubes n=3: 100%
Generating polycubes n=4: 100%
Generating polycubes n=5: 100%
Generating polycubes n=6: 100%
Generating polycubes n=7: 100%
Generating polycubes n=8: 100%
Generating polycubes n=9: 100%
Generating polycubes n=10: 100%
Found 346543 unique polycubes
Elapsed time: 230.274s

Unfortunately, since this speeds up the hashing itself rather than the search, it is only a linear speed-up, so it will only push the maximum n slightly higher.

bertie2 commented 1 year ago

Implemented parallelism and threw more cores at the problem. Each parallel thread is much slower, but more cores are more cores. n=10 benchmark for the parallel run:

PS C:\Users\georg\orig_cubes> python.exe .\paralel.py --no-cache 10
Generating polycubes n=3: 100%
Generating polycubes n=4: 100%
Generating polycubes n=5: 100%
Generating polycubes n=6: 100%
Generating polycubes n=7: 100%
Generating polycubes n=8: 100%
Generating polycubes n=9: 100%
Generating polycubes n=10: 100%
Found 346543 unique polycubes
Elapsed time: 64.173s
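The comment doesn't include the parallel code itself, but the general pattern it describes is: split the candidate polycubes across worker processes, hash each chunk independently, and merge the per-worker sets. A hedged sketch under those assumptions (function names hypothetical, not the actual paralel.py):

```python
import numpy as np
from multiprocessing import Pool

def fast_hash(polycube):
    """Pack a voxel array into a single int, with the shape in the low bits."""
    packed = np.packbits(polycube.flatten())
    value = int.from_bytes(packed.tobytes(), "big")
    for dim in polycube.shape:
        value = (value << 8) | dim
    return value

def hash_chunk(chunk):
    """Hash one slice of candidate polycubes in a worker process."""
    return {fast_hash(p) for p in chunk}

def parallel_unique(candidates, workers=2):
    """Deduplicate candidates by hashing chunks across worker processes."""
    # Round-robin split keeps the chunks roughly balanced.
    chunks = [candidates[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partial_sets = pool.map(hash_chunk, chunks)
    # Merging the per-worker sets removes duplicates found in different chunks.
    unique = set()
    for s in partial_sets:
        unique |= s
    return unique
```

As the benchmark shows, per-process overhead makes each worker slower than the single-threaded version, but the wall-clock time still drops roughly with the core count.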

mikepound commented 1 year ago

Very nice :) I'm fairly happy with these kinds of optimisations, with only a few caveats right now:

1) I clearly didn't think through what would happen when I post code in a video and get spammed with pull requests :) Realistically I can keep an eye on this for a few days, but probably not long term.
2) I'd like to keep an original copy of the file, either in a branch or alongside any changes, for people watching the video for the first time.
3) Any merges are likely to interfere with all the other pull requests and comments; again, I've not really thought this through!

bertie2 commented 1 year ago

That's fair. Depending on how things go (I suspect this should calm down eventually once the obvious ideas have been had and implemented), it might be worth spinning up another repo for people to collaborate in, separate from the one linked in the video, appointing some maintainers, and muting it. Then you can let the community drive it and we can stop spamming you :)

For now I'm going to keep working on this in my own repo, but won't be posting anything else here except for the npy files, to hopefully give you a bit of peace.

mikepound commented 1 year ago

This is a great idea! I may do this tomorrow if I have time :)

mikepound commented 1 year ago

> that's fair, depending on how things go (I suspect this should calm down eventually once the obvious ideas have been had and implemented) it might be worth spinning up another repo for people to colab in which is separate to this one linked in the video, appointing some maintainers, and muting it, then you can let the community drive it and we can stop spamming you :)
>
> for now i'm going to keep working on this in my own repo, but wont be posting anything else here except for the npy files to hopefully give you a bit of peace.

I've created a new repo here: https://github.com/mikepound/opencubes

Are you interested in helping maintain this? Github seems to have adapted their UI again - I don't use it much. I think I invite people as collaborators first, then promote to maintainer?

bertie2 commented 1 year ago

> that's fair, depending on how things go (I suspect this should calm down eventually once the obvious ideas have been had and implemented) it might be worth spinning up another repo for people to colab in which is separate to this one linked in the video, appointing some maintainers, and muting it, then you can let the community drive it and we can stop spamming you :) for now i'm going to keep working on this in my own repo, but wont be posting anything else here except for the npy files to hopefully give you a bit of peace.
>
> I've created a new repo here: https://github.com/mikepound/opencubes
>
> Are you interested in helping maintain this? Github seems to have adapted their UI again - I don't use it much. I think I invite people as collaborators first, then promote to maintainer?

Happy to maintain this, though I fear my usefulness is coming to an end; the scaling memory costs are already becoming a hindrance to my changes at n=14.

bertie2 commented 1 year ago

Also, yes, once you have invited someone as a collaborator you should be able to set their permissions, though I'm used to private repos on GitHub Business and am not sure if there are arbitrary limits on open projects.

mikepound commented 1 year ago

Oddly it doesn't seem to have any permissions at all; it could simply be a free vs paid feature. I think you have access, so it's maybe worth pushing something and seeing if it works!

bertie2 commented 1 year ago

I can confirm I can push. The real question will come when a pull request or issue comes up; I will see what my options are there. Thanks.

RibenaJT commented 1 year ago

Returning the array + dimensions in byte format (as opposed to shifting bits) makes this about 15% or so faster:

return polycube.tobytes() + polycube.shape[0].to_bytes(1) + polycube.shape[1].to_bytes(1) + polycube.shape[2].to_bytes(1) # add volume dimensions to lower bits of hash
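Wrapped into a standalone function, the suggestion looks roughly like this (a sketch; the function name is hypothetical, `to_bytes(1)` assumes each dimension fits in one byte, and an explicit byteorder is given here since the one-argument form of `int.to_bytes` only works on Python 3.11+):

```python
import numpy as np

def bytes_key(polycube: np.ndarray) -> bytes:
    """Build a hashable bytes key: raw voxel data plus the shape.

    Appending the dimensions prevents collisions between arrays whose
    flattened contents are equal but whose shapes differ.
    """
    key = polycube.tobytes()
    for dim in polycube.shape:
        # Assumes each dimension fits in one byte (true for small n).
        key += dim.to_bytes(1, "big")
    return key

seen = set()
cube = np.zeros((2, 3, 1), dtype=np.uint8)
seen.add(bytes_key(cube))
```

Bytes objects hash quickly in CPython, and skipping the bit-shifting loop avoids building a large Python int per polycube, which is where the ~15% saving comes from.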

bertie2 commented 1 year ago

> returning the array + dimensions in byte format (as opposed to shifting bits) makes this about 15% or so faster
>
> return polycube.tobytes() + polycube.shape[0].to_bytes(1) + polycube.shape[1].to_bytes(1) + polycube.shape[2].to_bytes(1) # add volume dimensions to lower bits of hash

Development of this is now continuing over at https://github.com/mikepound/opencubes, but yes, your way is about 15% faster (in terms of impact on the total runtime, which means it's a lot faster on its own) and more concise, and I will be trying to merge it when the rest of my changes go in.