Another rust implementation

datdenkikniet commented 1 year ago

I saw this as a good opportunity to try and do some math-related stuff with Rust :D

I had missed #6 before I got started on it and got kind of carried away.

It does pretty much the same thing as the python implementation, but a bit faster, and for now without RLEs.

Also, since we're solving for a known amount of dimensions (3), I just went with a fixed-size array implementation. This also semi-forced me to implement transpositions and such, which was a good challenge (note: axis[0] != a1, raah)

So far it seems to do what it should, yielding the correct numbers for up to n=13 (n=13 calculation took 2 hours and 10 minutes on a Xeon E5-2630v3, haven't kept track of memory usage).

Possible improvements:

[x] Parallelize computation (using rayon. Note: impl might be faster by collect-ing on sublists, but that eats up a lot of memory)
[x] Add serde + load and save from cache
[ ] Add RLE (for completeness)
[ ] Somehow add something like Cow to the mix and avoid cloning for flip and transpose operations that don't need it
[x] ~~Maybe: add interop with numpy format (crude impl suffices)~~ Won't do, pcube format is easier

bertie2 commented 1 year ago

if you are considering implementing save loading consider implementing this file format: https://github.com/mikepound/opencubes/issues/8#issuecomment-1634507915 rather than directly implementing numpy. there's included converters in the linked branch to get it to numpy format

datdenkikniet commented 1 year ago

if you are considering implementing save loading consider implementing this file format: #8 (comment) rather than directly implementing numpy. there's included converters in the linked branch to get it to numpy format

Thanks for the heads up! Have implemented loading from and saving to a file now, quite nice :D

datdenkikniet commented 1 year ago

I have now added a bit of validation to the command as well. Running cargo run -- validate <name> verifies that you can actually use the file as a cache

Which immediately shows that my implementations of pack or unpack are broken :'(

Edit: huh, seems like BufReader does not work the way I thought it did...

bertie2 commented 1 year ago

@datdenkikniet I have muted this for now because it still seems under very active development, but i'm still very interested as it seems to be the most feature complete rust implementation at this time, please give me a poke once you are happy with it, and i will review for merge.

NailLegProcessorDivide commented 1 year ago

I might try to merge my solutions as sub commands then to allow for 1 rust solution with multiple strategy and optimisations implemented unless anyone thinks that's a bad idea. It would also allow me to take advantage of @datdenkikniet's existing cache file saving and loading which I haven't implemented yet.

datdenkikniet commented 1 year ago

I'm not sure if the way I've set up my PolyCube lends itself to the speed of your implementation, but if you're willing to see if wrangling my impl into something a bit more performant (n=13 takes ~50 mins on a slow-ish 32 core machine now), that would be absolutely awesome :D

Alternatively just as an entirely separate implementation behind some cli flags that we merge later, which feels like an awesome in-between solution

datdenkikniet commented 1 year ago

I've also been thinking about how we could get around the memory problem, and figured we should try having a HashSet<u64> that only stores the hash of the cube, and storing the actualy polycube to disk immediately. Not entirely sure how that works out collision wise, but might be worth it :D

I plan to test it out by using that approach in pcube validate, to see the difference in memory usage.

bertie2 commented 1 year ago

I've also been thinking about how we could get around the memory problem, and figured we should try having a HashSet<u64> that only stores the hash of the cube, and storing the actualy polycube to disk immediately. Not entirely sure how that works out collision wise, but might be worth it :D

problem is if you're hash is actually a hash (and not just a representation like the bits of the cube arranged as a 1d array of bytes), you still need to do an equality check on the value to check whether these are actually the same cube or just a hash collision.

if you "hash" is just a representation like it is in my python implementation, your best off making a function to reverse it back from the representation to the cube data structure, then you can just store the "hash", and reverse it back out later.

NailLegProcessorDivide commented 1 year ago

I was thinking mainly keeping it to cli flags for now with some shared code/structs between implementations and conversion functions between polycubes. Mine already has 2 implementations:

one that operates on the 3d bit mask and stores them as point lists for memory reasons.
one that operates on a list of points and is lazily parallelised (25%+ is still running on 1 thread because of how I divide the work and another 30% of time looks to be mutex fighting).

I still also want to try a 3rd implementation bypassing the hash table and memory limit entirely but I'll have to see if it works and the performance is reasonable (to get to n=14+ without me being impatient)

NailLegProcessorDivide commented 1 year ago

my 3rd impl would be using this suggested algorithm https://github.com/mikepound/opencubes/issues/11 but i expect it to be at least an order of N times slower best case if I even make it work

datdenkikniet commented 1 year ago

@bertie2 I think merging this PR now is fine. It has most of the basics and does the important part (enumerate up to n=13 :P). That way we can more easily write different changes against it and have more people contributing.

Is there anything blocking such a merge? Do we need more docs on how to run stuff?

bertie2 commented 1 year ago

@datdenkikniet in terms of functionality and tests all seems good and it runs on my machine at least.

I'm not familiar with rust, so my only real question is would it be possible to split out some of the functionality into multiple relevantly named files, rather than a few monoliths. please let me know and do so if you can.

finally, there is a blocker on my end that I need to move the python implementation into a sub directory, and update the README which I will get done tomorrow, the C++ implementation will be getting merged around the same time as this one so it was already on the todo list to refactor.

datdenkikniet commented 1 year ago

Sweet :) Yes, I'll split up some stuff and add some more doc comments!

NailLegProcessorDivide commented 1 year ago

+1 for merging the code looks all reasonably good with enough documentation.

I would consider moving loading the cache files from unique_expansions to enumerate and call my implementations from there.

Moving files I'd consider moving unique_expansions out of main and into lib and or moving most of lib into a module with a more meaning full name name.

Im not experienced enough at rust to say if rotation and flip can go in theyre own file. My implementations have a module for rotations that takes a reference as the first parameter and returns a new PolyCube but that makes them not impl methods anymore

datdenkikniet commented 1 year ago

I've done some moving around and renaming.

Personally I think flip, transpose and rot90 can stay in polycubes.rs, but if we want to we can definitely give them the same treatment as the iterators.

mikepound / opencubes

Another rust implementation #13