dzamlo opened this issue 1 year ago
That's really cool!
Do you think it's possible to parallelize this solution? From a hunch it feels like it should be quite parallelizable given that it's all math-based and stuff :D
Also: what does it use nightly for? I think using lazy_static would make it compile just fine on stable. If it could be a bit more idiomatic, that would also be nice; right now it just looks like a C project in a Rust jacket :P
> Do you think it's possible to parallelize this solution?
I think it's quite easy, at the cost of some cloning.
> what does it use nightly for?
For LazyLock, which could be replaced by lazy_static, and for some things in the const fns. Those could also be replaced by lazy_static, but the performance impact should be measured.
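For reference, the stable-Rust lazy-initialisation pattern looks roughly like this. The thread suggests the lazy_static crate; std::sync::OnceLock (stable since Rust 1.70) gives the same behaviour without an external dependency. The static shown is hypothetical, not from the repo:

```rust
use std::sync::OnceLock;

// Hypothetical lookup table; the real statics in polycubes2 differ.
fn rotation_table() -> &'static Vec<u64> {
    static TABLE: OnceLock<Vec<u64>> = OnceLock::new();
    // Built once on first access, then shared by every caller.
    TABLE.get_or_init(|| (0..24).map(|i| 1u64 << i).collect())
}

fn main() {
    assert_eq!(rotation_table().len(), 24);
    assert_eq!(rotation_table()[3], 8);
}
```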
> If it could be a bit more idiomatic that would also be nice, now it just looks like a C-project in a rust jacket
I agree. I ported the code from the Java version and I tried to be idiomatic, but it's hard when you port code you don't fully understand. Also, code in a const fn has to be non-idiomatic to compile (no for loops, for example).
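To illustrate the const fn point: for loops desugar to Iterator trait calls, which aren't usable in const context on stable, so const code ends up hand-rolled with while loops. A made-up example, not from the repo:

```rust
// A const fn can't use `for`, so we index manually with `while`.
const fn popcount_table() -> [u8; 16] {
    let mut table = [0u8; 16];
    let mut i = 0;
    while i < 16 {
        table[i] = (i as u32).count_ones() as u8;
        i += 1;
    }
    table
}

// Evaluated entirely at compile time.
const POPCOUNT: [u8; 16] = popcount_table();

fn main() {
    assert_eq!(POPCOUNT[0], 0);
    assert_eq!(POPCOUNT[7], 3);
    assert_eq!(POPCOUNT[15], 4);
}
```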
I would gladly make improvements to the code base, but I'm waiting for my new laptop as my current one just died.
Hi @dzamlo and @datdenkikniet,
I feel I should add my two cents. It is parallelizable, and I've already implemented something that divides the problem into multiple batches that can be run separately. I've already got it solving the N=17 case, and if my 9-year-old laptop doesn't need a restart or get so hot that I'm compelled to turn it off, it will take around 2 to 3 weeks to complete.
Unfortunately, the code that makes it parallelizable is a little hard to understand, and I'll need to write a README and add features to make it more understandable (I might even add a GUI, but I'm not sure).
I'll try to find time to get it done. I'm a bit busy and I only have around 1-3 hours of time to work on it every day. Here's the current location of my code: https://github.com/hidny/MikeTsCubeCode/tree/main
By the way, I feel like you nailed it when you said: "it just looks like a C-project in a rust jacket :P". I was thinking that I should port my code into C because my code felt like C code in a Java jacket :P.
As a heads up, it seems like @NailLegProcessorDivide has taken up the challenge of this approach as well in #28 :D
Very cool! I'm going to need to see if I can find time to take a look at how this compares to my current implementation.
My program finally finished going through all the solutions for N=17, and it found 457409613979 solutions (If you want it formatted, it's: 457,409,613,979) https://github.com/hidny/MikeTsCubeCode/blob/main/src/OutputReader/GetNumSolutionsFromCubeCountOutputForNCubes.java
Unfortunately, the README doesn't really explain how/why the algorithm works in great detail. I still have to work on the arguments/documentation for why the algorithm works. I also think there's a better way to get the # of solutions. In the next month or two, I'm hoping to clean up the docs for this algorithm, and maybe have a better/faster algorithm.
I think it'd be quite cool to get the Rust version by @dzamlo into this tree somehow :) It really is amazingly fast.
Also, I've created a PR that makes it even faster! https://gitlab.com/dzamlo/polycubes2/-/merge_requests/1
Wow. I just started looking at @dzamlo's Rust version and agree it's extremely fast. Would love to see it integrated into this repo somehow. It's ~3x faster than the current DFS solution in this repo, and around 4.5x with @datdenkikniet's PR, at least when running N=13.
I did some more optimisations on top of @datdenkikniet's patch: reusing already-allocated structures (a small win), and swapping the Vec3D<bool>s for a separate Vec3dBool that stores a fixed array of 64x64 u64s to pack the bools. This minimises the memory footprint while also minimising the cost of computing array offsets, since the multiplies become shifts.
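A rough sketch of what that packing could look like (the actual Vec3dBool in the MR may differ in layout and naming): one u64 word per (y, z) pair, with x selecting a bit inside the word, so with power-of-two dimensions every index computation is a shift or mask rather than a multiply.

```rust
/// Bit-packed 64x64x64 boolean grid: one u64 word per (y, z) pair,
/// with the x coordinate selecting a bit inside the word.
struct Vec3dBool {
    words: Box<[[u64; 64]; 64]>,
}

impl Vec3dBool {
    fn new() -> Self {
        Vec3dBool { words: Box::new([[0u64; 64]; 64]) }
    }

    fn set(&mut self, x: usize, y: usize, z: usize) {
        // Power-of-two dimensions: addressing is shifts/masks, not multiplies.
        self.words[y][z] |= 1u64 << x;
    }

    fn get(&self, x: usize, y: usize, z: usize) -> bool {
        (self.words[y][z] >> x) & 1 != 0
    }
}

fn main() {
    let mut grid = Vec3dBool::new();
    grid.set(5, 10, 20);
    assert!(grid.get(5, 10, 20));
    assert!(!grid.get(6, 10, 20));
}
```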
This took the runtime for N=13 down further, from 22s to 17.5s on my machine. I'm trying to run N=17, which I think should finish on my machine within a day, to see if I broke anything at larger sizes.
Nice! Any chance you could upload your patch somewhere @NailLegProcessorDivide? Would love to check it out!
Also: you're running an R9 7950X, right?
An R9 7900X, so "only" 12c/24t. I'll see if I can set up a GitLab account to push to. The code is a bit nasty, and I don't like that it hardcodes the structure size, but making it variable while keeping the speed might be hard. Also, making the Vec3dBool "wrap" on itself would, I think, be safe, but even ANDing every coordinate with 0x1f made the code ~2x slower.
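For context, "ANDing every coordinate" means wrapping accesses along these lines (0x1f masks into a 32-wide grid); the function name and layout here are illustrative, and the report above is that the three extra ANDs per lookup in the hot loop cost roughly 2x:

```rust
// Wrapping access into a 32x32x32 bit-packed grid: each coordinate is
// masked with 0x1f so out-of-range indices wrap around instead of
// panicking. The three ANDs per lookup are the reported slowdown.
fn get_wrapped(words: &[[u64; 32]; 32], x: usize, y: usize, z: usize) -> bool {
    (words[y & 0x1f][z & 0x1f] >> (x & 0x1f)) & 1 != 0
}

fn main() {
    let mut words = [[0u64; 32]; 32];
    words[1][2] = 1u64 << 3;
    assert!(get_wrapped(&words, 3, 1, 2));
    // x = 35 wraps to 3, y = 33 wraps to 1, z = 34 wraps to 2.
    assert!(get_wrapped(&words, 35, 33, 34));
}
```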
https://gitlab.com/dzamlo/polycubes2/-/merge_requests/2 (hopefully that link works)
I forgot to mention there was quite a bit of performance to gain from unsafely accessing the Vec3dBool as well. Making it 32x32x32 was a fair bit faster, but that's too small for N>14.
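The unsafe gain presumably comes from skipping bounds checks, roughly as below (a sketch only; the caller must guarantee the coordinates are in range, and the real code may differ):

```rust
// Bounds-check-free lookup into a 32x32x32 bit-packed grid.
// Safety: the caller must ensure x, y, z are all < 32.
unsafe fn get_unchecked_bit(words: &[[u64; 32]; 32], x: usize, y: usize, z: usize) -> bool {
    (*words.get_unchecked(y).get_unchecked(z) >> x) & 1 != 0
}

fn main() {
    let mut words = [[0u64; 32]; 32];
    words[4][5] = 1u64 << 6;
    // Safe here because all coordinates are within the 32-cube.
    unsafe {
        assert!(get_unchecked_bit(&words, 6, 4, 5));
        assert!(!get_unchecked_bit(&words, 7, 4, 5));
    }
}
```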
number of polycubes for n=17: 457409613979, 74301822 ms
real 1238m21.828s
user 28524m59.824s
sys 9m41.177s
Finally: N=17 in a bit over 20.5 hours, getting the same number as @hidny. A small amount of time could be saved with better work distribution across threads, and any form of feedback beyond 100% CPU utilisation for 19.5 hours would be nice to have.
I'm chipping away at getting it to compile on stable #3, and even more(!!!) performance by @NailLegProcessorDivide #2.
@dzamlo I would also like to inquire if you plan to slap an explicit license on the source so that it is clearer if/how we may incorporate it here in the future. Currently it's "unlicensed" which, AFAIK, means that all rights on the code are reserved to the original author (you).
It's a derivative work of the code from @hidny, so I cannot relicense my code unless @hidny's code has a permissive license that allows derivative works.
@hidny : could you relicense your work with a permissive licence like MIT?
I added the MIT license to the repo: https://github.com/hidny/MikeTsCubeCode/blob/main/LICENSE
I've made a fairly naive networked solution at https://gitlab.com/dzamlo/polycubes2/-/merge_requests/4. @dzamlo, if you could update your repo to MIT or another permissive licence as well, it would be nice to move the fastest solution into this main repo, assuming you are happy for that to happen.
I've updated the license. Sorry for the delay.
Thanks! no worries about the delay.
@hidny has come up with a solution where memory usage isn't the limitation. It also appears faster. You can find my implementation in https://gitlab.com/dzamlo/polycubes2
Unfortunately, I cannot make a pull request because my laptop died.