Closed sogaiu closed 1 year ago
I actually intended to add three primality functions to spork/math
:
(math/primes)
a boundless prime number generator
(math/prime? n)
a primality test, deterministic for n < 2^64
(math/next-prime n)
next prime strictly larger than n
I was unable to implement the last two, for want of a modular exponentiation which avoids overflow, which I intended to implement first. I suppose it wouldn't hurt to add math/primes
on its own, though.
I learned recently that some hash function constants are computed from some primes so may be there are other use cases too?
Most applications would exceed 64-bit arithmetic, and would need a GMP wrapper, which is also on my bucket list 😉.
Work in progress: https://gist.github.com/primo-ppcg/31a31ecede0296457b6c57fe1f0c8020
There's a few things I'd like to clean up yet, lucas-prp?
would be much cleaner implemented recursively, and I'd really like to have mulmod
and powmod
in a C module, as they're used pretty much everywhere.
It'd also be great to have a good home for this:
(defn binary-search
``Returns the index of `x` in a sorted array or tuple
or the index of the next larger item if `x` is not present.
This is the correct insert index for `x` within `arr`.``
[x arr]
(var start 0)
(var end (length arr))
(var mid (brshift end 1))
(while (not= mid end)
(if (< (in arr mid) x)
(do (set start mid) (set mid (brshift (+ start end 1) 1)))
(do (set end mid) (set mid (brshift (+ start end) 1)))))
mid)
I need it for both prime?
and next-prime
, but it doesn't really belong in a math module.
Re: binary-search
I know of a few things in similar territory:
So evidence of a common need, perhaps?
Not sure where it would go, but may be others have some ideas about that.
Re: have mulmod and powmod in a C module
Since spork has some C code already, if considering inclusion for spork, may be it's not a problem to do similarly? Though, not sure of the criteria for adding C code to spork.
Not sure where it would go, but may be others have some ideas about that.
misc
seems most appropriate, given the presence of insert-sorted
, which should definitely be using a binary-search
, O(n log n) instead of O(n²).
Added: #143
That seems reasonable as misc
also has dfs
, randomize-array
, and insert-sorted-by
which seem thematically related -- at least to me :)
Looking through the rest of misc
, I wonder if it's still doable to split some stuff off and leave "aliases" that point to new locations...but that should probably be another discussion.
Since spork has some C code already, if considering inclusion for spork, may be it's not a problem to do similarly? Though, not sure of the criteria for adding C code to spork.
It's not so hard to add as a separate module: https://gist.github.com/primo-ppcg/b915d352dbe5cc2f338587d199e2ba1f I don't know if it's possible or intended to implement a module partially in C, and partially in Janet.
I don't know if it's possible or intended to implement a module partially in C, and partially in Janet.
One way is to have some obfuscated name for the C module, then import and export it in the Janet module https://github.com/andrewchambers/janet-uri/blob/master/uri.janet#L83
Perhaps miniz
got included in spork because it was small enough and the license worked.
May be similar reasoning could apply to libbf?
Seems like a bit of overkill for what I'm doing, although @andrewchambers' big
looks like a great candidate for inclusion.
The license for libbf is also MIT, so there shouldn't be an issue with that.
One way is to have some obfuscated name for the C module, then import and export it in the Janet module https://github.com/andrewchambers/janet-uri/blob/master/uri.janet#L83
I'm not sold on the idea, as the obfuscated module can still be imported. The most proper solution seems to be to embed the compiled Janet source within the C module, although the documentation recommends against this: https://janet-lang.org/docs/jpm.html#Creating-a-native-module
I'm not sold on the idea, as the obfuscated module can still be imported.
Curious why this is a concern.
Curious why this is a concern.
I don't know that it is a problem. It just seems somewhat inelegant and/or unintended. In particular, the auto-generated documentation would list it, right?
I do think that C implementations are warranted, though. I've created a fork to test this out, using this gist to time primality checking of primes between 2^40 and 2^52.
With cfuns:
Elapsed time: 2.019s, 66.78µs/body
Without C funs (remove the -
after each of invmod-
, mulmod-
, and powmod-
in the gist file):
Elapsed time: 2.193s, 507.8µs/body
Which is something like 7 times faster.
In particular, the auto-generated documentation would list it, right?
Oh, I don't know how the docs work (^^;
Do you mean these files?
Wasn't quite as dramatic here, but I did notice a significant difference:
$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes.janet
Elapsed time: 2.015s, 217.7µs/body
$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes-without-cfuns.janet
Elapsed time: 2.250s, 607.7µs/body
Wasn't quite as dramatic here
Had to double check to make sure I wasn't crazy, but I guess your mileage may vary. I also moved jacobi, not because it's time critical, but because it can be simplified a lot with bitwise operators, which can't be used in Janet for values that exceed int32 range. And I think by doing so, it should be possible to allow the primality functions to work with inttypes without a dramatic decrease in performance.
EDIT: Actually, I've removed the bpsw test entirely, and just use a deterministic rabin-miller, so jacobi is no longer required (fork). Although bpsw is a bit faster, the lucas test caused overflows at the bit limits due to additions. I also suspect that rabin-miller could be made faster with better base selection.
Updated gist. As it is now, number
types are valid all the way to 2^53, and int types are valid to 2^63. Getting pretty close to PR ready, I think
For comparison purposes, here are the times I got with the latest:
$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes-2023-07-22.janet
Elapsed time: 0.157s, 15.69µs/body
Elapsed time: 0.061s, 94.41µs/body
Elapsed time: 0.113s, 176.2µs/body
Elapsed time: 0.056s, 270.9µs/body
Elapsed time: 0.052s, 255.5µs/body
I'll leave some links here for future readers (including a future me):
Elapsed time: 0.061s, 94.41µs/body
That 94.41 is the same test as the 217.7 you reported earlier, strangely. I also found that the RM base selection was using an improper comparator. I've updated the gist, so the last 3 checks should be a lot faster now.
EDIT: Actually, just always using these bases:
[2 325 9375 28178 450775 9780504 1795265022]
is faster than selecting the bounds from small primes. I'm probably going to spend the next few days crunching the bounds on these.
That 94.41 is the same test as the 217.7 you reported earlier, strangely. I also found that the RM base selection was using an improper comparator. I've updated the gist, so the last 3 checks should be a lot faster now.
It's possible there was some pilot error here (^^;
I've fetched the updated gist (revision 12), freshly cloned your spork fork, checked out the modular-arithmetic branch, locally built, locally installed, and executed the gist:
$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes-2023-07-22-later.janet
Elapsed time: 0.181s, 18.09µs/body
Elapsed time: 0.073s, 114.2µs/body
Elapsed time: 0.083s, 129.6µs/body
Elapsed time: 0.042s, 203.5µs/body
Elapsed time: 0.041s, 198.1µs/body
...and it does look like the last 3 are faster.
@sogaiu I think you can close this one now :wink:
@primo-ppcg Thanks a lot for your work on this!
Is there any interest in including a prime number generator?
I don't know about generating this one, but I learned recently that some hash function constants are computed from some primes so may be there are other use cases too?
I wonder if @primo-ppcg's gist with a fiber-based one would be a candidate...