janet-lang / spork

Various Janet utility modules - the official "Contrib" library.
MIT License
122 stars 35 forks source link

Missing prime number generator #140

Closed sogaiu closed 1 year ago

sogaiu commented 1 year ago

Is there any interest in including a prime number generator?

I don't know about generating this one, but I learned recently that some hash function constants are computed from some primes so may be there are other use cases too?

I wonder if @primo-ppcg's gist with a fiber-based one would be a candidate...

primo-ppcg commented 1 year ago

I actually intended to add three primality functions to spork/math:

(math/primes) a boundless prime number generator (math/prime? n) a primality test, deterministic for n < 2^64 (math/next-prime n) next prime strictly larger than n

I was unable to implement the last two, for want of a modular exponentiation which avoids overflow, which I intended to implement first. I suppose it wouldn't hurt to add math/primes on its own, though.

I learned recently that some hash function constants are computed from some primes so may be there are other use cases too?

Most applications would exceed 64-bit arithmetic, and would need a GMP wrapper, which is also on my bucket list 😉.

primo-ppcg commented 1 year ago

Work in progress: https://gist.github.com/primo-ppcg/31a31ecede0296457b6c57fe1f0c8020

There's a few things I'd like to clean up yet, lucas-prp? would be much cleaner implemented recursively, and I'd really like to have mulmod and powmod in a C module, as they're used pretty much everywhere.

It'd also be great to have a good home for this:

(defn binary-search
  ``Returns the index of `x` in a sorted array or tuple
  or the index of the next larger item if `x` is not present.
  This is the correct insert index for `x` within `arr`.``
  [x arr]
  (var start 0)
  (var end (length arr))
  (var mid (brshift end 1))
  (while (not= mid end)
    (if (< (in arr mid) x)
      (do (set start mid) (set mid (brshift (+ start end 1) 1)))
      (do (set end mid) (set mid (brshift (+ start end) 1)))))
  mid)

I need it for both prime? and next-prime, but it doesn't really belong in a math module.

sogaiu commented 1 year ago

Re: binary-search

I know of a few things in similar territory:

So evidence of a common need, perhaps?

Not sure where it would go, but may be others have some ideas about that.


Re: have mulmod and powmod in a C module

Since spork has some C code already, if considering inclusion for spork, may be it's not a problem to do similarly? Though, not sure of the criteria for adding C code to spork.

primo-ppcg commented 1 year ago

Not sure where it would go, but may be others have some ideas about that.

misc seems most appropriate, given the presence of insert-sorted, which should definitely be using a binary-search, O(n log n) instead of O(n²).

Added: #143

sogaiu commented 1 year ago

That seems reasonable as misc also has dfs, randomize-array, and insert-sorted-by which seem thematically related -- at least to me :)


Looking through the rest of misc, I wonder if it's still doable to split some stuff off and leave "aliases" that point to new locations...but that should probably be another discussion.

primo-ppcg commented 1 year ago

Since spork has some C code already, if considering inclusion for spork, may be it's not a problem to do similarly? Though, not sure of the criteria for adding C code to spork.

It's not so hard to add as a separate module: https://gist.github.com/primo-ppcg/b915d352dbe5cc2f338587d199e2ba1f I don't know if it's possible or intended to implement a module partially in C, and partially in Janet.

pepe commented 1 year ago

I don't know if it's possible or intended to implement a module partially in C, and partially in Janet.

One way is to have some obfuscated name for the C module, then import and export it in the Janet module https://github.com/andrewchambers/janet-uri/blob/master/uri.janet#L83

tionis commented 1 year ago

Maybe some large ints/floats could also be implemented and directly integrated for this. Perhaps some bindings with libbf like in janet-big

sogaiu commented 1 year ago

Perhaps miniz got included in spork because it was small enough and the license worked.

May be similar reasoning could apply to libbf?

primo-ppcg commented 1 year ago

Seems like a bit of overkill for what I'm doing, although @andrewchambers' big looks like a great candidate for inclusion.

The license for libbf is also MIT, so there shouldn't be an issue with that.

primo-ppcg commented 1 year ago

One way is to have some obfuscated name for the C module, then import and export it in the Janet module https://github.com/andrewchambers/janet-uri/blob/master/uri.janet#L83

I'm not sold on the idea, as the obfuscated module can still be imported. The most proper solution seems to be to embed the compiled Janet source within the C module, although the documentation recommends against this: https://janet-lang.org/docs/jpm.html#Creating-a-native-module

sogaiu commented 1 year ago

I'm not sold on the idea, as the obfuscated module can still be imported.

Curious why this is a concern.

primo-ppcg commented 1 year ago

Curious why this is a concern.

I don't know that it is a problem. It just seems somewhat inelegant and/or unintended. In particular, the auto-generated documentation would list it, right?

I do think that C implementations are warranted, though. I've created a fork to test this out, using this gist to time primality checking of primes between 2^40 and 2^52.

With cfuns:

Elapsed time: 2.019s, 66.78µs/body

Without C funs (remove the - after each of invmod-, mulmod-, and powmod- in the gist file):

Elapsed time: 2.193s, 507.8µs/body

Which is something like 7 times faster.

sogaiu commented 1 year ago

In particular, the auto-generated documentation would list it, right?

Oh, I don't know how the docs work (^^;

Do you mean these files?


Wasn't quite as dramatic here, but I did notice a significant difference:

$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes.janet 
Elapsed time: 2.015s, 217.7µs/body

$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes-without-cfuns.janet 
Elapsed time: 2.250s, 607.7µs/body
primo-ppcg commented 1 year ago

Wasn't quite as dramatic here

Had to double check to make sure I wasn't crazy, but I guess your mileage may vary. I also moved jacobi, not because it's time critical, but because it can be simplified a lot with bitwise operators, which can't be used in Janet for values that exceed int32 range. And I think by doing so, it should be possible to allow the primality functions to work with inttypes without a dramatic decrease in performance.

EDIT: Actually, I've removed the bpsw test entirely, and just use a deterministic rabin-miller, so jacobi is no longer required (fork). Although bpsw is a bit faster, the lucas test caused overflows at the bit limits due to additions. I also suspect that rabin-miller could be made faster with better base selection.

Updated gist. As it is now, number types are valid all the way to 2^53, and int types are valid to 2^63. Getting pretty close to PR ready, I think

sogaiu commented 1 year ago

For comparison purposes, here are the times I got with the latest:

$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes-2023-07-22.janet 
Elapsed time: 0.157s, 15.69µs/body
Elapsed time: 0.061s, 94.41µs/body
Elapsed time: 0.113s, 176.2µs/body
Elapsed time: 0.056s, 270.9µs/body
Elapsed time: 0.052s, 255.5µs/body

I'll leave some links here for future readers (including a future me):

primo-ppcg commented 1 year ago

Elapsed time: 0.061s, 94.41µs/body

That 94.41 is the same test as the 217.7 you reported earlier, strangely. I also found that the RM base selection was using an improper comparator. I've updated the gist, so the last 3 checks should be a lot faster now.

EDIT: Actually, just always using these bases:

[2 325 9375 28178 450775 9780504 1795265022]

is faster than selecting the bounds from small primes. I'm probably going to spend the next few days crunching the bounds on these.

sogaiu commented 1 year ago

That 94.41 is the same test as the 217.7 you reported earlier, strangely. I also found that the RM base selection was using an improper comparator. I've updated the gist, so the last 3 checks should be a lot faster now.

It's possible there was some pilot error here (^^;

I've fetched the updated gist (revision 12), freshly cloned your spork fork, checked out the modular-arithmetic branch, locally built, locally installed, and executed the gist:

$ JANET_PATH=$(pwd)/jpm_tree/lib janet primes-2023-07-22-later.janet 
Elapsed time: 0.181s, 18.09µs/body
Elapsed time: 0.073s, 114.2µs/body
Elapsed time: 0.083s, 129.6µs/body
Elapsed time: 0.042s, 203.5µs/body
Elapsed time: 0.041s, 198.1µs/body

...and it does look like the last 3 are faster.

primo-ppcg commented 1 year ago

@sogaiu I think you can close this one now :wink:

sogaiu commented 1 year ago

@primo-ppcg Thanks a lot for your work on this!