mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License
2.08k stars 125 forks source link

Faster integer formatting to arbitrary base #2360

Open triska opened 8 months ago

triska commented 8 months ago

In one of my cryptographic applications of Scryer Prolog, the bottleneck turns out to be formatting large integers in base 16, using library(format).

We can use the ~Nr format control sequence to format an integer to base (radix) N, such as for example:

?- phrase(format_("~16r", [121212]), Cs).
   Cs = "1d97c".

With larger integers, this is currently somewhat slow. For example, with the following definitions:

:- use_module(library(format)).
:- use_module(library(time)).
:- use_module(library(between)).
:- use_module(library(dcgs)).

exp(E) :-
        N is 2^E,
        between(1, N, _).

We have:

?- I is 2^2^8,
   time((exp(12),phrase(format_("~16r", [I]), _),false)).
   % CPU time: 1.766s, 8_413_227 inferences
   false.

The formatting code is written in Prolog and should preferably remain in Prolog, because if every tiny detail is hardcoded in Rust itself, then there will never be enough incentive to make actual Prolog code faster:

https://github.com/mthom/scryer-prolog/blob/7b33eafbb052b8543942965f2967d64e53d918c3/src/lib/format.pl#L372

The question I have is therefore: What are the most fundamental, smallest building blocks that would help to make this Prolog code faster? Only that functionality should be implemented in Rust, if at all.

For example, currently, both I0 mod R and I0 // R are computed in the code:

https://github.com/mthom/scryer-prolog/blob/7b33eafbb052b8543942965f2967d64e53d918c3/src/lib/format.pl#L392

I see that the dashu crate exposes functionality to compute both results in a single operation:

https://docs.rs/dashu-int/latest/dashu_int/struct.IBig.html#impl-DivRem-for-IBig

Would such a predicate be a sensible addition to library(arithmetic), to obtain both the quotient and the remainder of two integers with better efficiency than is currently possible?

Any other ideas? I would greatly appreciate all suggestions and help with this issue. Thank you a lot!

mthom commented 8 months ago

Rust uses the ryu crate to print floating point numbers to strings quickly but I'm not sure if there is an equivalent for numbers. perhaps @cmpute could advise?

cmpute commented 8 months ago

There are several ways to support fast printing in dashu:

  1. Fast formatting with base 2, 8, 10 and 16 is already supported through the conventional Rust protocal. (Display, Debug, etc)
  2. Fast formatting with arbitrary base (up to 32) is already supported through dashu_int::IBig::in_radix method.
  3. Fast conversion to arbitrary base digits is in dev plan (the API currently in my mind is like dashu_int::IBig::to_digits(base: Word) -> Vec<Word>)
  4. Fast division with constant divisor is already supported through the dashu_int::fast_div::ConstDivisor

Not sure which approach is best suitable for Scryer Prolog. Let me know as well if you want different APIs :)

triska commented 8 months ago

Thank you a lot @cmpute! I have published a draft (#2362) that makes the in_radix method available as an internal predicate. library(format) supports bases up to 36, so using this method would require adjustments in library(format): It would mean that the fast code can be used for most of the cases, but not for all of the cases that the library covers.

It may be interesting to have this functionality, for very large integers, if the need arises. I will first try to improve format_//2 in other ways that ideally only require changes in the engine that are as generally useful as possible.

cmpute commented 8 months ago

Oh my bad, there was a typo, in_radix actually supports radix up to 36, which is in line with your need. Besides, whether to use upper case for the output is supported through the alternative flag (as documented here)

triska commented 8 months ago

@cmpute: Thank you a lot for the clarification!

In my tests, the test case above can be made to run about 4 times faster by using the native in_radix method instead of the Prolog-based implementation. Personally, I think this speedup does not justify making this particular API available in Scryer, since it further increases the size of the Rust-based engine and also adds additional complexity to library(format) which should ideally remain portable between Prolog systems, and thus also needs to handle the situation that such a native method is not available in other Prolog systems.

I wonder whether adding a more general predicate to Scryer that also covers the functionality of the in_radix method is worth adding instead, for example, a predicate such as format_integer(FormatString, Integer). However, it is unclear how the specific functionality of the in_radix method could be best plugged into such a construct, and also how much "Rust-specifics" (such as the conventions of Rust format strings) should be exposed by the interface which should ideally be independent of the underlying implementation language.

The "arbitrary base digits" API mentioned above seems interesting, and may indeed be worth exposing as an interface predicate when it becomes available, especially if it satisfies these desiderata.

cmpute commented 8 months ago

I will link this issue to the commit when the digits conversion functionality is added. However, it worth noting that this API won't get as much optimization as the small base conversions (at least not in the near future), i.e. the speed improvement can be smaller.