Open · nomeata opened this issue 1 week ago
Base conversion is algorithmically quadratic. (Every digit requires doing what amounts to (n % 10, n / 10), and there are not many shortcuts; a small sketch follows below.) I don't think the `List Char` detour (which is linear) is what you are witnessing. As long as you only do base conversion up to bounded length (by breaking the number into limbs of some size), this should be much faster.
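For illustration, here is a minimal sketch of the naive digit-at-a-time loop being described (this is not the actual `Nat.repr` code): each step divides the whole bignum by 10, so producing d digits takes d bignum divisions, which is quadratic overall.

```lean
-- Illustrative sketch of naive base conversion, not the real implementation:
-- each recursive step performs (n % 10, n / 10) on the full bignum, so a
-- d-digit number costs d bignum divisions, i.e. quadratic work in total.
partial def digitsRev (n : Nat) : List Char :=
  if n < 10 then
    [Char.ofNat ('0'.toNat + n)]
  else
    Char.ofNat ('0'.toNat + n % 10) :: digitsRev (n / 10)

#eval String.mk (digitsRev 1234567890).reverse  -- "1234567890"
```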
Ok, but presumably there are still some large constant factors on the table here?
Yes, possibly; I'm just saying that these numbers don't demonstrate that. If large number literals are transitioned to use a `from_array` function instead of translating a string, the calls to `Nat.repr` would disappear except for small numbers, where I think they are not a bottleneck.
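A minimal sketch of what such a limb-based constructor could look like; only the name `from_array` is mentioned above, so the helper below (its name, the `UInt64` limb width, and the little-endian limb order) is purely hypothetical:

```lean
-- Hypothetical sketch of a limb-based literal constructor (not an existing
-- API). The idea: the compiler emits the limbs of a large literal directly
-- instead of a decimal string, so no base conversion is needed at load time.
def natFromLimbs (limbs : Array UInt64) : Nat :=
  -- limbs are least-significant first; fold from the most significant end
  limbs.foldr (fun limb acc => acc * 2 ^ 64 + limb.toNat) 0

-- Example: #[5, 1] encodes 1 * 2^64 + 5.
#eval natFromLimbs #[5, 1] == 2 ^ 64 + 5  -- true
```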
When experimenting with encoding sets as bitvectors as `Nat`s, which seems like it should be rather efficient, I noticed that processing the file in VSCode was quick enough, but `lake build` would take a long time, with lean busy writing the `.c` file.

I suspect that `Nat.repr` is just very slow on larger literals, given that it goes through `List Char`, rather than allocating a `String` of the right size and then (linearly) updating it digit by digit (or chopping the `Nat` into `USize`-sized limbs, using the C code to print it, and concatenating efficiently).

It also seems to scale quadratically with the length of the number, as a small timing experiment (sketched below) shows, which gives me (on live.lean-lang.org) timings consistent with quadratic growth.
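The following is only a sketch of the kind of timing experiment meant here; the exact snippet and the measured numbers from the report are not reproduced, and the digit counts below are illustrative.

```lean
-- Illustrative benchmark (assumed shape, not the original snippet):
-- time `Nat.repr` on numbers with d decimal digits while doubling d.
-- If the conversion is quadratic, each doubling should roughly
-- quadruple the measured time.
def timeRepr (digits : Nat) : IO Unit := do
  let n := 10 ^ digits - 1            -- a number with `digits` decimal digits
  let start ← IO.monoMsNow
  let s := Nat.repr n
  let stop ← IO.monoMsNow
  IO.println s!"{digits} digits (length {s.length}): {stop - start} ms"

#eval do
  for d in [10000, 20000, 40000, 80000] do
    timeRepr d
```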
Versions
"4.12.0-nightly-2024-10-18"
Impact
Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.