gvergnaud / hotscript

A library of composable functions for the type-level! Transform your TypeScript types in any way you want using functions you already know.
3.51k stars 59 forks source link

feat(strings): faster string length and repeat #19

Closed sno2 closed 1 year ago

sno2 commented 1 year ago

This optimization makes use of a self-memoizing cache in the string length algorithm which allows it to run on a logarithmic time scale. This allows us to get the length of strings that are over 800,000 characters long performantly without freezing the language server.

ecyrbe commented 1 year ago

Not sure of the improvement here, current implementation handles 999 characters size just well and maybe handling 9999 is too much for practical use cases :

    type chars999 = Call<Strings.Repeat<999>, 'a'>;
    type length999 = Call<Strings.Length, res100>;

Can you measure timings for strings of these sizes :

to be sure this is not overkill for lower sized strings

sno2 commented 1 year ago

Not sure of the improvement here, current implementation handles 999 characters size just well and maybe handling 9999 is too much for practical use cases :

    type chars999 = Call<Strings.Repeat<999>, 'a'>;
    type length999 = Call<Strings.Length, res100>;

Can you measure timings for strings of these sizes :

  • 5
  • 10
  • 100
  • 250
  • 999
  • 9999 (will only work with you impl, not current one)

to be sure this is not overkill for lower sized strings

When considering the type system, the way that I see the order that we should optimize should be that runtime complexity trumps all other aspects. We only have a few thousand iterations before the type system fails us. Therefore, using a logarithmic implementation will make it possible to do much more trickery with string types.

I did try benchmarking this with Hello, world!. However, it seems to be negligible performance differences because hyperfine found that the after implementation was faster after running for 1.5 minutes on the same programs.

(Before) Range (min … max): 9.647 s … 15.403 s 10 runs

(After) Range (min … max): 9.200 s … 11.496 s 10 runs

The performance is definitely better for strings larger than 16 because it takes log2(x) plus a few more for all iterations. Here are the benchmarks for 1,000 characters. Although, I'm not that confident in these benchmarks as I feel like using hyperfine on running tsc is probably going to have a high +- rate and depends on the computer.

(Before) Time (mean ± σ): 11.915 s ± 2.519 s [User: 22.778 s, System: 2.058 s] Range (min … max): 9.053 s … 15.845 s 10 runs

(After) Time (mean ± σ): 8.297 s ± 0.774 s [User: 16.844 s, System: 1.418 s] Range (min … max): 6.764 s … 9.350 s 10 runs

I also improved the algorithm even more such that it can take above 800,000 characters by transitioning into a linear algorithm on the 2^13 pattern after it gets that large.

sno2 commented 1 year ago

@gvergnaud Would it be alright if I apply this optimization to other string methods in this PR as well? If I applied the logarithmic cache idea to Repeat, then we could repeat ~400,000 characters and then get the length in one go.

sno2 commented 1 year ago

Actually, Repeat would probably be the only other method I would implement it for. We could do it for string trimming, but not really sure if it's worth it.

ecyrbe commented 1 year ago

LGTM, thank you for the tests.

Yes, go ahead and you can do Repeat also

If you have other perfs improvements in other areas, they are welcomed.

Numbers are already pretty optimised since they are using a lot of lookup tables.

If you think you can optimise string comparison, go ahead also

sno2 commented 1 year ago

Alright, this is ready for review. I am very pleased with the result of the Repeat methods because there is not a cap on the size of a string literal so we can scale the patterns to 2^n indefinitely and create strings millions of characters long with ease :^)

Here is a little round-trip example of using the two features together:

import { Strings, Pipe } from ".";

type A = Pipe<"a", [Strings.Repeat<840_000>, Strings.Length]>;
//   ^? 840000
ecyrbe commented 1 year ago

@sno2 here is an improved, more readable version i implemented that does the same (supports more than 1_000_00 repeats like yours)

export type Repeat<
  T extends string,
  N extends number,
  Acc extends string = "",
  Calc extends {
    Quotient: number;
    Remainder: number;
  } = DivMod<N, 2>
> = N extends 0
  ? Acc
  : N extends 1
  ? `${Acc}${T}`
  : Calc["Remainder"] extends 0
  ? Repeat<RepeatX2<T>, Calc["Quotient"], Acc>
  : Repeat<T, Sub<N, 1>, `${Acc}${T}`>;

it reads almost the same as the js version:


const repeat = (s: string, count: number, acc: string = ''): string => {
  if (count === 0) {
    return acc;
  } else if (count % 2 === 0) {
    return repeat(s + s, count / 2, acc);
  } else {
    return repeat(s, count - 1, acc + s);
  }
}

Do you want to also simplify the Length one ? else i'll do it, because it's really hard to read and maintain