nyuichi / satysfi-base

complementary collection of useful functions and modules for SATySFi
MIT License
31 stars 11 forks source link

RegExp.exec is very slow for some inputs #49

Closed MasWag closed 3 years ago

MasWag commented 4 years ago

RegExp.exec is very slow for the following example.

@require: stdja
@require: base/regexp

let longString = ` I do not know why but it seems code2 is slow if it contains a long line.
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo` in 
%% This is slow
let slowRegex = RegExp.seq RegExp.bof RegExp.space in
%% These are not slow
let notSlowRegexList = [
    RegExp.bof;
    RegExp.space;
    RegExp.seq RegExp.bof RegExp.alpha;
    ] in
let _ = RegExp.exec slowRegex longString in
let _ = List.map (fun notSlowRegex -> RegExp.exec notSlowRegex longString) notSlowRegexList in
document (|
  title = {slow regexp};
  author = {MasWag};
  show-title = false;
  show-toc = false;
|) '<>

In my environment, a single call of RegExp.exec + (almost minimal typesetting) took about 30 seconds.

real    0m33.527s
user    0m33.334s
sys 0m0.040s
nyuichi commented 4 years ago

I started working on this.

nyuichi commented 4 years ago

The new implementation improves the performance approximately 375x for the example given above.

let input = ` I do not know why but it seems code2 is slow if it contains a long line.
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo
wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo wakegawakaranaiyo` in
let pat = `wk` in

let () = Debug.print `testing long input` in
let () = Debug.print (String.of-bool (RegExp.test (RegExp.of-string pat) input)) in
let () = Debug.print (String.of-bool (RegExp2.test (RegExp2.of-string pat) input)) in
finish
$ satysfi ./regexp2.satyg | ./ts
...
2020-04-05T21:55:50,200518000+09:00 : ---- ---- ---- ----
2020-04-05T21:55:50,206891000+09:00 : evaluating texts ...
2020-04-05T21:55:50,214867000+09:00 : testing long input
2020-04-05T21:56:25,082882000+09:00 : false
2020-04-05T21:56:25,176159000+09:00 : false
2020-04-05T21:56:25,179899000+09:00 : evaluation done.
2020-04-05T21:56:25,183183000+09:00 : ---- ---- ---- ----
...