yoriyuki / Camomile

A Unicode library for OCaml

Reduce stack usage of ocamllocaldef #48

Open yoriyuki opened 6 years ago

yoriyuki commented 6 years ago

Probably AbsOrd's fault

rgrinberg commented 6 years ago

Is this related to #39? I actually wonder if some of these stack overflows come from my very liberal port of the camlp4 stream parsers. I know the original version was more tail recursive than my port, but I didn't think much of it because it worked well enough on my machine. I can revisit the port and add back more tail recursion.
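To make the tail-recursion point concrete, here is a minimal sketch of the difference. It is not Camomile's lexer: `collect_naive`, `collect_tail`, and `counter` are hypothetical names invented for illustration, and the "token source" is just a closure that returns integers.

```ocaml
(* Hypothetical illustration only; none of these names exist in Camomile. *)

(* Not tail recursive: the cons runs after the recursive call returns,
   so every item pulled from the source keeps one stack frame alive. *)
let rec collect_naive next =
  match next () with
  | None -> []
  | Some x -> x :: collect_naive next

(* Tail recursive: results are carried in an accumulator, so the
   recursive call is the last action and stack use stays constant. *)
let collect_tail next =
  let rec loop acc =
    match next () with
    | None -> List.rev acc
    | Some x -> loop (x :: acc)
  in
  loop []

(* A stand-in token source that yields 0, 1, ..., n - 1 and then None. *)
let counter n =
  let i = ref 0 in
  fun () ->
    if !i >= n then None
    else begin
      let v = !i in
      incr i;
      Some v
    end
```

Both functions return the same list on small inputs. Only the accumulator version keeps its stack use independent of input length, which is what matters on platforms with a small stack.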

I suppose I could also test this in bytecode mode where I can configure the stack size.

rgrinberg commented 6 years ago

Another option here is to simply make this preprocessing binary always be built in bytecode mode. Bytecode should always have a large stack on every platform so it should fix the problem. It's a bit hacky though.

rgrinberg commented 6 years ago

The final option, which is the best but also the most time consuming, would be to rewrite the lexer using ocamllex. The Stream module in OCaml is essentially deprecated anyway. I'd do it myself, but I fear that my understanding of the format being parsed is incomplete.

yoriyuki commented 6 years ago

As you saw in your experiment, the stack overflow is caused not only by the lexer but also by the parser. The parser is actually the main suspect, because it uses nested function calls while processing expressions.

But I wonder why stack overflows are suddenly being reported now. It worked fine for several years.
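As a rough sketch of why nested calls over expressions cost stack, and one way around it, compare direct recursion with continuation-passing style. The `expr` type and the functions below are hypothetical and far simpler than the real parser; they only illustrate the technique.

```ocaml
(* Hypothetical expression type; not the grammar handled by Camomile. *)
type expr =
  | Leaf of int
  | Neg of expr
  | Add of expr * expr

(* Direct recursion: stack depth grows with the nesting depth of the input. *)
let rec eval = function
  | Leaf n -> n
  | Neg e -> 0 - eval e
  | Add (a, b) -> eval a + eval b

(* Continuation-passing style: every call is a tail call, and the pending
   work is stored in heap-allocated closures instead of stack frames. *)
let eval_cps e =
  let rec go e k =
    match e with
    | Leaf n -> k n
    | Neg e' -> go e' (fun v -> k (0 - v))
    | Add (a, b) -> go a (fun va -> go b (fun vb -> k (va + vb)))
  in
  go e (fun v -> v)

(* Build an expression with n nested negations (tail recursive itself). *)
let rec nest n acc = if n = 0 then acc else nest (n - 1) (Neg acc)
```

The trade-off is that the CPS version allocates a closure per pending step, so it moves the memory cost from the stack to the heap rather than removing it.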

rgrinberg commented 6 years ago

Yes, that is indeed puzzling. It could be partly explained by the fact that people rarely test on these exotic platforms. A new release triggered a bunch of testing downstream.

Of course, the above is very optimistic. A more likely cause is that my port of the lexer increased the stack usage just enough to go over the budget on some platforms. And that's even with the changes in my PR restoring the tail recursion.

yoriyuki commented 6 years ago

Looking at #49 again, the error while processing ja.txt occurs after your lexer, inside the localdef function.

Of course, this may not be related to the reported failures, because they all occur while processing zh_PINYIN. We need to try that file first.