yoriyuki / Camomile

A Unicode library for OCaml
Other
125 stars 26 forks source link

Guard all non tail recursive calls behind Stream.slazy #49

Open rgrinberg opened 6 years ago

rgrinberg commented 6 years ago

This should make the lexer work with a smaller stack.

You can test how well our "low stack mode" works with the following patch:

diff --git a/Camomile/tools/jbuild b/Camomile/tools/jbuild
index 7f9d96f..19550cf 100644
--- a/Camomile/tools/jbuild
+++ b/Camomile/tools/jbuild
@@ -16,6 +16,7 @@

 (executable
  ((name camomilelocaledef)
+  (modes (byte))
   (libraries (toolslib))
   (flags (-I Camomile :standard))
   (modules (camomilelocaledef camomilelocaledef_lexer))))

and by running it with:

OCAMLRUNPARAM='b,l=1000' jbuilder build -j1

Before this patch, this would fail for me on ar.mar. Now it fails on ja.mar with a stackoverflow in the parser now:

camomilelocaledef Camomile/locales/it_IT_PREEURO.mar
camomilelocaledef Camomile/locales/iw.mar
camomilelocaledef Camomile/locales/iw_IL.mar
camomilelocaledef Camomile/locales/ja.mar (exit 2)
(cd _build/default/Camomile && ./tools/camomilelocaledef.exe --file locales/ja.txt locales)
Warning : strength option is not supported
Fatal error: exception Stack overflow
Raised by primitive operation at file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
...
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 157, characters 26-34
Called from file "Camomile/toolslib/colParser.mly", line 137, characters 35-41
Called from file "Camomile/toolslib/colParser.mly", line 67, characters 25-42
Called from file "parsing.ml", line 142, characters 39-75
Re-raised at file "parsing.ml", line 145, characters 8-25
Called from file "parsing.ml", line 164, characters 4-28
Re-raised at file "parsing.ml", line 183, characters 14-17
Called from file "Camomile/tools/camomilelocaledef.ml", line 199, characters 17-53
Called from file "Camomile/tools/camomilelocaledef.ml", line 207, characters 20-31
Called from file "Camomile/tools/camomilelocaledef.ml", line 229, characters 22-37
Called from file "hashtbl.ml", line 266, characters 8-18
Called from file "hashtbl.ml", line 272, characters 6-21
Re-raised at file "hashtbl.ml", line 277, characters 10-13
Called from file "Camomile/tools/camomilelocaledef.ml", line 236, characters 8-15

I wonder if this is a sign that we should just give up and force camomilelocaledef to run in bytecode mode so that we don't have to worry about the stack space.

yoriyuki commented 6 years ago

Now, as you see, it fails in colParser, not colLexer.

All reported cases of stack overflow are during processing zh_PINYIN.txt. What zhPINYIN does is reorder all Chinese characters according to their pinyin reading, which should result very long sequences of expressions. So if we are serious to fix it, we must reproduce a process using zh__PINYIN in a "low stack environment" and see where it fails.

The easy fix would be, as you suggested, to use bytecode mode. But this reduces performance and unfortunately, camomilelocaledef does heavy computation. But on the modern hardware it woule be just a breeze.

First try the bytecode mode and later think about overhauling the parser and lexer?

(BTW, I will be away again until the end of this month. All I can do is to see Web and press buttons, even if I have time)

olafhering commented 4 years ago

This crash with bytecode-only is still happening today.

devel:languages:ocaml:bytecode_only/ocaml-camomile