gwenn / lemon-rs

LALR(1) parser generator for Rust based on Lemon + SQL parser
The Unlicense
48 stars 10 forks source link

Parser: box parser #19

Closed MarinPostma closed 1 year ago

MarinPostma commented 1 year ago

This PR boxes the yyParser in Parser. The reason for that is that it is huge (141464 bytes!), and I encountered stack overflows because of this.

gwenn commented 1 year ago

I guess that the real culprit is the AST (#8)

MarinPostma commented 1 year ago

can you point me toward where that could be fixed?

gwenn commented 1 year ago

Just run cargo clippy with enum-variant-size-threshold = 50. There is also this stack: https://github.com/gwenn/lemon-rs/blob/master/third_party/lemon/lempar.rs#L189 You can check its max size: https://github.com/gwenn/lemon-rs/blob/master/third_party/lemon/lempar.rs#L184-L185 Or try to reduce its size: https://github.com/gwenn/lemon-rs/blob/master/third_party/lemon/lemon.c#L4431-L4435

MarinPostma commented 1 year ago

I have investigated this a bit further. Here is the list of the 15 biggest types, along with their size:

141464  parser::parse::yyParser - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2257:1: 2265:2 (#0)
  6480  [&'static str; _] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2630:1: 3036:3 (#0)
  4688  [&'static str; 293] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2328:1: 2623:3 (#0)
  4550  [u16; 2275] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1635:1: 1864:3 (#0)
  4216  [u16; 2108] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1420:1: 1632:3 (#0)
  1144  parser::Context - src/parser/mod.rs:38:1: 44:2 (#0)
  1114  [u16; 557] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1981:1: 2038:3 (#0)
  1114  [u16; 557] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:1872:1: 1929:3 (#0)
  1096  parser::parse::yyStackEntry - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:2244:1: 2250:2 (#0)
  1096  parser::ast::Cmd - src/parser/ast/mod.rs:117:1: 121:2 (#0)
  1088  parser::parse::YYMINORTYPE - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:100:1: 180:2 (#0)
  1088  parser::ast::Stmt - src/parser/ast/mod.rs:157:1: 273:2 (#0)
  1008  parser::ast::TriggerCmd - src/parser/ast/mod.rs:2648:1: 2669:2 (#0)
   928  parser::ast::InsertBody - src/parser/ast/mod.rs:2537:1: 2540:2 (#0)
   810  [u16; _] - /Users/mpostma/Documents/code/rust/lemon-rs/target/debug/build/sqlite3-parser-47a895829ff66033/out/parse.rs:3382:1: 3788:3 (#0)

The main reason yyParser is so large, is because of the SmallVec of yyStackEntry, each of which is 1088 bytes. Almost all of the size in the ssStackEntry comes from the YYMINORTYPE (1088 bytes) (that's the AST enum). I don't think that 1088 bytes for enum are problematic. There could be performance issues caused by its size of it, but then it would require actual benchmarking to see where to draw the line.

That basically leaves us with two options:

The former is easier since it doesn't involve editing the grammar file.

gwenn commented 1 year ago

Thanks for your investigation. In the original C version, you can choose between dynamic vs static stack: https://github.com/sqlite/sqlite/blob/4e86aa86ea0ab657423f1a715c620c188c6ba879/tool/lempar.c#L215-L222 And SQLite uses a static stack but doesn't generate an AST...

We should try to rollback https://github.com/gwenn/lemon-rs/commit/8e9925e0573fe29a7998b3973c1bd69e78d42dfb

MarinPostma commented 1 year ago

@gwenn, I have changed my mind and decided to go with making the AST itself leaner, I'll keep you updated

MarinPostma commented 1 year ago

I'm closing this pr for now, I'll get back to it when I have more time to work on optimizing the AST itself

gwenn commented 1 month ago

@MarinPostma For your information, SmallVec has been removed. See #57