Support for zero-copy parsing

zesterer commented 3 years ago

This is now being developed in #82

mainrs commented 2 years ago

Does this imply being able to do something like this?

fn parse() -> impl Parser<char, &str, Error = ....> {}

zesterer commented 2 years ago

Yes, among other things. There are a few complications though: feeding lifetimes through is quite an intricate process, and it's going to require a fairly fundamental rework of the Stream trait.

There are also complications with existing parsers: if just('!').repeated() results in an impl Parser<char, &str>, then what does just('!') result in? A char? But then it's owned, which implies that .repeated() is getting an unowned &str out of thin air. It might require some extra combinators to be used effectively.

One option I'm considering is a combinator that complements map_with_span that allows fetching the slice (&[T] or &str) of the input that was parsed instead of the span of the input like map_with_slice. This would allow building zero-copy parsing on top of the existing API without needing to significantly change the API and while avoiding allocation.

damien-white commented 2 years ago

@zesterer Are you still looking for any help with the zero-copy implementation? I'd love to lend a helping hand. I just finished browsing the whole branch and it looks fantastic so far. Really interesting!

zesterer commented 2 years ago

@dark-fusion Definitely! I've been on a bit of a hiatus over the past few weeks (a mixture of burnout and a lot of work), but it's beginning to look like something that's of merge quality. I think it's probably time for me to write a list of things that need doing before merge (aside from waiting for GAT stabilisation). What sort of things were you interested in working on?

damien-white commented 2 years ago

I'm most interested in adding support for binary-encoded formats, however I also understand that it would involve quite a bit of work and may be best suited as an extension crate (at least for now). This was also already discussed in #48.

While waiting for certain features to land on Rust's stable channel, I'm not sure where an extra set of hands would be most welcome.

Zij-IT commented 2 years ago

Hello hello :D

I am looking to contribute more to chumsky, and am curious on if there are any open tasks that need to be worked on related to zero-copy parsing, and if so, what those might be!

Thanks for your time!

zesterer commented 2 years ago

am curious on if there are any open tasks that need to be worked on

Hey, thanks for showing interest! There are a few things on the list, but it really depends on what you're interested in.

Looking into overhauling the error recovery system (the current system in master works okay for simple cases, but it would be nice to be able to generalise it a bit, perhaps by letting users provide their own recovery parsers instead of a handful of pre-built ones, as mentioned in #77)
Finish off implementing primitives (some aren't being ported across (filter/filter_map are now just both methods on Parser, filter and try_map, for example))
Port over (or, more likely, reimplement) the parsers from the text module, perhaps making them a bit more flexible in the process
Figure out how to make something akin to the custom parser in master, but actually useful by strategically making methods like InputRef::next public so that people can write their own inline parsers that manually pull tokens out of the input
Add Parser implementations for a bunch of types to improve the utility of things (&impl Parser, Box<impl Parser>, Rc<impl Parser>, etc.)
Port over 'recent' changes from master (.not(), etc.)
If you're really looking for a challenge, something I'm grappling with at the moment is figuring out how to make error generation in zero-copy very fast. master works fine, but each parser returns a rather complex type to represent the errors. It would be better to return just a Result<O, ()> and have the errors stored inline within InputRef (I've already started working on this, but it needs work), then figuring out how best to choose between error sets with combinators like Or and Repeated

Any one or part of these would be really appreciated and would definitely help speed up the process of getting the branch ready for merge. I'd love to dedicated more time to it myself, but I'm going through quite a busy period in my life right now and have a bit less time to spare than I normally do :(

Zij-IT commented 2 years ago

Thanks for the list :D I will take a crack at text and get a pull request going when I can. And no worries about being busy! If someone blames you for that they can write it themselves ;)

Zij-IT commented 2 years ago

Specs

Modified version of zero-copy with a pratt-parser, which can be found here.
rustc 1.65.0-nightly (addacb587 2022-08-24)
Cpu: AMD Ryzen 5 4500U with Radeon Graphics (6) @ 2.375GHz
Memory: 16GB

Issue

I am not sure what the exact issue is, but I don't know if zero-copy is going to have a great time with compile times. For reference, I am compiling my Lua parser, and you know you are in for a great time when cargo check takes almost 10 minutes. When I first ran cargo run --release I killed the process after about 60 minutes.

I am up for any and all recommendations you have here, as I am kinda at a loss. I know you are busy, so there's no rush! Thanks for your time in advance.

zesterer commented 2 years ago

@Zij-IT The very long or chain here is a bit suspect to me. Due to deficiencies in rustc's ability to memoise trait obligation, Or parsers tend to have exponential compilation times. I'd recommend trying choice instead, it tends to not have this issue.

That aside, make sure to throw .boxed() on the end of any large parsers. It has a relatively insubstantial runtime cost, but improves compilation times very significantly!

Zij-IT commented 2 years ago

I mentioned #197 that I was able to get it to compile, but I figured I would show you some of the fruits of your labor :D

Specs

rustc 1.65.0-nightly (addacb587 2022-08-24)
Cpu: AMD Ryzen 5 4500U with Radeon Graphics (6) @ 2.375GHz
Memory: 16GB

Task

Parsing every .lua file in the luvit Lua library
Total lines: 21972
Total characters: 608323

Results

master Parser here - 212_198ms
zero-copy Parser here - 149_218ms

I would say that is quite the achievement! I am curious how where there is still room for improvement and excited to see that number working its way down!

zesterer commented 2 years ago

That's quite a nice improvement! Not as much as I'd hoped though, unfortunately. I think there's still more to be done here. Are these just the numbers for the lexer alone, or the parser too?

I've realised that the foldl/foldr combinators are extremely commonly hit in most parsers, so I'm interested to see whether we could make them ingest a Repeated/SeparatedBy directly instead of requiring a .collect() into a vector beforehand, saving a lot of allocation and memory juggling.

Zij-IT commented 2 years ago

Its for both the lexer and the parser! Each of the 200ish files goes from &str to Chunk!

Zij-IT commented 2 years ago

Its me again, and with some fantastic news for me, and uncertain news for zero-copy :D I was going back through my Parser looking for those optimizations you were hoping for and I found THE problem in my implementation. I was looking at a flamegraph and I noticed that the BinaryInfixOperator followed by Foldr was taking up a large portion of the time. I thought this was odd because I only use foldr in two locations, so I went looking at them, and it hit me. This little portion of my expression parser has one CRITICAL issue:

let power = atom
    .clone()
    .then(just(TokenKind::OpPow).to(BinaryOp::Power))
    .repeated()
    .then(atom.labelled("binary operand"))
    .foldr(|(lhs, op), rhs| Expr::Binary {
        lhs: Box::new(lhs),
        rhs: Box::new(rhs),
        op,
    });

It parses EVERY atom twice if that atom is not followed by a TokenKind::OpPow. This is wonderfully terrifying, when the singular expression is, say, a giant function definition like this one:

test('fs chmod', function(expect)
  local mode_async
  local mode_sync
  -- On Windows chmod is only able to manipulate read-only bit
  -- TODO: test on windows
  if is_windows then
    mode_async = 256 --[[tonumber('0400', 8)]] -- read-only
    mode_sync = 438  --[[tonumber('0600', 8)]] -- read-write
  else
    mode_async = 511 --[[tonumber('0777', 8)]]
    mode_sync = 420 --[[tonumber('0644', 8)]]
  end

  local file1 = Path.join(__dirname, 'fixtures', 'a.lua')
  local file2 = Path.join(__dirname, 'fixtures', 'a1.lua')

  local function maskMode(mode, mask)
    return bit.band(mode, mask or 511 --[[tonumber('0777',8)]])
  end

  fs.chmod(file1, mode_async, expect(function(err)
    assert(not err)
    if is_windows then
      assert(maskMode(maskMode(fs.statSync(file1).mode), mode_async))
    else
      assert(mode_async == maskMode(fs.statSync(file1).mode))
    end

    -- TODO: accept mode in number
    assert(fs.chmodSync(file1, mode_sync))
    if is_windows then
      assert(maskMode(maskMode(fs.statSync(file1).mode), mode_sync))
    else
      assert(mode_sync == maskMode(fs.statSync(file1).mode))
    end
  end))
  fs.open(file2, 'a', tonumber('0666', 8), expect(function(err, fd)
    assert(not err)
    fs.fchmod(fd, mode_async, expect(function(err)
      assert(not err)
      if is_windows then
        assert(maskMode(maskMode(fs.fstatSync(fd).mode), mode_async))
      else
        assert(mode_async == maskMode(fs.fstatSync(fd).mode))
      end

      -- TODO: accept mode in number
      assert(fs.fchmodSync(fd, mode_sync))
      if is_windows then
        assert(maskMode(maskMode(fs.fstatSync(fd).mode), mode_sync))
      else
        assert(mode_sync == maskMode(fs.fstatSync(fd).mode))
      end
      fs.close(fd)
    end))
  end))

  -- lchmod
  if fs.lchmod then
    local link = Path.join(__dirname, 'tmp', 'symbolic-link')
    fs.unlinkSync(link)
    fs.symlinkSync(file2, link)

    fs.lchmod(link, mode_async, expect(function(err)
      assert(not err)
      p(fs.lstatSync(link).mode)
      assert(mode_async == maskMode(fs.lstatSync(link).mode))

      -- TODO: accept mode in number
      fs.lchmodSync(link, string.format('%o', mode_sync))
      assert(mode_sync == maskMode(fs.lstatSync(link).mode))
    end))
  end
end)

There are of course as you know many ways to solve this problem; I just went ahead and used the binary_infix_operator from #144 (though I had to fix the Clone implementation). This lead to the following improved times:

master Parser - 1600ms (750ms lexing)
zero-copy Parser - 1600ms (750ms lexing)

zesterer commented 2 years ago

Its for both the lexer and the parser! Each of the 200ish files goes from &str to Chunk!

Out of interest, how do just the lexing times compare? My suspicion is that lexing probably got a lot faster, but parsing didn't (because parsing already requires a lot of collecting into vectors, recursive data structures, etc. so there probably wasn't much that zero-copy could improve on).

I was going back through my Parser looking for those optimizations you were hoping for and I found THE problem in my implementation.

Ah yes, this is quite a common pitfall. Chumsky doesn't have memoisation, so it'll eagerly attempt to parse both interpretations twice, which can rapidly become exponential!

I'm quite surprised that master and zero-copy now have similar timings though. I would have hoped that at least some of its improvements were visible...

Zij-IT commented 2 years ago

I will give it a look when I can tomorrow, but I have only moved over the parser as of currently, so both lexers are using the non-zero-copy parser combinators, and are identical.

Sorry for the confusion 😅 My branch naming scheme when working on my own stuff needs work.

Zij-IT commented 2 years ago

Welp, I am back :D I had to fix some issues that I found in the zero-copy branch, but after I got them done I was able to do some testing, and here are the results:

Lex and Parse

master Parser - 1600ms
zero-copy Parser - 878ms

Lex

master Lexer - 750ms
zero-copy Lexer - 150ms

mainrs commented 2 years ago

It parses EVERY atom twice if that atom is not followed by a TokenKind::OpPow

@Zij-IT Could you further elaborate? I do not really understand why this is the case.

Zij-IT commented 2 years ago

@Zij-IT Could you further elaborate? I do not really understand why this is the case.

Here is the parser again to prevent you having to scroll up :D

let power = atom
    .clone()
    .then(just(TokenKind::OpPow).to(BinaryOp::Power))
    .repeated()
    .then(atom.labelled("binary operand"))
    .foldr(|(lhs, op), rhs| Expr::Binary {
        lhs: Box::new(lhs),
        rhs: Box::new(rhs),
        op,
    });

To move this onto more pseudocode and hopefully more understandable, here it is as EBNF

power := (atom Pow)* atom

Now, the important thing here is that the left and right hand side of the TokenKind::Pow are the same item. I am going to go through parsing this with the following example:

12 ^ 12 ^ 12

To accurate parse the above expression we have to repeat the (atom Pow) parser until it fails, and then we parse the final atom. Here are the steps:

Parse (atom Pow) results in 12 ^ | 12 ^ 12
Parse (atom Pow) results in 12 ^ 12 ^ | 12
Parse (atom Pow) results in an error, because after parsing the last 12, there is no Pow.
Parse atom results in 12 ^ 12 ^ 12 |

The problem is that in both steps 3 and 4 we have to parse 12, because there is no way to pass the successfully parsed 12 to the next atom.

For this tiny example, that is no issue. But in the Lua example I have above, some of the lines are nested in three lamda definitions, and because each expression will be parsed twice, every item within the third call will be parsed at a minimum 2^3 times.

Zij-IT commented 2 years ago

Heyyo :) Me again! Life's been hectic for the last couple months, but I am excited to start helping out again.

Any issues you are currently hoping to get marked of the todo list for zero-copy that I can help with?

zesterer commented 2 years ago

Hey, hello again! I think a big one would be porting docs / text parsers over, and perhaps some of the newer parsers like not and and_is (suggested in #208).

I'm also considering something big (but fundamentally uncomplicated): switching out the Output associated type with an O type parameter, as with master. The primary reason is to allow implementing Parser for !, but in hindsight it just makes more sense and consistency with master is good. That's quite a big & boring job though (lots of copy & paste, renaming, etc.), so I can completely understand if you're not interested in it.

bew commented 2 years ago

lots of [boring] copy & paste, renaming, etc.

If you can describe the tasks (or make a small example to replicate), I might try to play with it, trying to automate most of it with vim if it can help you 🤔

zesterer commented 2 years ago

It's a relatively simple process. Everywhere you see Parser<I, Output = O>, you just turn it into Parser<I, O> obviously this gets slightly more complicated than just being a copy & paste process in some of the trait bounds, but there shouldn't be anything particularly wild.

zesterer / chumsky