maciejhirsz / logos

Create ridiculously fast Lexers
https://logos.maciej.codes
Apache License 2.0
2.71k stars 105 forks source link

chore(lib): remove error branch from LUT if it is unreachable #386

Closed RustyYato closed 1 month ago

RustyYato commented 2 months ago

Fixes #385

This removes unreachable branches from LUTs in the generated code, so that LLVM can see that the error case is unreachable for lexers like in the linked issue

jeertmans commented 2 months ago

Hello @RustyYato! Thank you for your PR!

Can you implement a small test that generates the LUT and checks that, indeed, the error branch is removed if it is unreachable?

Tell me if you need help with setting up the test :-)

RustyYato commented 2 months ago

Would you like a test like in logos-cli? I think I could extend that to check the codegen of the LUT, or did you have something else in mind?

RustyYato commented 2 months ago

friendly ping @jeertmans :smile:

jeertmans commented 2 months ago

Ooos sorry @RustyYato!

yes the easiest way to test this might be to test against the output of the logos-cli. So you can generate a very basic scenario: one lexer that has a useful error branch, and one whose error branch was removed :)

To check that your tests work, you can remove your patch and first check that they fail, then apply your patch and check they now work :)

RustyYato commented 2 months ago

Hey, I will be out for the next few days, but I'll get back to this once I'm back. 😀

RustyYato commented 2 months ago

Hmm, it looks like the lookup table isn't taken in some cases (specifically the one in the related issue). So there may be some other low hanging fruit to remove error cases. But I'll add a test for one that does use the LUT and solve the other cases separately.

RustyYato commented 2 months ago

I've just updated the tests in logos-cli to handle multiple inputs at once, so it becomes easier to add new codegen tests in the future. LMK if you had something else in mind/

jeertmans commented 1 month ago

Looks great, thanks @RustyYato!

I have 2 remarks / suggestions:

Don't hesitate to give your opinion on this :-)

RustyYato commented 1 month ago

Ok, in that case I think it would be nice to make it easy to add new tests and update old tests if codegen changes. I reverted my original tests commit, and added a new test function which is parameterized on the codegen directory name, and added an env var BLESS_CODEGEN to update the codegen tests if they need to be updated all at once. How does that look?

I can rebase the changes right before merging to clean up the history if you would like. I find rebases during reviews make reviewing harder so didn't do that yet.

RustyYato commented 1 month ago

friendly ping @jeertmans :slightly_smiling_face:

jeertmans commented 1 month ago

Hey @RustyYato, I am very sorry about my delay, was very busy working on my own projects and work, so I put those reviews on hold.

Anyway, I think your work is great and deserves to be merged! Thanks for your great work on this!

At first, I was waiting because I wanted to think a bit about the best (and cleanest) way to handle those kind tests, but let's merge this and see if we need to improve it (or not) in the future!

RustyYato commented 1 month ago

Not a problem, thanks for the reviews!