hylo-lang / Lotsawa

A Swift implementation of the MARPA algorithms
Apache License 2.0

License Question #16

Closed oscar-benderstone closed 1 year ago

oscar-benderstone commented 1 year ago

Hello,

I am interested in using Lotsawa for parsing a custom metalanguage.

May I ask what license this project has? I know the val-compiler is Apache-2.0, but I didn't know if a similar license applies for Lotsawa. I am following the advice in this post.

dabrahams commented 1 year ago

Hi @oscar-benderstone !

As soon as @sean-parent gets back to me about it, I'm happy to apply a license. Is Apache-2.0 acceptable for your purposes? Note, for completeness, that you would be the first client of this code; Val is not currently using it and that may never happen. That said, I'd be very pleased to use your use case as a driver for completing any unfinished parts.

You could consider using Swift-Marpa instead, since it is built on the more-mature libmarpa codebase. But, the API is not very Swift-idiomatic and the support story is more complicated because you'd be relying on both me and @jeffreykegler.

oscar-benderstone commented 1 year ago

Yes, Apache-2.0 works well.

I am experimenting with Lotsawa versus writing my own parser. I like the Marpa algorithm overall, but I prefer Swift to Perl for a frontend. I suspect it may be easier to use the Swift API (particularly with Rust interop), and there may be better error handling. I mainly need a parser to manipulate an AST; I am making a language to keep track of my notes (and complicated ideas). Would you recommend using Lotsawa for my custom language? I can open another issue if that would be better.

Also, if you have more documentation you could point me to, that would be great. I am not sure where I should start looking.

jeffreykegler commented 1 year ago

GNU's comments on the various licenses come from a strong POV, but are extremely well-informed. It's worth knowing what they think, even if you don't follow their advice, which the Marpa community does not.

The Marpa community switched to the permissive "MIT license", GNU's advice to the contrary notwithstanding. The support for this in the community was IIRC 100%, with much of it very enthusiastic. My own opinion was heavily influenced by the experience of the Lua community.

My experience has been that even mildly restrictive licenses put real obstacles in the way of the good guys, while bad guys find ways around them. Save a friend, even if that means there's an enemy you don't shoot because of that.

In the list, the license we chose is called the "Expat license". GNU does not even approve of what we call our license! Their point about the ambiguity of the name is on the mark, but the name "MIT license" for our license is almost universal.

dabrahams commented 1 year ago

@jeffreykegler My default would be the Boost Software License, which is slightly less restrictive than MIT and Apache. Other reasons are outlined in this rationale. The only argument I can think of against it is that it is less commonly used than other licenses, and each new license a lawyer has to analyze raises the barrier to adoption.

dabrahams commented 1 year ago

@oscar-benderstone

> I suspect it may be easier to use the swift API (particularly with interop with rust),

Swift interoperates with rust? How?

> and there may be better error handling.

Neither MARPA nor Lotsawa has much specific support for error handling AFAIK, though both give you tools you can use to inspect the state and recover when things go wrong. MARPA's tools are arguably more packaged for an end-user; Lotsawa is built as a generic library where the low-level parts are a bit more exposed. This is an area of Lotsawa I'm interested in developing, though, so if you get stuck with error handling I'd be interested in trying to design APIs to make it easier.

> I mainly need a parser to manipulate an AST

Hmm, I'm not sure if either technology will help you, then. Parsers can be used to build ASTs, but they tend not to have a role in subsequent manipulation.

> I am making a language to keep track of my notes (and complicated ideas). Would you recommend using Lotsawa for my custom language?

I can't responsibly recommend it yet, just because it's immature by comparison with other technologies. You'd be the first client putting it into production. That said, if you're willing to risk it, I'll try to be responsive to support requests.

And I won't claim to know that MARPA is the best alternative to Lotsawa. There are plenty of other good technologies out there. For example, you could use Lark's Earley parser, which, as long as you avoid right-recursive rules, should be roughly equivalent to MARPA's algorithm. IIRC, Bison has a nondeterministic parser mode.

> Also, if you have more documentation you could point me to, that would be great. I am not sure where I should start looking.

The documentation is pretty much all in the doc comments in the code. I should look for a tool that extracts and publishes it as HTML as part of CI. I believe it to be quite rigorous and complete, but it is not verbose—you won't find examples in there. The tests do have some examples you can work with, but you might need a little guidance.

HTH, Dave

oscar-benderstone commented 1 year ago

Thank you for the response. There is a crate called swift-bridge that allows compiled Swift code to be integrated into a Rust binary. There are a few build steps required to make this work, but it can be done.

I may revisit using Lotsawa in the future. Currently, I would like to complete an early version of my project and go from there. My language does not have right recursion, so there are other tools I can use at the moment. I should also work on analyzing ASTs on my own.

Thank you again for your support.

jeffreykegler commented 1 year ago

@dabrahams @oscar-benderstone

I spent a lot of years programming in industry, so I know the language of implementation can be a show-stopper, no matter how good the alternative. But in case you can stomach Perl:

Marpa::R2's error-handling is AFAIK without match. On parse error, it tells you the exact place things went wrong, in an exact sense: the last place where there still is a potential parse. It also tells you what it was looking for. Less powerful algorithms tell you where the algorithm broke down, which is often at some point after the problematic location, and of course at that point they no longer have the information to reconstruct what input would have been needed to keep things going.
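To make the "exact place, plus what it was looking for" idea concrete, here is a minimal sketch of a textbook Earley recognizer that reports the first position with no surviving parse and the terminals expected there. It is not Marpa::R2's or Lotsawa's code or API; the toy grammar and the names `GRAMMAR`, `Item`, and `recognize` are made up for illustration, and the sketch assumes a grammar without empty (nullable) rules.

```python
from typing import NamedTuple

# Hypothetical toy grammar (an assumption for illustration):
# nonterminal -> list of right-hand sides. Anything not a key is a terminal.
GRAMMAR = {
    "S": [["start", "E"]],
    "E": [["E", "+", "num"], ["num"]],
}

class Item(NamedTuple):
    lhs: str
    rhs: tuple
    dot: int     # how many rhs symbols have been matched so far
    origin: int  # input position where this rule application began

def recognize(tokens, start="S"):
    """Return (ok, position, expected_terminals).

    On failure, position is the first token index at which no parse can
    continue. Assumes the grammar has no empty rules.
    """
    sets = [set() for _ in range(len(tokens) + 1)]
    sets[0] = {Item(start, tuple(rhs), 0, 0) for rhs in GRAMMAR[start]}
    for i in range(len(tokens) + 1):
        work = list(sets[i])
        while work:
            item = work.pop()
            if item.dot == len(item.rhs):
                # Completion: advance every item that was waiting for item.lhs.
                new = [Item(p.lhs, p.rhs, p.dot + 1, p.origin)
                       for p in list(sets[item.origin])
                       if p.dot < len(p.rhs) and p.rhs[p.dot] == item.lhs]
            elif item.rhs[item.dot] in GRAMMAR:
                # Prediction: start every rule of the nonterminal after the dot.
                nonterminal = item.rhs[item.dot]
                new = [Item(nonterminal, tuple(rhs), 0, i)
                       for rhs in GRAMMAR[nonterminal]]
            else:
                new = []  # a terminal follows the dot; handled by scanning
            for n in new:
                if n not in sets[i]:
                    sets[i].add(n)
                    work.append(n)
        # Terminals that some live item is waiting for at position i.
        expected = {it.rhs[it.dot] for it in sets[i]
                    if it.dot < len(it.rhs) and it.rhs[it.dot] not in GRAMMAR}
        if i == len(tokens):
            ok = any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
                     for it in sets[i])
            return ok, i, expected
        if tokens[i] not in expected:
            return False, i, expected  # exact failure point and what was wanted
        # Scanning: move the dot over the matched terminal.
        sets[i + 1] = {Item(it.lhs, it.rhs, it.dot + 1, it.origin)
                       for it in sets[i]
                       if it.dot < len(it.rhs) and it.rhs[it.dot] == tokens[i]}
```

With this toy grammar, `recognize(["num"])` returns `(False, 0, {"start"})`: the failure is pinned to token 0, and the only thing the recognizer would have accepted there is `start`.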

One unmatched error-reporting feature: Marpa::R2 handles ambiguous parses, but most apps don't want them, so the default is to treat them as errors. In that case, Marpa::R2 tells you exactly where the ambiguity is, and what the two alternatives were.

To do this, Marpa::R2 uses an interface that lets it traverse the parse as an ASF (abstract syntax forest). That also is available to the user, and can be used to manipulate an AST (which is an ASF with no ambiguity.)

Wrt Bison, it's been known since the 1970s that every parsing algorithm can be made into a general parser by backtracking, and since then packratting has been tried to mitigate CPU costs by memoizing. I'd look into any alternative that took hold, but what I see is the same hopes being expressed every few years for a different implementation, one that seems to have been made in less than full awareness of the track record of the techniques being deployed. Until one gets some real uptake, my priorities have to be elsewhere.
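For readers unfamiliar with the terms, a minimal sketch of memoized backtracking ("packratting") follows. It is an illustration under assumptions, not code from Bison or any other tool named here: the toy grammar and the function name `matches` are invented, and `functools.lru_cache` stands in for the packrat memo table.

```python
from functools import lru_cache

def matches(text: str) -> bool:
    """Recognize the toy grammar  E <- T "+" E / T ;  T <- "n" / "(" E ")"
    over the whole input, using ordered choice with backtracking."""

    @lru_cache(maxsize=None)  # memo table: one entry per (rule, position)
    def term(pos):
        # Returns the position after a T starting at pos, or None on failure.
        if text[pos:pos + 1] == "n":
            return pos + 1
        if text[pos:pos + 1] == "(":
            end = expr(pos + 1)
            if end is not None and text[end:end + 1] == ")":
                return end + 1
        return None

    @lru_cache(maxsize=None)
    def expr(pos):
        # Alternative 1: T "+" E
        end = term(pos)
        if end is not None and text[end:end + 1] == "+":
            rest = expr(end + 1)
            if rest is not None:
                return rest
        # Alternative 2, reached by backtracking: T alone.
        # term(pos) is answered from the memo table instead of being re-parsed.
        return term(pos)

    return expr(0) == len(text)
```

Without the two `lru_cache` decorators the same code still works but re-parses `T` each time an alternative is retried; the cache trades memory for that repeated work.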

dabrahams commented 1 year ago

> On parse error, it tells you the exact place things went wrong, in an exact sense: the last place where there still is a potential parse. It also tells you what it was looking for.

Because it's basically the same algorithm, Lotsawa does that too. Those are the tools you need to programmatically diagnose an error, but that's not what I consider error handling, which would involve specific support for:

Also, I didn't mean to suggest that Bison is an inherently superior parsing technology, or even as good. It's just that many more factors probably go into making the best choice for any given project, including what programming languages you are already using. Without a lot more information about @oscar-benderstone's use case, I'm not in a position to recommend anything.

dabrahams commented 1 year ago

@oscar-benderstone wrote:

> There is a crate called [swift-bridge](https://github.com/chinedufn/swift-bridge) that allows compiled swift code to be integrated into a rust binary.

Oh, nifty! I didn't know.

> I may revisit using Lotsawa in the future. Currently, I would like to complete an early version of my project and go from there. My language does not have right recursion, so there are other tools I can use at the moment.

What options are you considering, if I may ask?

> I should also work on analyzing ASTs on my own.

I'm still not sure what you mean by this. Any processing such as analysis that occurs after an AST is constructed is not the sort of thing a parser can help with.

> Thank you again for your support.

My pleasure.

oscar-benderstone commented 1 year ago

> What options are you considering, if I may ask?

Based on what @jeffreykegler said, Marpa in Perl sounds like a good option. I do have three questions about that:

If there is helpful documentation that can answer any of these questions, I can definitely look at that.

Currently, I am creating a prototype with chumsky, but if I can get Marpa to work with my needs, I would use that instead. (In the future, I would definitely consider libmarpa or Lotsawa for performance, but performance is not my main priority.)

> I'm still not sure what you mean by this. Any processing such as analysis that occurs after an AST is constructed is not the sort of thing a parser can help with.

I mean to say I probably need a separate program to analyze the AST, and due to my needs, I may complete that on my own. There are some things I would like to do, such as:

I would be happy to put the EBNF for my language to better explain my needs. I also need to come up with examples to show some of its use cases.

@dabrahams @jeffreykegler I appreciate both of your help. Let me know if I should open a new issue or post a question somewhere else. I hope I am not getting too far off topic from Lotsawa.

jeffreykegler commented 1 year ago

@oscar-benderstone It is up to @dabrahams whether we keep the discussion here or not. If he'd like this subtopic to move, the best place might be Marpa's IRC channel. For the moment, I'll just pursue the discussion where it is.

Answering the questions:

> Are there ways to reformat the error messages in marpa? As a toy example, consider a language where every expression starts with "start." If the user is missing the word "start" in their expression, can I make an error message to say that (i.e., "Error: missing "start" at this location")?

This depends on the grammar. If 'start' is the only lexeme accepted at that point, Marpa will know it and tell you so. You can also use the "Ruby Slippers", supplying a 'start' lexeme automatically. If you don't know until later that 'start' must have been the lexeme, there are other techniques.
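The toy "start" case can be sketched as follows. This is a simplified illustration of the Ruby Slippers idea (the parser says what it expects, and a missing token is supplied on its behalf), not Marpa::R2's interface. The table-driven recognizer, the names `TRANSITIONS`, `ACCEPTING`, and `parse_with_ruby_slippers`, and the message format are all assumptions.

```python
# Hypothetical toy recognizer for:  start num ("+" num)*
# state -> {token accepted in that state: next state}
TRANSITIONS = {
    0: {"start": 1},
    1: {"num": 2},
    2: {"+": 1},
}
ACCEPTING = {2}

def parse_with_ruby_slippers(tokens):
    """Return (accepted, diagnostics).

    When the real token is rejected, ask which tokens were expected. If
    exactly one expected token would let the real token be accepted right
    after it, supply that token virtually and record a custom message.
    """
    state, i, diagnostics = 0, 0, []
    while i < len(tokens):
        expected = TRANSITIONS[state]
        tok = tokens[i]
        if tok in expected:
            state = expected[tok]
            i += 1
            continue
        # Candidate virtual tokens: expected here, and followed legally by tok.
        fixes = [(wanted, nxt) for wanted, nxt in expected.items()
                 if tok in TRANSITIONS[nxt]]
        if len(fixes) != 1:
            raise SyntaxError(
                f"unexpected {tok!r} at token {i}; expected one of {sorted(expected)}")
        wanted, state = fixes[0]  # pretend the missing token was there
        diagnostics.append(f'Error: missing "{wanted}" at token {i}')
        # The real token is not consumed here; the next iteration accepts it.
    return state in ACCEPTING, diagnostics
```

For the input `["num", "+", "num"]` this returns `(True, ['Error: missing "start" at token 0'])`: the parse continues as if `start` had been present, and the application gets to word the message itself.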

> Is there unicode support for the parsers? This may be more of a perl-oriented question. If possible, I would like to support human languages, emojis, and any unicode supported symbols.

I am told Perl has the best Unicode support in the business, and Marpa supports Unicode. That means it suffers a bit in benchmarks, because its competition typically assumes the much faster ASCII.

> Is there a way to create my own data structures for other programming languages? I would be fine having marpa output the data structure as a string and using that in, say, rust. (Alternatively, I wouldn't mind perl to analyze my language and create a cli with it). I am also not familiar with how ASTs for marpa are generated.

This is really a Perl question. If output as text is sufficient for interchange, you should be fine.

chumsky looks like a parser combinator library. Btw, if you want a summary of all the parsing techniques out there and their pros and cons, you might be interested in my timeline of parsing.

oscar-benderstone commented 1 year ago

@jeffreykegler Thank you for answering my questions. I will use marpa then; I'm excited to integrate it in my project! I will be sure to see the IRC chat (just in case).

I am also excited to see where Lotsawa leads (and have it in mind for a future project).

jeffreykegler commented 1 year ago

@oscar-benderstone -- thanks! I hope you find Marpa::R2 satisfactory. Btw I think Marpa is also suitable for prototyping and experiments. You can work out your grammar and then move it if you later decide another environment is more suitable. Marpa is good at taking anything that is thrown at it, while chumsky might be feeling kind of restrictive by now.

oscar-benderstone commented 1 year ago

Yes, thank you for making Marpa! It looks like an incredible tool, and I agree that it is less restrictive than chumsky. While I like some aspects of chumsky (parser combinators are nice to work with), BNF parsing is not directly implemented. It does require some work beforehand, but I am glad the functionality is available in Marpa.
