Closed 6r1d closed 9 months ago
For context, this is the full grammar I am experimenting with:
@@grammar::Markdown
@@whitespace :: /[␟]+/
start = pieces $ ;
newline = '\n';
text = text:/[a-zA-Z\d \-\_#:]+/ ;
raw_link_prefix = 'http://' | 'https://' ;
raw_link = protocol:raw_link_prefix url:link_string ;
internal_link = protocol:raw_link_prefix url:link_string ;
link_string = /[a-zA-Z\d\$\-\_\.\+\!\*\'\/\&\?\=\%]+/ ;
piece
    =
    | newline
    | raw_link
    | link
    | code_inline
    | bold
    | italic
    | text
    ;
pieces = {piece}* ;
link = '[' content:pieces '](' url:internal_link ')' ;
code_inline = mode:'`' content:text '`' ;
italic = mode:'*' content:pieces '*' ;
bold = mode:'**' content:pieces '**' ;
My goal is to also be able to parse Markdown like this:
[`issue_a#no`](https://example.com)
[`issue_b#no`](https://example.org)
Instead, this is the error I hit:
tatsu.exceptions.FailedToken: (1:1) expecting '\n' :
()
^
newline
piece
pieces
start
Sorry, but questions about learning to use TatSu, PEG, and parsing in general must go to StackOverflow.
This doesn't look like a problem with learning PEG, parsing, or TatSu; it looks like a genuine bug.
@apalala, if you think my EBNF above should ignore the `[` and `]`, please explain how that can happen in the StackOverflow question I wrote to show my point. I am inclined to believe it's actually a bug in TatSu, and I provided additional data for testing.
I'm not sure I would call it a bug, but there is a disconnect between how the `@@whitespace` directive is documented and how it works: it is documented to take a regular expression, but it is interpreted as a list of characters to skip over, which is translated into a regular expression here: https://github.com/neogeny/TatSu/blob/0437dddb21417f724d150c5a9bfe74731d51fe1b/tatsu/buffering.py#L75-L87
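The disconnect described above can be sketched in plain Python. This is a rough reconstruction, not TatSu's actual internals: it contrasts reading the directive's text as a regex with reading it as a set of characters to escape into a character class, which is the behavior the linked buffering.py code is described as having.

```python
import re

# The literal text from the @@whitespace directive in the grammar above.
whitespace_spec = "/[\u241f]+/"  # the characters:  /  [  U+241F  ]  +  /

# Reading it as a regex (what the documentation suggests): only the
# unit separator would be skipped.
as_regex = re.compile("[\u241f]+")

# Reading it as a list of characters to skip (the behavior described above):
# every character of the directive text, including '[' and ']',
# becomes "whitespace".
as_char_list = re.compile("[%s]+" % re.escape(whitespace_spec))

print(bool(as_regex.match("[")))      # prints False: '[' is not whitespace
print(bool(as_char_list.match("[")))  # prints True: '[' is silently skipped
```

Under the second reading, the brackets in the user's input never reach the grammar rules at all, which matches the reported symptom.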
> I'm not sure I would call it a bug,
This is at least a bug in the documentation. I have been using parser generators for a while now, and I would say that if this is not a bug, it is at least (shall we say) bad API design: claiming to accept a regular expression and then providing a red herring.
If I were you, I'd re-open this issue and call it something like "better document `@@whitespace`", and make everyone happy.
Blaming your users is not a good look, especially when the docs lie.
I apologize for not having paid closer attention to this report.
I'm glad TatSu got better in the end, and that's what matters :-)
That matters :-)
But, unlike many others, the report contained a unit test. I should have just run it :-\
https://github.com/neogeny/TatSu/blob/master/test/grammar/directive_test.py#L42-L52
I'm glad we all parted amicably.
A word of somewhat solicited advice from one maintainer to another.
If in doubt, keep the issue open. Even if the user is being actively hostile, they'll warm up to you as soon as you show that you're on their side and that you want to fix their problem as much as they do.
Hello.
I'm starting to use TatSu and need clarification about its handling of square brackets. TatSu seems to ignore them in some cases and recognize them in others, and I can't tell why.
I aim to render a subset of Markdown, but I'll start with a simplified grammar to discuss the issue. (I'm using a unit separator as a rare character, since other ways to disable whitespace handling were more confusing. If there's a more straightforward and reliable way to tell TatSu to recognize the whitespace as characters it should treat as a part of the text, that'll be useful to know, too.)
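(As an aside on the unit-separator trick: if I recall the TatSu documentation correctly, whitespace skipping can also be disabled outright with the same directive, which may be more straightforward than a sentinel character — please check the docs for your version before relying on this:)

```
@@whitespace :: None
```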
This is the test code which leads TatSu to ignore the `[]` rather than fail with an error. If I set the `markdown_str` to something else, like `()` or `{}`, TatSu will fail. Individual square brackets, `[` or `]`, won't lead to an exception. Such parser behavior causes issues with recognising the Markdown URLs for me, so any help is welcome.
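The asymmetry between `[]`, `()`, and `{}` is consistent with the later finding in this thread that the `@@whitespace` text is treated as a set of characters to skip rather than as a regex. A hypothetical sketch using only the stdlib `re` module (the skip-set construction here is an illustration, not TatSu's real code):

```python
import re

# Hypothetical reconstruction of the skip set: the @@whitespace text
# /[U+241F]+/ treated as individual characters to skip.
skipper = re.compile("[%s]+" % re.escape("/[\u241f]+/"))

# '[]' consists entirely of characters in that set, so it would be consumed
# silently; '(' , ')', '{', '}' are not in the set, so those inputs reach
# the grammar rules, where no rule matches and the parse fails.
for s in ("[]", "()", "{}"):
    print(s, "->", "skipped as whitespace" if skipper.fullmatch(s) else "passed to the rules")
```

This reproduces exactly the observed behavior: `[]` vanishes without an error, while `()` and `{}` trigger a parse failure.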