Closed langfield closed 1 year ago
Issue has been reproduced in a test below. Looks to be a simple fix in the grammar! 😀
~Okay, looking back at this, the way I did this was very stupid.~ (Or at least, it looks that way at first glance.)
~What I will try is instead getting rid of the custom parser, because we're just parsing markdown, and instead parse with `mistletoe`, and then walk the AST to validate it against the Ki note format. It's silly that I am spending so much effort parsing ordinary features of markdown in addition to the Ki-specific stuff.~
Actually, I think the parser is fine. ~We just need it to be more lenient, and do all the validation in the transformer. Or possibly use a different parsing library. Wasn't there something that the @beartype people recommended?~ (It's called `parsley`, and it's not really what we're looking for.)
Okay, let's not do all the validation in the transformer. Let's instead catch the `UnexpectedToken` errors and then format them nicely with `lark`'s API. There's some sort of `get_context()` function.
@SimonSelg, since you've asked for something concrete to work on, here is the grammar for notes.
If you haven't encountered this syntax before, check out the JSON tutorial for Lark. Here, we define a `field` as a `fieldheader` followed by either an `EMPTYFIELD` OR one or more `FIELDLINE`s. (These are poorly named, because they're really more like newline-delimited paragraphs.)
If we have a field like this:
## My field
some text
The parser will fail, because the content (after the `## ...` line) is not an `EMPTYFIELD`, but it also does not start with non-whitespace, and so it is not a `FIELDLINE` either.
And actually, it's probably okay that the parser fails. I think the only issue is that the error message is not so nice. So I think we want to catch the errors when the `parse` function is called, and then make them easily interpretable by the user. Check out this error API. We probably want a message like "Newlines are not allowed at the start of a field".
Separately, I'd like your opinion on whether this is a bad idea, because it may be the case that we want to preserve the ability to "roundtrip" notes, i.e. they should be invariant under push and pull operations, and they might not preserve leading newlines created in Anki if we keep the grammar like this. What do you think?
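As a toy illustration of the roundtrip concern (none of these functions exist in Ki; `write_field`, `read_field`, and `roundtrips` are hypothetical names, and the reader here is assumed to drop leading blank lines rather than fail):

```python
from typing import Tuple


def write_field(name: str, content: str) -> str:
    """Hypothetical 'pull' step: render one field as markdown."""
    return f"## {name}\n{content}\n"


def read_field(markdown: str) -> Tuple[str, str]:
    """Hypothetical 'push' step that silently drops leading blank lines."""
    header, _, body = markdown.partition("\n")
    name = header[len("## "):]
    if body.endswith("\n"):
        body = body[:-1]  # remove the terminator added by write_field
    return name, body.lstrip("\n")


def roundtrips(name: str, content: str) -> bool:
    """True if the field content survives a write followed by a read."""
    return read_field(write_field(name, content)) == (name, content)
```

Under these assumptions, `roundtrips("Front", "some text")` holds, while `roundtrips("Front", "\nsome text")` does not, which is the kind of loss described above for fields whose content starts with a newline in Anki.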
Here is the card file itself.
guid: L>KHLS3F1w notetype: Basic
And here is the output.