langfield / ki

version control for Anki collections
https://langfield.github.io/ki/
GNU Affero General Public License v3.0

Parser fails on a `Basic` card for an unknown reason #129

Closed langfield closed 1 year ago

langfield commented 1 year ago

Here is the card file itself.

# Note

guid: L>KHLS3F1w
notetype: Basic


### Tags

## Front

Edit the layout/card types from the editor or browser

## Back

Ctrl+L

And here is the output.

(anki) mal@computer:~/collection$ ki push
/home/mal/collection
Notes: 100%|████████████████████████████████| 502/502 [00:00<00:00, 8570.47it/s]
Media: 100%|███████████████████████████████| 502/502 [00:00<00:00, 50952.00it/s]
Fields: 100%|███████████████████████████████| 502/502 [00:00<00:00, 4267.59it/s]
Decks: 100%|██████████████████████████████████████| 9/9 [00:00<00:00, 59.77it/s]
Pushing to '/home/mal/.local/share/Anki2/User 1/collection.anki2'
========================
Note change types
------------------------
ADD                    0
DELETE                 0
MODIFY                80
RENAME                 0
TYPE CHANGE            0
========================
Traceback (most recent call last):
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/lexer.py", line 528, in lex
    yield lexer.next_token(lexer_state, parser_state)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/lexer.py", line 466, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches 'E' in the current parser context, at line 13 col 1

Edit the layout/card types from the edit
^
Expected one of:
        * FIELDSENTINEL

Previous tokens: Token('EMPTYFIELD', '\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mal/conda/envs/anki/bin/ki", line 33, in <module>
    sys.exit(load_entry_point('ki', 'console_scripts', 'ki')())
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "<@beartype(ki.push) at 0x7fcb0fa4e040>", line 10, in push
  File "/home/mal/ki/ki/__init__.py", line 1899, in push
    return write_collection(deltas, models, kirepo, parse, head_kirepo, con)
  File "<@beartype(ki.write_collection) at 0x7fcb0fa3bf70>", line 124, in write_collection
  File "/home/mal/ki/ki/__init__.py", line 1949, in write_collection
    _ = set(map(warn, F.cat(map(push_fn, map(parse, deltas)))))
  File "<@beartype(ki.parse_note) at 0x7fcb0fa223a0>", line 72, in parse_note
  File "/home/mal/ki/ki/__init__.py", line 407, in parse_note
    tree = parser.parse(delta.path.read_text(encoding=UTF8))
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/lark.py", line 625, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/parser_frontends.py", line 96, in parse
    return self.parser.parse(stream, chosen_start, **kw)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/parsers/lalr_parser.py", line 41, in parse
    return self.parser.parse(lexer, start)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/parsers/lalr_parser.py", line 171, in parse
    return self.parse_from_state(parser_state)
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/parsers/lalr_parser.py", line 188, in parse_from_state
    raise e
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/parsers/lalr_parser.py", line 178, in parse_from_state
    for token in state.lexer.lex(state):
  File "/home/mal/conda/envs/anki/lib/python3.9/site-packages/lark/lexer.py", line 537, in lex
    raise UnexpectedToken(token, e.allowed, state=parser_state, token_history=[last_token], terminals_by_name=self.root_lexer.terminals_by_name)
lark.exceptions.UnexpectedToken: Unexpected token Token('ANKINAME', 'Edit the layout/card types from the editor or browser') at line 13, column 1.
Expected one of:
        * FIELDSENTINEL
        * $END
Previous tokens: [Token('EMPTYFIELD', '\n')]

langfield commented 1 year ago

The issue has been reproduced in the test linked below. Looks to be a simple fix in the grammar! 😀

https://github.com/langfield/ki/blob/91c295b6ffe27c441ab7e13fc22d4a0a9b5f1e50/tests/test_parser.py#L844-L871

langfield commented 1 year ago

~Okay, looking back at this, the way I did this was very stupid.~ (Or at least, it looks that way at first glance.)

~What I will try is instead getting rid of the custom parser, because we're just parsing markdown, and instead parse with mistletoe, and then walk the AST to validate it against the Ki note format. It's silly that I am spending so much effort parsing ordinary features of markdown in addition to the Ki-specific stuff.~

Actually, I think the parser is fine. ~We just need it to be more lenient, and do all the validation in the transformer. Or possibly use a different parsing library. Wasn't there something that the @beartype people recommended?~ (It's called parsley and it's not really what we're looking for.)

Okay, let's not do all the validation in the transformer. Instead, let's catch the UnexpectedToken errors and format them nicely with lark's API. There's some sort of get_context() function.
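Roughly what I have in mind for the formatting step, as a minimal sketch. It uses only the standard library so it runs without lark; `format_parse_error` and its parameters are hypothetical names, not anything in ki. The assumption is that the caught lark error gives us a 1-indexed line and column, which matches the "line 13 col 1" in the traceback above.

```python
def format_parse_error(text: str, line: int, column: int, path: str, hint: str = "") -> str:
    """Build a short, user-facing message pointing at a 1-indexed line and column.

    Hypothetical helper: in ki this would be called from an
    `except UnexpectedInput as err:` block around `parser.parse(text)`,
    passing `err.line` and `err.column` from the lark exception.
    """
    lines = text.splitlines()
    # Fall back to an empty string if the position is outside the text.
    source_line = lines[line - 1] if 0 < line <= len(lines) else ""
    # Place a caret under the offending column.
    caret = " " * (column - 1) + "^"
    parts = [
        f"{path}:{line}:{column}: could not parse note",
        f"    {source_line}",
        f"    {caret}",
    ]
    if hint:
        parts.append(f"hint: {hint}")
    return "\n".join(parts)


if __name__ == "__main__":
    note = "## My field\n\nsome text\n"
    print(format_parse_error(note, 3, 1, "note.md", "Newlines are not allowed at the start of a field"))
```

lark's own get_context() could replace the hand-built caret line; the point is only that the user sees the file, the position, and a hint instead of a raw traceback.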

langfield commented 1 year ago

@SimonSelg, since you've asked for something concrete to work on, here is the grammar for notes.

https://github.com/langfield/ki/blob/91c295b6ffe27c441ab7e13fc22d4a0a9b5f1e50/ki/grammar.lark#L10-L17

If you haven't encountered this syntax before, check out the JSON tutorial for Lark. Here, we define a field as a fieldheader followed by either an EMPTYFIELD or one or more FIELDLINEs. (These are poorly named, because they're really more like newline-delimited paragraphs.)

If we have a field like this:

## My field

some text

The parser will fail: the content (after the ## ... line) is not an EMPTYFIELD, and because it begins with a blank line rather than with non-whitespace, it is not a FIELDLINE either.

And actually, it's probably okay that the parser fails. I think the only issue is that the error message is not very helpful. So I think we want to catch the errors where the parse function is called, and then make them easily interpretable by the user. Check out this error API. We probably want a message like "Newlines are not allowed at the start of a field".
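To make the failing case concrete, here is a small standard-library sketch that detects it and produces that message. This is a stand-in pre-check, not the lark-based error mapping described above. The `FIELD_HEADER` regex is an assumption about what a field header looks like ("## Name"), not the actual terminal from grammar.lark, and `find_leading_blank_fields` is a hypothetical name.

```python
import re
from typing import List

# Assumption: a field header is a level-2 markdown heading such as "## Front".
FIELD_HEADER = re.compile(r"^## \S.*$")


def find_leading_blank_fields(text: str) -> List[str]:
    """Return one message per field whose content starts with a blank line."""
    lines = text.splitlines()
    messages = []
    for i, line in enumerate(lines):
        if not FIELD_HEADER.match(line):
            continue
        # Skip over any blank lines directly after the header.
        j = i + 1
        while j < len(lines) and lines[j].strip() == "":
            j += 1
        has_blank = j > i + 1
        # Content exists if the next non-blank line is not another field header.
        has_content = j < len(lines) and not FIELD_HEADER.match(lines[j])
        if has_blank and has_content:
            name = line[3:].strip()
            messages.append(
                f"line {i + 2}: Newlines are not allowed at the start of a field ('{name}')"
            )
    return messages


if __name__ == "__main__":
    for message in find_leading_blank_fields("## My field\n\nsome text\n"):
        print(message)
```

A header followed by a blank line and then another header is treated as an empty field and not reported, which is my reading of how EMPTYFIELD is meant to work.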

Separately, I'd like your opinion on whether this is a bad idea. We may want to preserve the ability to "roundtrip" notes, i.e. they should be invariant under push and pull operations, and leading newlines created in Anki might not be preserved if we keep the grammar like this. What do you think?
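The roundtrip concern can be stated as a small property check. This is only an illustration: `strip_leading_newlines` is a hypothetical stand-in for whatever ki would have to do to make a field satisfy a grammar that forbids leading newlines, and `identity` stands in for reading the file back unchanged. Neither is ki's actual pull or push code.

```python
from typing import Callable


def is_roundtrip_stable(
    content: str,
    to_file: Callable[[str], str],
    from_file: Callable[[str], str],
) -> bool:
    """True if writing a field out and reading it back yields the same content."""
    return from_file(to_file(content)) == content


def strip_leading_newlines(content: str) -> str:
    # Hypothetical: drop leading newlines so the written file parses.
    return content.lstrip("\n")


def identity(content: str) -> str:
    # Hypothetical: read the field back exactly as stored in the file.
    return content


if __name__ == "__main__":
    print(is_roundtrip_stable("Ctrl+L", strip_leading_newlines, identity))
    print(is_roundtrip_stable("\nsome text", strip_leading_newlines, identity))
```

A field without leading newlines survives unchanged, while one that starts with a newline comes back different, which is the invariance we would be giving up.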