gercinojr closed this issue 5 years ago
What are you trying to achieve? Parsing filenames?
What error are you getting, and what output do you expect?
Hello, yes, the goal is to parse a list of file names (approximately 6500).
The error I'm getting is:
File "/home/gercino/.local/lib/python3.6/site-packages/lark/parsers/xearley.py", line 119, in scan
raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect for item in to_scan}, set(to_scan))
lark.exceptions.UnexpectedCharacters: No terminal defined for ' ' at line 1 col 7
pow r. toc h..mp3
      ^
Expecting: {Terminal('EXTENSION')}
I would like it to identify the file names (with or without dots in their names) as well as their extensions.
Thanks for your attention.
Without looking too closely, I'm guessing that the bug is that right now EXTENSION is mandatory.
Try:
filename : NAME ["." EXTENSION]
Also, why are "r." and "h..mp3" legal filenames?
hi... the code has two filenames
filename1 = "flaming.mp3"
filename2 = "pow r. toc h..mp3" # just one filename
# pink floyd/1967 - the piper at the gates of dawn/05 - pow r. toc h..mp3
I changed the grammar to
grammar = '''
start : filename
filename : NAME ["." EXTENSION]
EXTENSION : "mp3" | "wav" | "flac" | "wma" | "ogg"
CHAR : /[a-zA-Z0-9.]/
WORD : CHAR+
NAME : WORD (" " WORD)*
'''
and the result was
Tree(start, [Tree(filename, [Token(NAME, 'flaming.mp3')])])
Tree(start, [Tree(filename, [Token(NAME, 'pow r. toc h..mp3')])])
the extension is no longer being recognized
The first version of the code worked with an older version of Lark (I do not remember which one), but after I updated Python and Lark it stopped working. The problem is that the "dot" before the extension is being consumed as if it were part of the file name. In the older version the "dot" was recognized as the separator between the file name and the extension, but not anymore. :-(
It is as if parser="earley" were behaving like parser="lalr".
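To illustrate the symptom described above, here is a rough analogy using only Python's standard `re` module, not Lark itself. It assumes the NAME terminal behaves like one greedy regular expression (the pattern `NAME_PATTERN` below is a hand translation of the grammar's CHAR/WORD/NAME rules, so treat it as an approximation). Because "." is an allowed name character, a greedy match swallows the whole string and leaves nothing for the extension.

```python
import re

# Hand-written approximation of the grammar's NAME terminal:
#   CHAR : /[a-zA-Z0-9.]/   WORD : CHAR+   NAME : WORD (" " WORD)*
# This is an illustrative assumption, not how Lark compiles terminals internally.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9.]+(?: [a-zA-Z0-9.]+)*")

def greedy_name(filename):
    """Return (matched_name, leftover) for a greedy match from the start."""
    match = NAME_PATTERN.match(filename)
    matched = match.group(0) if match else ""
    return matched, filename[len(matched):]

for filename in ("flaming.mp3", "pow r. toc h..mp3"):
    name, leftover = greedy_name(filename)
    print(repr(name), repr(leftover))
```

In both cases the leftover is empty, which matches the trees shown above where NAME contains the full file name, extension included.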
@gercinojr Yes, I understand now. This happened because of a change I added to the default Earley behavior, which was intended to fix a performance issue.
However, the old behavior is still available under a special lexer:
from lark import Lark

grammar = '''
start : filename
filename : NAME "." EXTENSION
EXTENSION : "mp3" | "wav" | "flac" | "wma" | "ogg"
CHAR : /[a-zA-Z.]/
WORD : CHAR+
NAME : WORD (" " WORD)*
'''
# parser
p = Lark(grammar, parser="earley", lexer="dynamic_complete")
This should work.
I should also note that you can achieve this exact task using only regexps, in case you're inclined to try. Either way, I hope this helps.
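As a minimal sketch of the regexp-only route mentioned above, the following uses Python's standard `re` module. The function name `split_filename` and the pattern are illustrative assumptions; the extension list is taken from the grammar in this thread. The name part is matched greedily, so the last dot that is followed by a known extension acts as the separator.

```python
import re

# Extensions copied from the EXTENSION terminal in the grammar above.
FILENAME_RE = re.compile(r"^(?P<name>.+)\.(?P<ext>mp3|wav|flac|wma|ogg)$")

def split_filename(filename):
    """Split a file name into (name, extension), or return None if the
    extension is missing or not one of the known ones.

    Hypothetical helper for illustration only.
    """
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("ext")

print(split_filename("flaming.mp3"))
print(split_filename("pow r. toc h..mp3"))
```

Dots inside the name, including one directly before the separator, stay in the name part, which is the behavior requested earlier in the thread.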
Hello again! :-) I tried the solution you suggested but another type of error appeared. So I wrote a new file with three lines of code and the error was repeated.
new code to test error:
from lark import Lark
grammar="""
start: /a-z/
"""
p = Lark(grammar, parser="earley", lexer="dynamic_complete")
error:
Traceback (most recent call last):
File "test.py", line 5, in <module>
p = Lark(grammar, parser="earley", lexer="dynamic_complete")
File "/home/gercino/.local/lib/python3.6/site-packages/lark/lark.py", line 165, in __init__
self.parser = self._build_parser()
File "/home/gercino/.local/lib/python3.6/site-packages/lark/lark.py", line 188, in _build_parser
return self.parser_class(self.lexer_conf, parser_conf, options=self.options)
File "/home/gercino/.local/lib/python3.6/site-packages/lark/parser_frontends.py", line 122, in __init__
super(self).__init__(*args, complete_lex=True, **kw)
TypeError: super() argument 1 must be type, not XEarley_CompleteLex
I searched Google for a solution and found this on Stack Overflow:
Your problem is that class B is not declared as a "new-style" class. Change it like so:
class B(object):
and it will work. super() and all subclass/superclass stuff only works with new-style classes. I recommend you get in the habit of always typing that (object) on any class definition to make sure it is a new-style class.
this was what was written in the most voted answer.
lark-parser: 0.6.5 python: 3.6.6
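For reference, the TypeError in the traceback above can be reproduced without Lark. The sketch below uses hypothetical classes (`Base`, `Broken`, `Fixed`) and assumes the cause is the call `super(self)` shown in the traceback, where an instance is passed in place of a class. In Python 3 every class is already new-style, so adding `(object)` is not expected to change anything here. This is not Lark's code, and the actual fix in Lark may look different.

```python
class Base:
    def __init__(self, complete_lex=False):
        self.complete_lex = complete_lex

class Broken(Base):
    def __init__(self):
        # Mirrors the shape of the failing line: the instance is passed
        # where super() expects a class as its first argument.
        super(self).__init__(complete_lex=True)

class Fixed(Base):
    def __init__(self):
        # Zero-argument form; super(Fixed, self) would also work.
        super().__init__(complete_lex=True)

try:
    Broken()
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)  # exact wording varies by Python version

fixed = Fixed()
print(fixed.complete_lex)
```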
You're right, sorry, it's still an issue. I'm planning to release a fix for this soon. Meanwhile, you can try the 0.7b branch, where it should already be working.
Thank you for your attention. I'll wait for the new version to be released. :-)
This should be working now in master. Feel free to re-open if there's still an issue.
lark-parser: 0.6.5, python: 3.6.6. My code is not working... Can anybody help me?
even if I change the grammar to ...
adding a dot... errors again :-(