lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

File input to `parse` method gives TypeError: object of type '_io.TextIOWrapper' has no len() #1375

Status: Closed (iainelder closed this issue 7 months ago)

iainelder commented 7 months ago

Describe the bug

I want Lark to read the file for me so that I don't have to read the whole file into memory first.

In #488 @erezsh showed that Lark's parse method takes a file argument so that Lark can read the file as it needs to. He wrote:

What I think you're asking for, which is to parse a large file without having to load all of it into memory, is something that Lark already does.

  1. Python automatically buffers files as they are being read

  2. Lark supports the transformer=... option, which applies a transformer as the text is being parsed, rule by rule, instead of building a whole tree first. You don't have to create a tree, or store the parsed data, if you choose to.

All you have to do is something like this:

parser = Lark.open("my_grammar.lark", parser="lalr", transformer=MyTransformer())
with open("my_input.json") as f:
    result = parser.parse(f)

If for some reason that doesn't work, open a new issue and we'll fix it.

I used the following complete example to try this out. I'm not ready to use a transformer yet, so I omitted that part.

from lark import Lark
from pathlib import Path

grammar_path = Path("my_grammar.lark")
grammar_path.write_text("""
start: NUMBER*
%import common.NUMBER
%import common.WS
%ignore WS
""")

input_path = Path("my_input")
input_path.write_text("1 22 333")

parser = Lark.open(grammar_path, parser="lalr")
with open("my_input") as f:
    result = parser.parse(f)

I expect the result to be a tree with tokens for 1, 22, and 333.

Instead I get the following error:

TypeError: object of type '_io.TextIOWrapper' has no len()

Despite that example code, it looks like Lark really expects a string, which supports len().
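For readers who want to see where the message comes from without involving Lark at all, here is a minimal standard-library sketch. It assumes only that something in the parsing path calls len() on its input; the helper name describe_len is made up for illustration and is not part of Lark.

```python
import tempfile
from pathlib import Path


def describe_len(obj):
    """Hypothetical helper: report whether len() works on obj."""
    try:
        return f"len() works: {len(obj)}"
    except TypeError as exc:
        return f"TypeError: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "my_input"
    input_path.write_text("1 22 333")

    with open(input_path) as f:
        # The open file object has no length...
        file_result = describe_len(f)
        # ...but the string returned by read() does.
        text_result = describe_len(f.read())

print(file_result)
print(text_result)
```

The first line printed carries the same kind of TypeError as the one reported above; the second shows that the string obtained from read() is accepted.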

How do I get Lark to read the file for me?

MegaIng commented 7 months ago

I don't know where @erezsh got that from. Lark has never supported passing a file object (or any I/O stream) to .parse. It would also be quite hard to add, since we use re, which can't deal with those objects directly. I can try to add support for it.
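As a rough illustration of the re limitation mentioned here, the sketch below uses only the standard library. It does not reproduce Lark's internals; the NUMBER pattern is just a stand-in chosen for this example.

```python
import io
import re

# Stand-in pattern for illustration only; not taken from Lark's source.
NUMBER = re.compile(r"\d+")

stream = io.StringIO("1 22 333")

# Matching against a stream object fails: re works on str or bytes-like input.
try:
    NUMBER.match(stream)
    stream_error = None
except TypeError as exc:
    stream_error = exc

# Matching against the text read from the stream works as usual.
text = stream.read()
first = NUMBER.match(text)

print(type(stream_error).__name__)
print(first.group())
```

The regular expression engine needs the text itself, so any file support would have to read the stream's contents at some point.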

Of course, if the file is smaller than a few MB, just loading it into memory is the better choice for speed.

erezsh commented 7 months ago

"@erezsh showed that Lark's parse method takes a file argument so that Lark can read the file as it needs to."

I don't think that I did.

MegaIng commented 7 months ago

"@erezsh showed that Lark's parse method takes a file argument so that Lark can read the file as it needs to."

"I don't think that I did."

You did; the quote is exactly what you wrote.

erezsh commented 7 months ago

@iainelder My bad, I was wrong, and I apologize for misleading you. You should read the file yourself, and provide it to Lark as a string.
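A minimal sketch of that approach, reusing the grammar from the report above. The names parse_text_file and demo are hypothetical helpers written for this example, not part of Lark; the only Lark calls used are the Lark constructor, parse on a string, and Tree.pretty.

```python
import tempfile
from pathlib import Path


def parse_text_file(parser, path, encoding="utf-8"):
    """Hypothetical helper: read the whole file, then parse the resulting string."""
    text = Path(path).read_text(encoding=encoding)
    return parser.parse(text)


def demo():
    # Imported here so the helper above has no hard dependency on Lark.
    from lark import Lark

    grammar = """
    start: NUMBER*
    %import common.NUMBER
    %import common.WS
    %ignore WS
    """
    parser = Lark(grammar, parser="lalr")

    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "my_input"
        input_path.write_text("1 22 333", encoding="utf-8")
        tree = parse_text_file(parser, input_path)

    print(tree.pretty())
```

With Lark installed, calling demo() should print a tree whose children are the three NUMBER tokens. The whole file is held in memory, which is fine for inputs of a few MB.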

If you need to parse a very large file, there might be a way to do so incrementally, but a solution isn't yet written or tested. See #1211.

iainelder commented 7 months ago

"If the file is smaller than a few MB, just loading it into memory is the better choice for speed."

My data is small enough that I can work like this.

Thanks for clarifying. The documentation does show that parse expects a string, but that issue ranks high on Google :-)