halaxa / json-machine

Efficient, easy-to-use, and fast PHP JSON stream parser
Apache License 2.0

Performance improvements (lexer/parser) #46

Closed fcaps closed 1 year ago

fcaps commented 3 years ago

Hi guys,

the memory usage is awesome, but the CPU time is ~100x that of json_decode (100 MB of JSON with 10,000 entries). Did you consider using a C extension for the tokenizing/parsing? I have never written an extension, but it looks like we could extend ext-json or even just use ext-parle for the heavy lifting.
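A comparison like this can be reproduced with a small timing harness. The following is only a sketch: `measure()` is a hypothetical helper, the synthetic document is far smaller than the 100 MB file mentioned above, and wall-clock time is used as a rough proxy for CPU time. To compare against the streaming parser, pass a closure that iterates it instead of calling json_decode.

```php
<?php
/**
 * Hypothetical helper: runs $parse once and reports elapsed
 * wall-clock time (ms) and the process peak memory (bytes).
 */
function measure(callable $parse): array
{
    $start = hrtime(true);                       // monotonic clock, nanoseconds
    $result = $parse();
    $elapsedMs = (hrtime(true) - $start) / 1e6;  // nanoseconds to milliseconds

    return [
        'result'    => $result,
        'ms'        => $elapsedMs,
        'peakBytes' => memory_get_peak_usage(true),
    ];
}

// Build a synthetic document with 10,000 entries (assumed shape, for illustration only).
$entries = [];
for ($i = 0; $i < 10000; $i++) {
    $entries[] = ['id' => $i, 'payload' => str_repeat('x', 100)];
}
$json = json_encode($entries);

// Baseline: decode the whole document in one call and count the entries.
$baseline = measure(function () use ($json) {
    return count(json_decode($json, true));
});

printf(
    "json_decode: %d entries, %.2f ms, peak %d bytes\n",
    $baseline['result'],
    $baseline['ms'],
    $baseline['peakBytes']
);
```

Note that peak memory is process-wide, so each candidate is best measured in a separate run.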

I could try to implement a lexer with ext-parle and see how the performance changes, and then implement a parser if you guys think this is a good idea.

Greetings

halaxa commented 3 years ago

Hi,

I considered that and looked into it a little, but I do not have the knowledge. I was fiddling with it in the zephir branch. Learning to code extensions in pure C would be fun and appeals to me, but I do not have the time to do it right now. I am open to it if anyone else does.

halaxa commented 3 years ago

I looked into Parle earlier. I am worried about the lack of documentation. Is it even able to consume chunks of the source language (JSON) iteratively? If it is, feel free to write a prototype. A lexer producing JSON tokens that would be interchangeable with JsonMachine\Lexer might be a good start.
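To illustrate what consuming chunks iteratively means for a lexer, here is a minimal pure PHP sketch. It is not JsonMachine\Lexer and not a Parle-based implementation; `lexJsonChunks()` is a hypothetical name, and the token shape (plain strings, with quotes kept on string tokens) is an assumption. The point is that a token may start in one chunk and end in the next, so the lexer has to carry its state across chunk boundaries.

```php
<?php
/**
 * Hypothetical chunk-aware JSON lexer (illustration only).
 * Yields structural characters, quoted strings (quotes included)
 * and bare literals (numbers, true, false, null) as plain strings.
 * It does not validate the JSON.
 *
 * @param iterable $chunks JSON source split at arbitrary byte offsets
 */
function lexJsonChunks(iterable $chunks): Generator
{
    $buffer = '';       // token currently being assembled; survives chunk boundaries
    $inString = false;  // true while inside a "..." string
    $escaped = false;   // true right after a backslash inside a string

    foreach ($chunks as $chunk) {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $byte = $chunk[$i];

            if ($inString) {
                $buffer .= $byte;
                if ($escaped) {
                    $escaped = false;            // escaped byte is taken literally
                } elseif ($byte === '\\') {
                    $escaped = true;
                } elseif ($byte === '"') {
                    $inString = false;           // closing quote ends the string token
                    yield $buffer;
                    $buffer = '';
                }
                continue;
            }

            $isStructural = strpos('{}[]:,', $byte) !== false;
            $isWhitespace = strpos(" \t\r\n", $byte) !== false;

            if ($byte === '"' || $isStructural || $isWhitespace) {
                // Any of these ends a pending bare literal such as 12.5 or true.
                if ($buffer !== '') {
                    yield $buffer;
                    $buffer = '';
                }
                if ($byte === '"') {
                    $inString = true;
                    $buffer = '"';
                } elseif ($isStructural) {
                    yield $byte;
                }
            } else {
                $buffer .= $byte;                // part of a bare literal
            }
        }
    }

    if ($buffer !== '') {
        yield $buffer;                           // trailing literal at end of input
    }
}
```

The chunks could come from repeated `fread()` calls on a file handle. Any replacement lexer, whether written in PHP or backed by an extension, would need to handle split tokens in a similar way.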

fcaps commented 3 years ago

Sorry, I was busy^^ I had a deep look into Parle. Yeah... there is no documentation for PHP, but there is some for the original C implementation, lexertl. The first working lexer built with Parle was terrible: 2x slower than the pure PHP implementation. The second lexer was "state aware" and much faster, but at this stage it's almost a parser.

Current State:

Open Questions:

Next Steps:

halaxa commented 3 years ago

You seem dedicated :) Can you show some code?

Are you using tests to verify correctness?

I will elaborate on the other topics once we see a significant impact on performance.

Remo commented 3 years ago

While looking for a faster alternative, I found https://github.com/shevron/ext-jsonreader, which is written in C and offers streaming as well. I haven't done any testing, but before you guys start working on something new, you might want to check it out. Unfortunately it's quite old, but it might still give you a head start.

halaxa commented 1 year ago

Let's continue in #97.