objeck / objeck-lang

Objeck is a modern object-oriented programming language with functional features tailored for machine learning. It emphasizes expression, simplicity, portability, and scalability. The programming environment consists of a compiler, virtual machine, REPL shell, and command line debugger with IDE plugins.
https://objeck.org

Implement the JSON parser in C++ #432

Closed: ghost closed this issue 2 months ago.

ghost commented 7 months ago

The Objeck wrapper could keep the same API as the current one, so nothing would break. This is also a chance to find out whether your algorithm or the Objeck VM is really to blame for the poor performance of the current JSON parser, which is written in Objeck.

Please consider it. Thank you.

objeck commented 7 months ago

This issue is with very large JSON files. The bottleneck is parsing the text into JSON elements. Both file load and element search are near realtime. I will look into this more.

ghost commented 7 months ago

> This issue is with very large JSON files.

I don't consider a 4.29 MB JSON file very large. Do you remember the Adept program in #430? It has no problem handling a 60 MB JSON file. The Vala program crashed, though. GLib sucks.

> The bottleneck is parsing the text into JSON elements. Both file load and element search are near realtime. I will look into this more.

So it's your algorithm. I don't think I'm qualified to talk about algorithms, but I will anyway. The Adept program is fast because it parses only what it needs to; the Objeck program is slow because it tries to parse everything.

https://github.com/AdeptLanguage/AdeptImport/blob/master/2.8/JSON.adept
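
For illustration, here is a minimal C++ sketch of the "parse only what you need" idea. It is not the Adept library's code, and `find_top_level`, `skip_value`, `skip_string`, and `skip_ws` are hypothetical names. It assumes the caller wants a single member of a top-level object: every other member is skipped by scanning for its end (tracking nesting depth and string literals) instead of being turned into tree nodes, and the requested value is returned as raw, undecoded text. Key escapes are not decoded and validation is minimal.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helpers for a "parse only what you need" lookup (not a real library API).

// Advance past JSON whitespace.
static std::size_t skip_ws(std::string_view s, std::size_t i) {
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t' || s[i] == '\n' || s[i] == '\r')) ++i;
    return i;
}

// i points at an opening quote; return the index just past the closing quote.
static std::size_t skip_string(std::string_view s, std::size_t i) {
    ++i;
    while (i < s.size() && s[i] != '"') {
        if (s[i] == '\\') ++i;  // jump over the escaped character
        ++i;
    }
    return i + 1;
}

// Skip one JSON value of any kind without building it; return the index just past it.
static std::size_t skip_value(std::string_view s, std::size_t i) {
    i = skip_ws(s, i);
    if (i >= s.size()) return i;
    if (s[i] == '"') return skip_string(s, i);
    if (s[i] == '{' || s[i] == '[') {
        int depth = 0;  // strings are skipped whole, so brackets inside them are ignored
        while (i < s.size()) {
            if (s[i] == '"') { i = skip_string(s, i); continue; }
            if (s[i] == '{' || s[i] == '[') ++depth;
            else if ((s[i] == '}' || s[i] == ']') && --depth == 0) return i + 1;
            ++i;
        }
        return i;
    }
    // Number, true, false or null: scan to the next delimiter.
    while (i < s.size() && s[i] != ',' && s[i] != '}' && s[i] != ']' &&
           s[i] != ' ' && s[i] != '\t' && s[i] != '\n' && s[i] != '\r') ++i;
    return i;
}

// Return the raw text of the value stored under `key` in a top-level object,
// skipping every other member without parsing it. Key escapes are not decoded.
std::optional<std::string_view> find_top_level(std::string_view s, std::string_view key) {
    std::size_t i = skip_ws(s, 0);
    if (i >= s.size() || s[i] != '{') return std::nullopt;
    ++i;
    while (true) {
        i = skip_ws(s, i);
        if (i >= s.size() || s[i] != '"') return std::nullopt;  // end of object or malformed
        const std::size_t end = skip_string(s, i);
        const std::string_view name = s.substr(i + 1, end - i - 2);
        i = skip_ws(s, end);
        if (i >= s.size() || s[i] != ':') return std::nullopt;
        const std::size_t start = skip_ws(s, i + 1);
        const std::size_t stop = skip_value(s, start);
        if (name == key) return s.substr(start, stop - start);
        i = skip_ws(s, stop);
        if (i < s.size() && s[i] == ',') { ++i; continue; }
        return std::nullopt;
    }
}
```

For example, `find_top_level(R"({"a": [1, 2], "b": 42})", "b")` returns the text `42`; the array under `"a"` is stepped over but never materialized.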

objeck commented 7 months ago

Keep in mind that the JSON I tested with came from web service calls.

objeck commented 7 months ago

The JSON parser does not use stream (incremental) parsing, hence the performance hit when parsing large documents. I am looking into stream processing as a proof of concept (POC).
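
As a rough illustration of what stream (event-based) parsing means, here is a C++ sketch assuming a SAX-style callback design. `stream_parse`, `Event`, and `Handler` are hypothetical names; this does not describe the actual Objeck POC. The input is read once, left to right, and each token is handed to the caller as soon as it is recognized, so no element tree is ever allocated. Checking where `:` and `,` appear is omitted for brevity.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical event kinds reported by a SAX-style streaming parser.
enum class Event { StartObject, EndObject, StartArray, EndArray, String, Number, Literal };

using Handler = std::function<void(Event, std::string_view)>;

// Characters that may appear in a JSON number (loose check, not a full grammar).
static bool is_number_char(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) || c == '-' || c == '+' ||
           c == '.' || c == 'e' || c == 'E';
}

// Single left-to-right pass: each token is reported as soon as it is recognized
// and nothing is kept afterwards. Object keys arrive as String events.
// Returns false on unexpected input.
bool stream_parse(std::string_view s, const Handler& on_event) {
    std::size_t i = 0;
    while (i < s.size()) {
        const char c = s[i];
        if (std::isspace(static_cast<unsigned char>(c)) || c == ',' || c == ':') {
            ++i;  // separators carry no data in this sketch
        } else if (c == '{' || c == '}' || c == '[' || c == ']') {
            const Event e = c == '{' ? Event::StartObject
                          : c == '}' ? Event::EndObject
                          : c == '[' ? Event::StartArray
                                     : Event::EndArray;
            on_event(e, s.substr(i, 1));
            ++i;
        } else if (c == '"') {
            std::size_t j = i + 1;
            while (j < s.size() && s[j] != '"') j += (s[j] == '\\') ? 2 : 1;
            if (j >= s.size()) return false;                      // unterminated string
            on_event(Event::String, s.substr(i + 1, j - i - 1));  // raw text, escapes kept
            i = j + 1;
        } else if (c == '-' || std::isdigit(static_cast<unsigned char>(c))) {
            std::size_t j = i;
            while (j < s.size() && is_number_char(s[j])) ++j;
            on_event(Event::Number, s.substr(i, j - i));
            i = j;
        } else if (s.substr(i, 4) == "true" || s.substr(i, 4) == "null") {
            on_event(Event::Literal, s.substr(i, 4));
            i += 4;
        } else if (s.substr(i, 5) == "false") {
            on_event(Event::Literal, s.substr(i, 5));
            i += 5;
        } else {
            return false;
        }
    }
    return true;
}
```

A caller that only needs a count or a few fields keeps just that state in its callback; for instance, counting the numbers in `{"a": [1, 2.5]}` needs a single integer, not a document tree.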

ghost commented 7 months ago

Could you try switching from Stack to Vector to see whether it improves performance?

https://github.com/objeck/objeck-lang/blob/master/core/compiler/lib_src/json.obs
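
Objeck's `Stack` and `Vector` are Objeck library classes, so the following is only a C++ analogy: `std::stack` is an adapter whose default backing container is `std::deque`, and passing `std::vector` as the second template argument keeps the elements contiguous. The hypothetical `brackets_balanced` function shows the kind of nesting bookkeeping a JSON parser does with such a stack. Whether the equivalent change helps in Objeck depends on how its collections are implemented, which this sketch does not assume.

```cpp
#include <cstddef>
#include <stack>
#include <string_view>
#include <vector>

// Check that braces and brackets in a JSON text nest correctly, using
// std::stack backed by std::vector instead of the default std::deque.
// Characters inside string literals are ignored, so "]" in a string is not counted.
bool brackets_balanced(std::string_view s) {
    std::stack<char, std::vector<char>> open;
    bool in_string = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        if (in_string) {
            if (c == '\\') ++i;  // skip the escaped character
            else if (c == '"') in_string = false;
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            open.push(c);
        } else if (c == '}' || c == ']') {
            const char want = (c == '}') ? '{' : '[';
            if (open.empty() || open.top() != want) return false;
            open.pop();
        }
    }
    return open.empty() && !in_string;
}
```

For example, `brackets_balanced(R"([{"k": "]"}])")` returns true because the `]` inside the string literal is ignored.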

ghost commented 7 months ago

> The JSON parser does not use stream (incremental) parsing, hence the performance hit when parsing large documents. I am looking into stream processing as a proof of concept (POC).

I hope you can explore new possibilities for language features in the process.

ghost commented 7 months ago

By the way, here is what I would do in your place. There are various C++ JSON parsers on GitHub. I would try all of them, pick the one I liked most, and write an Objeck binding for it; let's call it json2. Its API doesn't have to resemble the original json library's. It won't be as much fun as writing your own JSON parser, but it will always work.

ghost commented 6 months ago

@objeck Any plans to work on this?

objeck commented 6 months ago

Yes, it is next.

objeck commented 6 months ago

Start of a proof-of-concept parser.

ghost commented 6 months ago

> Start of a proof-of-concept parser.

How do I use it?

objeck commented 6 months ago

Still working on it; however, it is progressing.

objeck commented 6 months ago

According to initial tests, stream parsing reduced the time needed to parse large documents by up to 95%, depending on the document's structure. For instance, in the examples given, parsing time dropped from 4.642 seconds with tree parsing to 0.323 seconds with stream parsing, a 93% reduction (roughly a 14x speedup).

| Run | Tree (s) | Stream (s) |
| --- | --- | --- |
| 0 | 4.735 | 0.321 |
| 1 | 4.775 | 0.324 |
| 2 | 4.856 | 0.313 |
| 3 | 4.775 | 0.320 |
| Avg | 4.785 | 0.319 |
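
As a quick check using the averages from the table above (arithmetic only, no new measurements), the reduction in parse time and the corresponding speedup work out as follows, with $t$ the average parse time in seconds:

$$
\text{reduction} = \frac{t_{\text{tree}} - t_{\text{stream}}}{t_{\text{tree}}} = \frac{4.785 - 0.319}{4.785} \approx 0.933,
\qquad
\text{speedup} = \frac{t_{\text{tree}}}{t_{\text{stream}}} = \frac{4.785}{0.319} = 15.0
$$

So the averaged runs correspond to about a 93% reduction in parse time; stream parsing was roughly 15 times faster on this input.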

Tests: tree parser, stream parser, and input file.

Implementations: tree parser and stream parser.

objeck commented 6 months ago

Closing and moving to UAT

ghost commented 6 months ago

> Closing and moving to UAT

What does UAT mean?

objeck commented 6 months ago

> > Closing and moving to UAT
>
> What does UAT mean?

In testing: UAT stands for user acceptance testing.