vlang / v

Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io
MIT License

Iterative JSON Parser? #8986

Open · Polydynamical opened this issue 3 years ago

Polydynamical commented 3 years ago

In Python, there is a module, "ijson", for parsing huge JSON files iteratively. Is it possible to do the same in V?

nedpals commented 3 years ago

Can you post a link to that module here? I'm curious to see how it works. Maybe I can implement something similar in x.json2.

JalonSolov commented 3 years ago

It's basically a JSON parser that reads from a stream and parses as it reads, instead of reading the whole file in and parsing it all at once.

Definitely something we need. It could use the io.Reader interface for buffered input.
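To make the "parse as it reads" idea concrete, here is a minimal Python sketch. It is written in Python only because that is where the thread starts; it is not ijson's implementation and not a proposed x.json2 API. The function name `iter_array_items` is hypothetical, and the sketch assumes the document's top level is a single JSON array. It reads fixed-size chunks from any object with a `read(n)` method (standing in for a buffered reader) and yields one array element at a time, so memory stays around one chunk plus the element currently being decoded.

```python
import io
import json

def iter_array_items(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array one at a time.

    Hypothetical helper. `stream` is any object whose read(n) returns str
    (e.g. an open text file); the whole file is never loaded at once.
    """
    decoder = json.JSONDecoder()
    buf, eof = "", False
    state = "open"  # open: expect '['; first: item or ']'; next: ',' or ']'; item: an element

    while True:
        buf = buf.lstrip()
        if not buf:
            if eof:
                raise ValueError("unexpected end of input")
            chunk = stream.read(chunk_size)
            eof = not chunk
            buf += chunk
            continue
        if state == "open":
            if buf[0] != "[":
                raise ValueError("expected a top-level JSON array")
            buf, state = buf[1:], "first"
            continue
        if buf[0] == "]" and state in ("first", "next"):
            return  # end of the array
        if state == "next":
            if buf[0] != ",":
                raise ValueError("expected ',' or ']'")
            buf, state = buf[1:], "item"
            continue
        # State is "first" or "item": try to decode one complete element.
        try:
            value, end = decoder.raw_decode(buf)
            rest = buf[end:].lstrip()
        except json.JSONDecodeError:
            rest = ""  # element is incomplete so far (or malformed)
        # Accept the element only once a delimiter follows it, so a number
        # split across chunks (e.g. "1.5e" + "10") is not emitted too early.
        if rest and rest[0] in ",]":
            yield value
            buf, state = rest, "next"
            continue
        if eof:
            raise ValueError("truncated or malformed JSON array")
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf += chunk

if __name__ == "__main__":
    for item in iter_array_items(io.StringIO('[1, {"a": 2}, "x"]'), chunk_size=4):
        print(item)
```

One caveat of this simple design: an element that is still incomplete is re-decoded from its start after every new chunk, which gets slow for a single very large element. A real streaming parser would keep tokenizer state between chunks instead.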

Polydynamical commented 3 years ago

@nedpals https://pypi.org/project/ijson/

@JalonSolov Yes, exactly. I use JSON files for NLP in Python, but Python is slow. I'm planning to port that work to V, and this would be a great module to add.

dumblob commented 3 years ago

There are many parser and reader APIs in V which IMHO should be "iterative" by default (I call them "lazy views" with zero-copy semantics), as discussed e.g. in https://github.com/vlang/v/issues/1732#issuecomment-527823969 and elsewhere (e.g. HTTP response parsing, including cookies).

Unfortunately, this "zero-copy" (or "lazy parsing/reading", or "fragment parsing/reading") paradigm is not widely known, and people dismiss it as too complicated. It is actually only slightly more complicated, so that argument is a straw man, while the benefits in practice are enormous; that's why, e.g., the gumbo HTML5 parser became so popular.

Another real-world example of how to do "lazy/streaming parsing" well and at scale: https://github.com/cloudflare/lol-html (from Cloudflare).

nedpals commented 3 years ago

The RapidJSON page about SAX and iterative parsing gave me an idea of how to implement this feature, but I'll try to keep it as compatible as possible with the current parser.
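In the SAX style that RapidJSON describes, the parser reports events to a handler instead of building a full document tree. A rough Python sketch of that callback style follows. It is not RapidJSON's or x.json2's API: `sax_parse` and the event names are illustrative, and for brevity it scans an in-memory string and assumes well-formed input (commas and colons are skipped rather than validated). Scalars are read with `json.JSONDecoder.raw_decode`, passing a start offset, which CPython's implementation accepts as a second argument.

```python
import json

def sax_parse(text, handler):
    """Report JSON parse events to handler(event, value) instead of building a tree.

    Illustrative only: assumes `text` is well-formed JSON.
    """
    decoder = json.JSONDecoder()
    structural = {"{": "start_map", "}": "end_map", "[": "start_array", "]": "end_array"}
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t\r\n,:":
            i += 1  # whitespace and separators produce no event of their own
        elif c in structural:
            handler(structural[c], None)
            i += 1
        else:
            # A scalar (string, number, true/false/null): let the stdlib
            # decoder read exactly one value starting at offset i.
            value, end = decoder.raw_decode(text, i)
            j = end
            while j < n and text[j] in " \t\r\n":
                j += 1
            # In valid JSON, a string directly followed by ':' is an object key.
            is_key = isinstance(value, str) and j < n and text[j] == ":"
            handler("map_key" if is_key else "value", value)
            i = end

if __name__ == "__main__":
    sax_parse('{"a": [1, "x"]}', lambda event, value: print(event, value))
```

Because the handler only sees one event at a time, it can aggregate, filter, or discard data as it goes, which is what makes this style a natural fit for files too large to hold in memory.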

JalonSolov commented 3 years ago

Once you get it working, it should probably be the default. Keeping the current parser as an alternative could still be handy when the input size is known in advance, especially when it is known to be small.