quinnj / JSON3.jl

Tape reading without alloc #226

Open poncito opened 2 years ago

poncito commented 2 years ago

Hi,

This PR is not finished; it is just a proof of concept to get the discussion started. The idea is to be able to read the tape without allocating. I ran a benchmark, with the results below, where f0 just calls read, and f1 builds the tape but then reads it with my non-allocating code.

julia> @benchmark f0($str)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  995.800 ns … 174.571 μs  ┊ GC (min … max): 0.00% … 99.10%
 Time  (median):       1.025 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.144 μs ±   3.819 μs  ┊ GC (mean ± σ):  7.44% ±  2.21%

  ▅██▇▆▅▄▃▂▂▃▃▃▃▃▂▂▁▁    ▁                                      ▂
  ████████████████████████████▇▇▆▇▆▅▆▇▅▆▄▅▆▅▆▆▆▅▇▇▅▄▆▅▅▆▅▆▅▆▅▅▆ █
  996 ns        Histogram: log(frequency) by time       1.51 μs <

 Memory estimate: 2.58 KiB, allocs estimate: 11.

julia> @benchmark f1($reader, $str)
BenchmarkTools.Trial: 10000 samples with 201 evaluations.
 Range (min … max):  394.900 ns …  13.140 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     402.363 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   415.439 ns ± 162.861 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁█                                                             
  ██▅▆▄▂▂▃█▆▃▃▂▂▃▅▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  395 ns           Histogram: frequency by time          506 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

The implementation is fully immutable, and has two main concepts:

A few details about the implementation:

@quinnj Are you interested?

To pursue this PR, I would like to:

To go even further:

quinnj commented 2 years ago

Hey @poncito! Thanks for opening a PR. Some really interesting ideas here. I've also been thinking here and there about how we can support more lazy/non-allocating queries/workflows. I'm playing with some ideas here. In truth, I'd like to move away from the tape altogether, since it requires a big upfront allocation and has to be held onto by the lazy objects. Ideally, I'm hopeful we can figure out a purely functional solution where we or users could define a custom JSONBase.AbstractContext to overload different parsing events (objectEnter, objectExit, etc.). The idea is that we could then implement lazy querying via a custom context, in addition to regular parsing to a Dict, or even all the StructTypes integration we have in JSON3.jl.
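To make the idea of overloadable parsing events more concrete, here is a minimal sketch in Julia. Every name in it (AbstractContext, object_enter, object_exit, DepthCounter, scan) is hypothetical and chosen for illustration; it is not the JSONBase.jl API. The toy scanner only tracks braces outside string literals and does not validate JSON. Handlers return a new context value instead of mutating one, which is one possible reading of "purely functional".

```julia
# Hypothetical names throughout: this is NOT the JSONBase.jl API.
abstract type AbstractContext end

# Default handlers return the context unchanged, so a custom context
# only overloads the events it cares about.
object_enter(ctx::AbstractContext) = ctx
object_exit(ctx::AbstractContext) = ctx

# Example context: counts objects and records the maximum nesting depth.
# It is an immutable struct of plain integers.
struct DepthCounter <: AbstractContext
    depth::Int
    maxdepth::Int
    objects::Int
end
DepthCounter() = DepthCounter(0, 0, 0)

object_enter(ctx::DepthCounter) =
    DepthCounter(ctx.depth + 1, max(ctx.maxdepth, ctx.depth + 1), ctx.objects + 1)
object_exit(ctx::DepthCounter) =
    DepthCounter(ctx.depth - 1, ctx.maxdepth, ctx.objects)

# Toy event source: fires enter/exit events for braces that are not
# inside a string literal. It performs no validation.
function scan(ctx::AbstractContext, json::AbstractString)
    instring = false
    escaped = false
    for b in codeunits(json)
        if instring
            if escaped
                escaped = false
            elseif b == UInt8('\\')
                escaped = true
            elseif b == UInt8('"')
                instring = false
            end
        elseif b == UInt8('"')
            instring = true
        elseif b == UInt8('{')
            ctx = object_enter(ctx)
        elseif b == UInt8('}')
            ctx = object_exit(ctx)
        end
    end
    return ctx
end

ctx = scan(DepthCounter(), """{"a": {"b": "}{"}, "c": {}}""")
```

For this input the resulting context has objects == 3, maxdepth == 2 and depth == 0; the braces inside the string value are ignored. A lazy query or a Dict builder would be other context types fed by the same event source.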

Anyway, I'm not sure if I'll be able to push forward on it much in the near term (depends on a few other projects going on), but I'd be happy to discuss some ideas more.

poncito commented 2 years ago

Hey @quinnj, thanks for your answer. I can dedicate a few hours a week of my free time to developing this kind of thing. So if you have time to coach me on that, I could propose some code.

My understanding of JSONBase.jl and your explanation is the following:

Am I understanding correctly? I'm really not sure, because of your comment on lazy objects. You could also mean that you would like to make the parsing itself lazy? I'm not sure what that would mean or where the performance gain would come from.

My guess on how to efficiently parse and read a json would be the following dichotomy (and their corresponding 'AbstractContext'?),

From this dichotomy, it appears that not only the "storage" should be abstract, but also the "parsing" (which does not appear in your JSONBase.jl). This is required to exploit the schema, but it can also be used to relax some checks. For instance, if I want to take the risk of assuming that the serialized JSON is well formed, I could decide to replace

pos + 3 <= len &&
        b            == UInt8('t') &&
        buf[pos + 1] == UInt8('r') &&
        buf[pos + 2] == UInt8('u') &&
        buf[pos + 3] == UInt8('e')

by

b == UInt8('t')

I could also skip reading the keys entirely, and jump ahead by the length of the expected key (plus the following delimiter), and so on.
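The two relaxations described above can be sketched as follows. The function names (read_true_checked, read_true_trusting, skip_known_key) are hypothetical and appear in neither JSON3.jl nor JSONBase.jl. The key-skipping helper assumes compact JSON: no whitespace around the colon and no escape sequences in the key.

```julia
# Hypothetical helpers for illustration; not part of JSON3.jl or JSONBase.jl.

# Checked: stay in bounds and verify every byte of the literal `true`.
function read_true_checked(buf::AbstractVector{UInt8}, pos::Int)
    if pos + 3 <= length(buf) &&
       buf[pos]     == UInt8('t') &&
       buf[pos + 1] == UInt8('r') &&
       buf[pos + 2] == UInt8('u') &&
       buf[pos + 3] == UInt8('e')
        return pos + 4  # index of the first byte after the literal
    end
    throw(ArgumentError("invalid JSON literal at byte $pos"))
end

# Trusting: only the first byte is inspected; the remaining three bytes
# are assumed to be "rue" because the input is assumed well formed.
function read_true_trusting(buf::AbstractVector{UInt8}, pos::Int)
    buf[pos] == UInt8('t') || throw(ArgumentError("expected 't' at byte $pos"))
    return pos + 4
end

# Schema-driven key skipping: `pos` is the index of the key's opening quote.
# Jump over the opening quote, the key bytes, the closing quote and the ':'
# without comparing any of them.
skip_known_key(pos::Int, key::AbstractString) = pos + ncodeunits(key) + 3

buf = codeunits("{\"flag\":true}")
valuepos = skip_known_key(2, "flag")       # byte 2 is the opening quote
after = read_true_trusting(buf, valuepos)  # same result as the checked version here
```

The trade-off is visible on malformed input: read_true_trusting(codeunits("tXYZ"), 1) returns 5 without complaint, whereas read_true_checked throws on the same bytes.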

I can find some free time if you want to chat: romain.pct@gmail.com

Robert-j7 commented 1 year ago

Any updates on this?

I would really like to help since I really need this. The problem is that I'm inexperienced, so I might cause some trouble :D

quinnj commented 1 year ago

I've been slowly chipping away at some ideas at https://github.com/quinnj/JSONBase.jl. I've gotten pretty far, but haven't been able to do really thorough benchmarking yet to make sure everything is squared away. There's also some polish, package admin, and testing to do, but I think the fundamentals are in pretty good shape. Happy to chat in that repo with anyone who's interested in collaborating and trying things out with me.