kostya / benchmarks

Some benchmarks of different languages
MIT License

JSON with low memory consumption #232

Closed proyb6 closed 3 years ago

proyb6 commented 4 years ago

I just tried jsparser and realised it consumes very little memory, less than 9MB, probably because it avoids extra allocations, iirc.

macOS Catalina
Go version 1.14rc1
Time to complete: <2.2s
Memory: <9MB

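package main

import (
        "bufio"
        "fmt"
        "log"
        "os"
        "strconv"

        "github.com/tamerh/jsparser"
)

func main() {
        f, err := os.Open("/tmp/1.json")
        if err != nil {
                log.Fatal(err)
        }
        defer f.Close()

        // Stream the "coordinates" array through a 16 KiB buffer,
        // skipping the properties that are not needed for the averages.
        br := bufio.NewReaderSize(f, 16384)
        parser := jsparser.NewJSONParser(br, "coordinates").SkipProps([]string{"name", "opts"})

        // count replaces the original "len", which shadowed the built-in.
        x, y, z, count := 0.0, 0.0, 0.0, 0.0

        for json := range parser.Stream() {
                // Conversion errors are ignored here, as in the original snippet.
                xx, _ := strconv.ParseFloat(json.ObjectVals["x"].StringVal, 64)
                yy, _ := strconv.ParseFloat(json.ObjectVals["y"].StringVal, 64)
                zz, _ := strconv.ParseFloat(json.ObjectVals["z"].StringVal, 64)
                x += xx
                y += yy
                z += zz
                count++
        }
        fmt.Printf("%.8f\n%.8f\n%.8f\n", x/count, y/count, z/count)
}
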
nuald commented 4 years ago

The provided example reads from the file, not from memory. Please update it (or even better, send a PR) with code that reads the file into memory first and then parses the JSON from memory. The overall time could increase, but the measured time interval (for the actual JSON parsing) could decrease.
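
A minimal sketch of what that could look like, assuming the document has a top-level "coordinates" array of objects with numeric x, y and z fields, as the snippet above implies. It uses the standard library's encoding/json rather than jsparser, and the names `averageFromMemory`, `document`, `coordinate` and `sample` are made up for illustration. The file is loaded before the timer starts, so only the parse falls inside the measured interval.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// coordinate and document describe the assumed input shape.
type coordinate struct {
	X float64 `json:"x"`
	Y float64 `json:"y"`
	Z float64 `json:"z"`
}

type document struct {
	Coordinates []coordinate `json:"coordinates"`
}

// sample is a tiny stand-in input used when no file path is given.
const sample = `{"coordinates":[{"x":1,"y":2,"z":3},{"x":3,"y":4,"z":5}]}`

// averageFromMemory parses a JSON document that is already in memory
// and returns the mean x, y and z of its coordinates.
func averageFromMemory(data []byte) (x, y, z float64, err error) {
	var doc document
	if err = json.Unmarshal(data, &doc); err != nil {
		return 0, 0, 0, err
	}
	n := float64(len(doc.Coordinates))
	if n == 0 {
		return 0, 0, 0, fmt.Errorf("no coordinates found")
	}
	for _, c := range doc.Coordinates {
		x += c.X
		y += c.Y
		z += c.Z
	}
	return x / n, y / n, z / n, nil
}

func main() {
	data := []byte(sample)
	if len(os.Args) > 1 {
		// Load the whole file first (os.ReadFile needs Go 1.16 or later).
		var err error
		if data, err = os.ReadFile(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}

	// Only the parse is timed; the file read above is excluded.
	start := time.Now()
	x, y, z, err := averageFromMemory(data)
	elapsed := time.Since(start)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%.8f\n%.8f\n%.8f\n", x, y, z)
	fmt.Fprintf(os.Stderr, "parse time: %v\n", elapsed)
}
```

Run without arguments it parses the inline sample; with a path such as /tmp/1.json it reads that file into memory first.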

proyb6 commented 4 years ago

I see. Would you be interested in submitting the PR instead?

nuald commented 4 years ago

Yes, please. As a guide, the PR for the new tests usually includes:

proyb6 commented 4 years ago

Sorry, I haven't had the time to follow up. I hope you could submit the PR?

nuald commented 4 years ago

PR #236 - Please note that it has higher memory consumption because it parses the file from memory (as all other tests do), not from the file system directly. As for performance, it doesn't beat the other Go tests; it's actually the slowest among them, so I have some doubts about including it in the benchmarks. However, if you wish, I'll merge the PR into master.

proyb6 commented 4 years ago

In my opinion, it could be included if it can be marked as "read file from OS" or placed in a separate JSON benchmark; otherwise, we can ignore the PR.

tamerh commented 4 years ago

Hi @proyb6 and @nuald

I recently noticed this issue and made some improvements; jsparser is now faster and more efficient, using about 5MB of memory on average. It could be improved further, but this is probably enough for now. jsparser doesn't need to be in your benchmarks, but I want to add a few comments.

Most of the existing libraries, including simdjson, load the whole file into memory, which gives a lot of flexibility for fast parsing but requires a lot of memory for large files, and you have to wait for all the parsing to finish before processing the data. My use case was better suited to a stream parser, which is why I wrote the parser.

Your benchmark counts total memory usage. If you took average memory usage into account, jsparser would probably rank somewhere near the top, since it uses only a buffered reader.

I am impressed by simdjson and simdjson-go in your benchmarks; thanks for this. They plan to implement stream parsing in the future. Let's see how it works out; jsparser will probably be unnecessary once that is implemented, and I would also switch to it.

beached commented 4 years ago

Even if you use memory mapping, which is essentially streaming and doesn't really use much memory if memory isn't available because it relies on OS paging, the result looks very similar given how the measurement is being done, or at least how it was done a few months ago.

Another approach might be to take a baseline memory reading after the file has been loaded but before parsing starts, as the difference will show the memory used by parsing, which seems to be the goal.
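
One way to sketch that idea in Go, assuming the Go runtime's own allocation counters are an acceptable proxy (they are not the same as the process memory an external tool would observe): take a runtime.MemStats reading once the input is in memory, parse, then read again and report the difference. `parsingAllocations`, `parseAll` and `sample` are hypothetical names, and `parseAll` is only a stand-in for whichever parser is under test.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"runtime"
)

// sample is a tiny stand-in input used when no file path is given.
const sample = `{"coordinates":[{"x":1,"y":2,"z":3},{"x":3,"y":4,"z":5}]}`

// parseAll is a placeholder parser: it decodes the whole document
// into generic Go values.
func parseAll(data []byte) error {
	var v interface{}
	return json.Unmarshal(data, &v)
}

// parsingAllocations returns the number of heap bytes allocated while
// parse runs. The baseline is taken after data is already in memory,
// so the cost of loading the input is excluded.
func parsingAllocations(data []byte, parse func([]byte) error) (uint64, error) {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap before taking the baseline
	runtime.ReadMemStats(&before)
	err := parse(data)
	runtime.ReadMemStats(&after)
	// TotalAlloc is cumulative, so the difference never goes negative.
	return after.TotalAlloc - before.TotalAlloc, err
}

func main() {
	data := []byte(sample)
	if len(os.Args) > 1 {
		var err error
		if data, err = os.ReadFile(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	n, err := parsingAllocations(data, parseAll)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("input: %d bytes, allocated while parsing: %d bytes\n", len(data), n)
}
```

TotalAlloc counts every allocation made during the parse, including memory that was freed again; comparing HeapAlloc before and after would instead show what is still live when parsing ends.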