minio / simdjson-go

Golang port of simdjson: parsing gigabytes of JSON per second
Apache License 2.0

Clarification on WithCopyStrings() #62

Closed by flowchartsman 2 years ago

flowchartsman commented 2 years ago

It's not clear from the docs what the limitations of WithCopyStrings(false) are. Specifically, the godoc says:

WithCopyStrings will copy strings so they no longer reference the input. For enhanced performance, simdjson-go can point back into the original JSON buffer for strings, however this can lead to issues in streaming use cases scenarios, or scenarios in which the underlying JSON buffer is reused.

What are these circumstances, and what best practices are needed to avoid them? I'm assuming this mostly refers to retaining references to strings after the next document has been loaded in an NDJSON situation, but is it safe to use these strings provided they are copied before that happens? It would be very helpful to have this called out in the docs.

klauspost commented 2 years ago

When input is provided as a []byte, the returned ParsedJson will reference the input data, so strings read from it are only valid as long as that slice is not modified or reused.
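To make the hazard concrete, here is a minimal stdlib-only sketch of what "referencing the input" means. It does not use simdjson-go; `aliasString` is a hypothetical helper that stands in for a zero-copy parser, and the whole thing assumes the usual Go runtime behaviour for the unsafe slice-to-string cast.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// aliasString returns a string that shares memory with b instead of copying it.
// Hypothetical stand-in for a zero-copy parser; this is not simdjson-go code.
func aliasString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// demo takes one aliased and one copied string from a buffer, then overwrites
// the buffer, as would happen when an input slice is reused for the next document.
func demo() (aliased, copied string) {
	buf := []byte("hello")
	aliased = aliasString(buf)      // points into buf
	copied = strings.Clone(aliased) // owns its own memory
	copy(buf, "HELLO")              // buffer is reused
	return aliased, copied
}

func main() {
	a, c := demo()
	fmt.Printf("aliased=%q copied=%q\n", a, c) // aliased="HELLO" copied="hello"
}
```

The aliased string silently changes when the buffer is overwritten, while the copy taken beforehand keeps its value. Copying a string you want to keep (for example with `strings.Clone`) before the buffer is touched again avoids the problem.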

For NDJSON, when using ParseND(b []byte, ...) the returned structure will reference b. ParseNDStream is safer: you only need to make sure you no longer reference anything you send back on the reuse channel.