romshark / jscan

High performance JSON iterator & validator for Go
BSD 3-Clause "New" or "Revised" License
88 stars 7 forks source link

Rename named return variable in walker closure #21

Open flrdv opened 9 months ago

flrdv commented 9 months ago

By default, walker closure has the following signature:

func(i *jscan.Iterator[S]) (err bool)

IDE proposes it every time you're trying to use auto-complete. And I find the named return's name quite inconvenient because of the following reasons: 1) err variable is pretty common and is usually meant to carry a value of type error. This sometimes leads to name collisions 2) It doesn't quite respond to its semantics: it is more like some exit or stop

The preferred solution would be to replace bool with the actual error interface, so jscan.Scan would return it instead of error with code ErrorCodeCallback. However, it breaks backward capability, and according to the fact that v2 was released just recently, it definitely won't be implemented just now.

So my current solution is to rename it to some of the already mentioned variants: 1) exit 2) stop 3) break_ (seems ugly tbh)

I'm going to open a PR after the resolution

romshark commented 9 months ago

To understand why I don't return error, as is common in Go, consider the following benchmark (https://go.dev/play/p/_iAcdhVU8m7):

=== RUN   TestSizeOf
    bench_test.go:10: size of Error[string]: 32
    bench_test.go:11: size of Error[[]byte]: 40
--- PASS: TestSizeOf (0.00s)
goos: darwin
goarch: arm64
pkg: errbench
BenchmarkInterface
BenchmarkInterface-10           54839221                21.78 ns/op           32 B/op        1 allocs/op
BenchmarkStructString
BenchmarkStructString-10        1000000000               0.9313 ns/op          0 B/op        0 allocs/op
BenchmarkStructBytes
BenchmarkStructBytes-10         334166505                3.578 ns/op           0 B/op        0 allocs/op
PASS
ok      errbench        4.804s

Returning the Error[T] type carrying relevant information as error will allocate it on the heap. Using fmt.Errorf or similar would also allocate memory. Copying 32 bytes on modern 64-bit systems is essentially free and 40 bytes in case of []byte is comparably cheap. Therefore, I leave the decision of whether to use Error[T] as error to the API user as it does incur a cost.

If the walker closure would return error then other functions would need to return error too which I would rather avoid for the reason described above.

The fact that the named return err can be inconvenient is a fair point and renaming it would be backward compatible so that's possible. exit and stop are both good candidates, break_ is not.

P.S. I also thought about func(i *jscan.Iterator[S]) (errCode int8) when designing the API, but this could allow returning jscan error codes like ErrorCodeIllegalControlChar, which isn't ideal as those codes are supposed to be used by the parser only.

flrdv commented 9 months ago

When talking about performance, the user must take care of it by himself, too. So by that, I disagree that the returning error interface is that awful in terms of performance, as in my own usecase I use pre-defined static errors. However I agree that there are more usecases as I've seen by my own, and sometimes error text would be needed to be dynamic, but in this case, the user still needs to make an effort to avoid fmt.Sprintf or anything like that. In other words, the task is difficult by itself, and the approach currently used doesn't really help as far as I can see.

What about renaming the err - have we reached a consensus here, so I can open a PR?

romshark commented 9 months ago

Using jscan.Error[T] as error

but in this case, the user still needs to make an effort to avoid fmt.Sprintf or anything like that.

Not much effort required since jscan.Error[T] implements error, just wrap it:

var err error
if e := jscan.Validate(json); e.IsErr() {
  err = e
}

Or simply call the Error method to get the message allocated:

jscan.Validate(`{"x":_}`).Error() // "error at index 5 ('_'): unexpected token"

Rationale

Consider the following example and benchmark: https://go.dev/play/p/SxuovkZG2D2 IndexOfErrorInJSON needs to return either the index of where there's an error in the JSON or -1 if there was no error. Returning jscan.Error[T] actually makes the code both simpler and faster:

if err := jscan.Validate(j); err.IsErr() {
    return err.Index
}
return -1

If we were to return error from jscan.Validate instead, then we would need to either use a type assertion or errors.As, which are both less convenient and wasteful:

if err, ok := jscan.Validate(j).(jscan.Error[string]); ok {
    return err.Index
}
return -1
var jscanErr jscan.Error[string]
if errors.As(jscan.Validate(j), &jscanErr) {
    return jscanErr.Index
}
return -1
goos: darwin
goarch: arm64
pkg: errbench
Benchmark
Benchmark/concrete
Benchmark/concrete-10           21867732                49.74 ns/op            0 B/op          0 allocs/op
Benchmark/errors_as
Benchmark/errors_as-10           7709853               152.6 ns/op            96 B/op          3 allocs/op
Benchmark/typeassert
Benchmark/typeassert-10         14732881                72.35 ns/op           32 B/op          1 allocs/op
PASS
ok      errbench        3.712s

The fact that jscan.Validate returns jscan.Error gives you an option to avoid paying the extra cost when it's not justified.

A ~30ns difference and an extra allocation is a tiny cost. However, package jscan is deliberately designed to allow your code to remain allocation free.

I do agree that in Go it's typical to work with error and having to check for errors with if err.IsErr() may be a bit confusing, but jscan is not your typical Go package since its main focus is performance.

Renaming named return

What about renaming the err - have we reached a consensus here, so I can open a PR?

Yes, this is possible. I don't think we need a PR for that since it's a trivial change I can do myself but if you want - go for it 🙂

EDIT:

as in my own usecase I use pre-defined static errors

Sentinel values would only allow us to replace the error code, but if we also want to return the occurrence index then there's no way around allocation except returning an actual struct implementing the error interface.