Open xakepp35 opened 1 year ago
can we please do 2 changes?
add a funcction
func (c *Code) RunWithContextUnnormalized(ctx context.Context, v any, values ...any) Iter {
switch {
case len(values) > len(c.variables):
return NewIter(&tooManyVariableValuesError{})
case len(values) < len(c.variables):
return NewIter(&expectedVariableError{c.variables[len(values)]})
}
return newEnv(ctx).execute(c, v, values...)
}
capitalize NormalizeNumbers
so it can be used by client - this one isnt very important because we can copy source code of normalize numbers, but without first change we cant proceed..
Let me think carefully about changing (or adding) the API. Delaying the normalization could introduce undesired overhead due to increasing count of number conversions (e.x. echo '[0,1,2,3,4]' | gojq '[.[] * .[] * .[]]'
), also messes the implementation to normalize here and there. I admit that the number normalization could bother some package users, but currently README.md clearly mentions about this limitation; do not give values sharing same data between multiple calls
. However, this is the initial design, and we may improve this.
Delaying the normalization could affect the package users adding custom internal functions with gojq.WithFunction
.
Possible to make it a gojq.WithInputNormalize(<bool>)
compile option etc? this would also apply to gojq.WithVariables
values i guess?
Its not nesesary "delaying".. in some scenarios its opposite - normalisation can be done in advance (caching is done on user side). In our scenario, for example, we have smart cache, which tries to do "best effort".. initially it stores data as messagepack'ed []byte
payloads. If a request happens to some data part - it does unmarshalling to interface{}
, which is now followed by immediate normalisation, and then caching the resulting object. Next jq operations are just reusing cached object without slow unmarshal or normalisation, while calls made from parallel goroutines serving requests.
Second good option (and basically even more preferred one) may be to expose NewEnv(ctx)
and Execute
funcs - then you can keep api, just allowing some lower level access. That way we will optimize even more. we have multiquery, and query injections, so basically we want to create env once and then call Execute, Execute, .. several times, providing precompiled Code
from compiler cache, which we also have :-) because compile stage is slow.. thete are alot of things to make it work in performant manner, saving allocations. But the key is to open up some minor amount of internals, offering ability to operate withour reallocations, under some restricted conditions.. Indeed, Execute
method - is obviously and logically a method of env
VM, and not a Code
bytecode, and its obvious that its unsafe to run parallel executes on same vm at same time, but call Execute
in a sequene without reallocating env
- is something that we definetely want to acheive
modularity is good - why not doing that? if i need ill call. if i dont need - I wont call and save save save ticks!
func (c *Code) ValidateInput(v any, values ...any) error {
switch {
case len(values) > len(c.variables):
return &tooManyVariableValuesError{}
case len(values) < len(c.variables):
return &expectedVariableError{c.variables[len(values)]}
default:
return nil
}
}
typically good thing is to have both high level (for beginners/simple use) and lower level available.. we started to demand lower one, and agree to take responsibilities to make it work good..
btw, running 2 Execute()
in a row on a same vm , preallocated by NewEnv(ctx)
- does not work, second run is not updating the output :-(
PS. @wader - we are now removing usage of gojq.WithVariables
because it requires compiler stage - replacing it with custom function, obraining variables from context at run time
sorry for such many info here, its actually worth multiple issues at once, all of them are arising when you re trying to make it perform fast, processing thousands and thousands of queries (of same format but with different arguments) per moment in a cloud env - you just want to reuse whatever theoretically can be reused!
At this moment, I have no good idea to solve this issue without increasing the number of parsing or breaking the custom internal function APIs. Also, I think in the most use case of this package (like gh), only one querying is executed against one input, so I can't imagine specific application that needs to reuse the same input value.
I actually face the same issue last year when using jq
as a script language in an internal system.
I just patch JQ, and manually copy map
and slice
instead.
However, it creates a lot of memory and increase GC pressure.
https://github.com/reyoung/gojq/commit/19cb611aa3878c3faca43d83ba84cb1030659d6a
I have pushed an experimental fix for this issue to json-number
branch. You can install with go install github.com/itchyny/gojq@json-number
.
Sorry for such late response. Looks like that fix is working good. We switched to use of json-number
branch month ago. No issues were found with it after 1 month of testing.
@itchyny Would json-number
be merged into master? looks like it resolved our issue. We're using it in prod for 2 months, no problems were observed yet. Moreover, it allowed us to find and see the problems that we had with types on our side, because it started to preserve the type, thats good. Can we merge and close this one then?
This issue is blocked by https://github.com/go-yaml/yaml/issues/807.
It looks like
normalizeNumbers
function is a problem, when unavoidably called from RunWithContextPlease, check out my reproduction example on go.dev
Is there any option to export it and let package user to decide wherever to call it, or not at all?
To rephrase - RunWithContext should be able to perform as a "pure" function - produce output, while never touching user input at all, because you cant know, if its safe to do any modification with it
Can RunWithContext behave as a "copy-on-write" executor, along whole execution path?