**Open** — mnmkng opened this issue 3 years ago
Hmmm, immediately after submitting this I got an idea that the label inference might be the culprit and it surely is:

```
plain js took 18.78ms on average.
ow took 32.62ms on average.
```

```js
function validateWithOw(objects) {
    const predicate = ow.object.exactShape({
        foo: ow.string,
        one: ow.number,
        obj: ow.object,
    });

    objects.forEach(obj => {
        ow(obj, 'x', predicate); // <==== added some label
    });
}
```
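For reference, average timings like the ones above can be produced with a small plain-Node harness. This is a sketch, not the exact benchmark used in this thread: `benchmark` and `validatePlain` are hypothetical names, and `validatePlain` stands in for the hand-written baseline (the `ow` variant would be passed in the same way).

```javascript
// Minimal benchmark harness: runs a validator over a batch of objects
// several times and reports the average wall-clock time per round in ms.
function benchmark(validate, objects, rounds = 50) {
    const timings = [];
    for (let i = 0; i < rounds; i++) {
        const start = process.hrtime.bigint();
        validate(objects);
        timings.push(Number(process.hrtime.bigint() - start) / 1e6); // ns -> ms
    }
    return timings.reduce((a, b) => a + b, 0) / timings.length;
}

// Plain-JS baseline validator, standing in for the ow predicate above.
function validatePlain(objects) {
    for (const obj of objects) {
        if (typeof obj.foo !== 'string') throw new TypeError('foo must be a string');
        if (typeof obj.one !== 'number') throw new TypeError('one must be a number');
        if (typeof obj.obj !== 'object' || obj.obj === null) throw new TypeError('obj must be an object');
    }
}

const objects = Array.from({ length: 1000 }, () => ({ foo: 'bar', one: 1, obj: {} }));
const avgMs = benchmark(validatePlain, objects);
console.log(`plain js took ${avgMs.toFixed(2)}ms on average.`);
```

Absolute numbers will of course vary by machine and Node version; only the ratio between the two validators is meaningful.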
Another potential improvement lies in the execution of the `ow.object.partialShape()` and `exactShape()` functions. Testing suggests that no matter how many properties the validated object actually has, those functions still iterate over all the properties in the provided shape.
Example of our validation:

```js
const predicate = ow.object.exactShape({
    url: ow.string.url,
    id: ow.optional.string,
    loadedUrl: ow.optional.string.url,
    uniqueKey: ow.optional.string,
    method: ow.optional.string,
    payload: ow.optional.any(ow.string, ow.buffer),
    noRetry: ow.optional.boolean,
    retryCount: ow.optional.number,
    errorMessages: ow.optional.array.ofType(ow.string),
    headers: ow.optional.object,
    userData: ow.optional.object,
    handledAt: ow.optional.any(ow.string.date, ow.date),
    keepUrlFragment: ow.optional.boolean,
    useExtendedUniqueKey: ow.optional.boolean,
});

ow(options, 'RequestOptions', predicate);
```
By refactoring to the custom validation loop below, we got a 3-4 times speed increase.
```js
const requestOptionalPredicates = {
    id: ow.optional.string,
    loadedUrl: ow.optional.string.url,
    uniqueKey: ow.optional.string,
    method: ow.optional.string,
    payload: ow.optional.any(ow.string, ow.buffer),
    noRetry: ow.optional.boolean,
    retryCount: ow.optional.number,
    errorMessages: ow.optional.array.ofType(ow.string),
    headers: ow.optional.object,
    userData: ow.optional.object,
    handledAt: ow.optional.any(ow.string.date, ow.date),
    keepUrlFragment: ow.optional.boolean,
    useExtendedUniqueKey: ow.optional.boolean,
};

ow(options, 'RequestOptions', ow.object);
ow(options.url, 'RequestOptions.url', ow.string.url);

Object.keys(options).forEach((prop) => {
    const predicate = requestOptionalPredicates[prop];
    const value = options[prop];
    if (predicate) {
        ow(value, `RequestOptions.${prop}`, predicate);
    } else if (prop !== 'url') {
        // 'url' is checked above because it's not optional.
        const msg = `Did not expect property \`${prop}\` to exist, got \`${value}\` in object \`RequestOptions\``;
        throw new Error(msg);
    }
});
```
Would it be possible for `partialShape()` and `exactShape()` to only validate existing properties instead of all shape properties? I assume the reasoning behind validating all of them is to capture getters and setters. Could we have an option to configure this?
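The behaviour being asked for can be sketched in plain JS without ow at all. Everything here is hypothetical illustration (the names `requiredPredicates`, `optionalPredicates`, and `validateOptions` are not part of ow's API): required keys get an explicit presence check, and the loop then only visits the keys the caller actually provided.

```javascript
// Predicates are plain boolean functions here, standing in for ow predicates.
const requiredPredicates = {
    url: (v) => typeof v === 'string',
};
const optionalPredicates = {
    retryCount: (v) => typeof v === 'number',
    noRetry: (v) => typeof v === 'boolean',
};

function validateOptions(options, label = 'RequestOptions') {
    // Required keys must exist and pass their predicate.
    for (const [key, predicate] of Object.entries(requiredPredicates)) {
        if (!(key in options) || !predicate(options[key])) {
            throw new TypeError(`Expected valid \`${label}.${key}\``);
        }
    }
    // Only iterate the keys that are actually present on the input.
    for (const key of Object.keys(options)) {
        if (key in requiredPredicates) continue; // already checked above
        const predicate = optionalPredicates[key];
        if (!predicate) {
            throw new TypeError(`Did not expect property \`${key}\` in \`${label}\``);
        }
        if (!predicate(options[key])) {
            throw new TypeError(`Expected valid \`${label}.${key}\``);
        }
    }
}

validateOptions({ url: 'https://example.com', retryCount: 2 }); // passes
```

With this shape of loop, validation cost scales with the number of properties the caller actually passes, not with the size of the declared shape.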
> Hmmm, immediately after submitting this I got an idea that the label inference might be the culprit and it surely is:
Yeah, it's slow because it needs to read the file from disk and parse it. Label inference is only called when something fails though, which should not be that often for normal function parameter validation calls, which is what Ow was originally intended for. I guess we could add a "Performance" section that outlines the trade-offs.
> Would it be possible for `partialShape()` and `exactShape()` to only validate existing properties instead of all shape properties? I assume the reasoning behind validating all is to capture getters and setters. Could we have an option to configure this?
I think this is just an oversight that could be fixed. I don't see how it matters whether they're getters/setters.
The reason it loops over all the properties of the shape instead of all the properties of the input object is that it expects all properties of the shape to be present in the input object. If you turn it around, you can't detect that a property like `unicorn` does not exist in the input object, for instance.
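The trade-off described here is easy to demonstrate with plain objects (a toy sketch, not ow's internals; the shape maps keys to expected `typeof` strings):

```javascript
// Why the shape is the loop target: iterating only the input's keys
// cannot detect a *missing* required property.
const shape = { url: 'string', unicorn: 'string' };
const input = { url: 'https://example.com' }; // `unicorn` is absent

// Looping over the input's keys finds no problem...
const inputSideErrors = Object.keys(input)
    .filter((key) => typeof input[key] !== shape[key]);

// ...while looping over the shape's keys catches the missing property,
// because `typeof undefined` is 'undefined', not 'string'.
const shapeSideErrors = Object.keys(shape)
    .filter((key) => typeof input[key] !== shape[key]);

console.log(inputSideErrors); // []
console.log(shapeSideErrors); // ['unicorn']
```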
> Yeah, it's slow because it needs to read the file from disk and parse it. Label inference is only called when something fails though, which should not be that often for normal function parameter validation calls, which is what Ow was originally intended for. I guess we could add a "Performance" section that outlines the trade-offs.
@sindresorhus In the provided example, nothing fails, yet the speed of validation improves 10-fold without inference. Maybe there's a bug somewhere and the inference is always called?

And yes, we use `ow` for function parameter validation calls. The call looks like this:

```js
await requestQueue.addRequest({ url: 'https://example.com' });
```

Only `url` is mandatory. All other props are optional.
> The reason it loops over all the properties of the shape instead of all the properties of the input object is because it expects all properties of the shape to be present in the input object. If you turn it around, you can't detect that property `unicorn` does not exist in the input object for instance.
@SamVerschueren Yes, that's right. But for *optional* properties there's no need to check for non-existence. I apologize, I thought I mentioned this in the comment, but apparently I did not.

I understand this may be an edge case in your opinion, but I think it's quite typical for functions to accept options objects or "named parameters", where one or two properties are mandatory and the rest are optional.
Hi @sindresorhus @SamVerschueren, just wanted to bring to your attention that the validations above are about 15 times slower in `ow@0.22.0` than in previous versions.

I might be able to work on this at some point this quarter, so if it's not critical for you, let me handle this in February or so. We'll keep using `0.21.0` for the time being.
// @vladfrangu FYI
@mnmkng That would be really great. 🙏
That's... odd, that shouldn't be the case, unless generating the error message causes this slowdown (which would probably make sense if your schema is very complex and there are a LOT of errors)...
Any benchmarks or performance reports you could share? 🙏 Reproduction samples that I could benchmark? 😓
@vladfrangu See the thread above for examples.
In real life, it's this test: https://github.com/apify/apify-js/blob/11ff217a2af12cbb900b6de827de710dad030786/test/request.test.js#L123 On 0.22 it runs in 450+ms for me locally, whereas with 0.21 it runs in ~30ms.
Will try to take a look later this week; could you also link me the schemas you have, please? 🙏 I haven't noticed such a big speed downgrade locally... but my gut says it's the message generation that's causing the biggest slowdown.
Sure, the predicates are here: https://github.com/apify/apify-js/blob/11ff217a2af12cbb900b6de827de710dad030786/src/request.js#L19
And the validation code is here: https://github.com/apify/apify-js/blob/11ff217a2af12cbb900b6de827de710dad030786/src/request.js#L107
We're using the workarounds that I explained in the above comments to speed up the validation.
@mnmkng So... I'm in a bit of a predicament right now, as this issue completely amazes me.

Since you mentioned there's a possible performance regression in 0.22.0 (i.e. the "report all errors in the message" version), I reset my local master branch of ow to upstream, cloned apify, nuked all tests except the one you mentioned, and ran it. And ran it again, and again... you get the gist. I've tinkered with ow's code and have shaved off some ms and duplicated iterations; I've tried insane things like moving from maps to objects (and back)...

So, then I decided to time it DIRECTLY in code (console.time/timeEnd, Date.now subtraction, etc.). ow takes... literally between 0 and 7ms per class constructed to validate that `options` is an object and `options.url` is a string. The first run was ~7ms; subsequent runs were very VERY fast (between 0 and 2ms)... So I don't understand why Jest is reporting that it takes almost a full second to create 1k classes and push them to one array. In fact, if I add up all the results from it, I get around 566ms (still a lot, mind you).
So then I decided to implement a similar "benchmark" to the code provided in the main issue. This yielded some more consistent results, aaand... honestly, I'm lost. After removing the console.logs, which caused a slowdown, the results are:
| Version | Jest Report | Duration Avg (from code above) |
|---|---|---|
| 0.21.0 | 69ms | 23.653ms |
| 0.22.0 | 521ms | 266.364ms |
| 0.23.0\* | 523ms | 271.534ms |

\* 0.23.0 is the local version I have with changes to reduce duplicate iterations.
At this point, I don't even know if it IS a regression, or just Jest and benchmarking not being meant for each other... I've tried creating the classes themselves within the benchmark script you provided above, changing the 50 iterations to 1k, making V8 warm up by running the benchmark function 100k times before actually benchmarking it, and...
| Version | Bench Report |
|---|---|
| 0.21.0 | 4.75ms |
| 0.22.0 | 65.645ms |
| 0.23.0\* | 65.245ms |
If you have any other ideas of what I should try, or want to experiment privately, you can contact me on Discord (if you use that: Vladdy#0002) or here! But as it stands... I'm lost.
@vladfrangu Thanks for all the work you clearly dedicated to this, but I'm not sure how Jest is relevant. Your tables clearly show that there's a ten-fold or larger slowdown between 0.21.0 and 0.22.0 (or 0.23.0).
Yes, the Jest numbers are higher, because Jest instruments code and does whatever magic, but the issue is exactly the same without Jest, as your own numbers prove. I.e. from 23ms to 271ms in one test, or from 5ms to 65ms in the other. That's a pretty big slowdown.
Please correct me if I'm missing something here.
Oh..my..god.
Found the culprit. Turns out, generating the stack trace when it's not needed is QUITE AN EXPENSIVE OPERATION!! (aka I dropped from 52.73ms to 3.239ms just by making the function return null.) Expect a PR soon!
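The cost of stack capture is easy to see in isolation with plain Node. This standalone sketch (not ow's actual code) uses V8's real `Error.stackTraceLimit` knob to compare error construction with and without stack collection; exact numbers depend heavily on machine, engine, and call depth:

```javascript
// Times `iterations` runs of `fn` and returns the total in ms.
function time(label, fn, iterations = 100000) {
    const start = process.hrtime.bigint();
    for (let i = 0; i < iterations; i++) fn();
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${label}: ${ms.toFixed(2)}ms`);
    return ms;
}

// With the default limit, V8 collects stack frames on every construction.
const withStack = time('new Error (stack captured)', () => new Error('boom'));

Error.stackTraceLimit = 0; // tell V8 to collect zero stack frames
const withoutStack = time('new Error (stackTraceLimit = 0)', () => new Error('boom'));
Error.stackTraceLimit = 10; // restore the default

console.log(`ratio: ${(withStack / withoutStack).toFixed(1)}x`);
```

The same principle applies to any hot path that constructs `Error` objects speculatively and throws them away when validation succeeds.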
Are there some low-hanging fruits to improve performance?

`ow` seems to be about 15 times slower in validation than pure JS. I understand that some slowness is obviously expected, but 15 times slower makes me think this could be a bug rather than a design limitation. I'm using `ow@0.20.0`.

Repro code:

It is a problem for us because we use `ow` to validate user-provided objects (constructor options), and the number of those objects can be hundreds of thousands in one batch. `ow` is by far the slowest part of that process. I'd gladly submit a PR if you can point me in some general direction.

Thanks.