fnorf closed this issue 4 years ago.
> I had expected the bottleneck to be the time it took the APIs to answer but actually the data munching of hafas-client into its nice format seems to take the most resources.

Indeed, I would have expected the hafas-client part to be negligible.
> I made a single 24 hour departures request. ~370 trips were returned. The whole process took ~1.7s.
Keep in mind that, based on the usage I've seen so far, this is not the "usual" request, because its response is so large: I got 448 KiB for 1202 departures over 24h at Berlin Friedrichstr.
Some more common use cases:

- 6 KiB of JSON.
- 26 KiB of JSON.

Nevertheless, I am of course in favour of making hafas-client faster!
> If I interpret that correctly, it looks like ~60% of the time was spent with findInTree calls in parseCommonData.
Keep in mind that, in order to get accurate and reliable results, you need to run hafas-client often enough for the optimizing compiler to detect and optimise hot code paths, and to set NODE_ENV=production, as some libraries behave differently depending on this flag.

On the parse-benchmark branch, I set up a small benchmark.
Running it using 0x yields a very similar flame chart to what you reported: index.html.zip
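For anyone wanting to reproduce such timings without a full profiler, a minimal warm-up-aware harness could look like this (a sketch; `parseDepartures` and the fake response are hypothetical stand-ins for hafas-client's actual parsing, not its real API):

```javascript
// Sketch of a warm-up-aware micro-benchmark in plain Node.js.
// `parseDepartures` is a hypothetical stand-in for the code under test.
const parseDepartures = (res) => res.jnyL.map(j => ({line: j.line, when: j.when}))

const bench = (name, fn, input, warmup = 50, runs = 200) => {
	// Warm-up iterations give V8's optimizing compiler a chance to
	// detect and optimise the hot code paths first.
	for (let i = 0; i < warmup; i++) fn(input)

	const t0 = process.hrtime.bigint()
	for (let i = 0; i < runs; i++) fn(input)
	const t1 = process.hrtime.bigint()

	const msPerRun = Number(t1 - t0) / 1e6 / runs
	console.log(`${name}: ${msPerRun.toFixed(3)}ms/run`)
	return msPerRun
}

const fakeRes = {jnyL: [{line: 'S1', when: '2019-01-01T10:00:00+01:00'}]}
bench('parse departures', parseDepartures, fakeRes)
```

As noted above, run it with NODE_ENV=production set, since some libraries behave differently otherwise.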
Now, I have no idea what these do and if they are easy to make faster (or used less). Probably not...
Probably @simlu knows.
Oh yeah. I breathe this stuff 😉
So there are a few things to consider:

- The find function is very small and executed a lot, so benchmarking might be slightly exaggerated due to the overhead of the profiler.
- The response is very large, and since the code uses `**`, it needs to traverse the entire response tree. Much better would be to narrow down what should be targeted and where it could be present in the response tree. That would reduce what is traversed.
- Instead of calling findInTree multiple times, it should be called once and the callback function should detect which needle target was hit. This would bring the amount of traversal down to a single pass (instead of scanning the response 10x or more times).

Let me know if you have any questions about the above.

I can probably do a PR with these changes. Should not be very hard. However, I'm busy this weekend, so it might take a moment. Feel free to ping me if you want to give it a shot.

Also, I'm expecting the changes proposed above to have a drastic impact on performance for large responses.
> Instead of calling findInTree multiple times, it should be called once and the callback function should detect which needle target was hit. This would bring the amount of traversal down to a single pass (instead of scanning the response 10x or more times).

This is probably the most significant optimisation we can implement here. But the callback in the 2nd findInTree call may depend on the "output" of the 1st.
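The single-traversal idea could be sketched like this (a simplified stand-in for findInTree, not hafas-client's actual implementation; the needle keys are just examples):

```javascript
// Sketch: one recursive traversal, dispatching on which "needle" key was
// hit, instead of one full tree scan per needle.
const scanOnce = (tree, handlers) => {
	const visit = (node) => {
		if (Array.isArray(node)) return node.forEach(visit)
		if (node === null || typeof node !== 'object') return
		for (const [key, val] of Object.entries(node)) {
			if (handlers[key]) handlers[key](val, node) // needle hit
			visit(val)
		}
	}
	visit(tree)
}

// Usage sketch with hypothetical HAFAS-style reference fields:
const hits = []
scanOnce({jnyL: [{remX: 3, stop: {locX: 7}}]}, {
	remX: (val, parent) => hits.push(['remX', val]),
	locX: (val, parent) => hits.push(['locX', val]),
})
// hits is now [['remX', 3], ['locX', 7]]
```

The open question from above remains: if one handler needs the result of another, the order in which matches are encountered within the single pass starts to matter.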
> The response is very large and since the code uses `**`, it needs to traverse the entire response tree. Much better would be to narrow down what should be targeted and where it could be present in the response tree.

I don't want to maintain an explicit list of paths where this "reference resolving" (discussed in #127 and #146) needs to happen. I'd rather like to visit every remX field in the tree, every himX, etc.
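For context: HAFAS responses keep shared objects in lists under `common` (e.g. `remL`, `himL`) and reference them by index via fields like `remX`/`himX`. A generic resolver in that spirit might look like this (a sketch only; the helper name and the attach-next-to-the-index convention are illustrative, not hafas-client's actual code):

```javascript
// Sketch: resolve every `remX`/`himX` index field anywhere in the tree
// against the corresponding list in `common`, without a hard-coded list
// of paths where such references may occur.
const refLists = {remX: 'remL', himX: 'himL'} // index field -> common list

const resolveRefs = (common, node) => {
	if (Array.isArray(node)) return node.forEach(n => resolveRefs(common, n))
	if (node === null || typeof node !== 'object') return
	for (const [key, val] of Object.entries(node)) {
		if (refLists[key] !== undefined && Number.isInteger(val)) {
			// attach the resolved object next to the index field
			node[key.replace('X', '')] = common[refLists[key]][val]
		}
		resolveRefs(common, val)
	}
}

const common = {remL: [{code: 'RT'}, {code: 'WIFI'}]}
const jny = {remX: 1, stopL: [{remX: 0}]}
resolveRefs(common, jny)
// jny.rem is now {code: 'WIFI'}, jny.stopL[0].rem is {code: 'RT'}
```

Note that the objects attached during the walk are not themselves re-scanned, which is exactly the trade-off discussed further down in this thread.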
> Instead of calling findInTree multiple times, it should be called once and the callback function should detect which needle target was hit. This would bring the amount of traversal down to a single pass (instead of scanning the response 10x or more times).
>
> This is probably the most significant optimisation we can implement here. But the callback in the 2nd findInTree call may depend on the "output" of the 1st.

Could be a classic speed vs memory thing. We could do one scan with all possible matches and store them, then use the stored results as needed. Might be faster than a whole other traversal, but would use more memory.
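A sketch of that trade-off: one scan collects every match into a Map, and dependent steps afterwards read from the stored results instead of re-traversing the tree (all names hypothetical):

```javascript
// Sketch: single scan that stores all matches, so later steps that depend
// on earlier matches read from memory instead of scanning the tree again.
const collectMatches = (tree, keys) => {
	const matches = new Map(keys.map(k => [k, []])) // key -> [{parent, value}]
	const visit = (node) => {
		if (Array.isArray(node)) return node.forEach(visit)
		if (node === null || typeof node !== 'object') return
		for (const [key, val] of Object.entries(node)) {
			if (matches.has(key)) matches.get(key).push({parent: node, value: val})
			visit(val)
		}
	}
	visit(tree)
	return matches
}

const tree = {a: {remX: 0}, b: [{himX: 2}, {remX: 5}]}
const m = collectMatches(tree, ['remX', 'himX'])
// m.get('remX').map(x => x.value) -> [0, 5]
// m.get('himX').map(x => x.value) -> [2]
```

The extra memory is one record per match; the saving is every additional full traversal that would otherwise be needed.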
> The response is very large and since the code uses `**`, it needs to traverse the entire response tree. Much better would be to narrow down what should be targeted and where it could be present in the response tree.
>
> I don't want to maintain an explicit list of paths where this "reference resolving" (discussed in #127 and #146) needs to happen. I'd rather like to visit every remX field in the tree, every himX, etc.

Understandable. But do you not have any semantic knowledge whatsoever? Even baking a little in might help a lot.
@derhuerst Can you please point out the dependency that you were talking about? I'm assuming it is in this file. Cheers!
> Could be a classic speed vs memory thing. We could do one scan with all possible matches and store them, then use the stored results as needed. Might be faster than a whole other traversal, but would use more memory.

Yeah, there will only be 100s to ~1000 matches, so it should be feasible; it will just increase the amount of GC to be done.
> Can you please point out the dependency that you were talking about?
Which dependency? Where did I talk about one?
> Which dependency? Where did I talk about one?

The dependency that I had quoted! This one:

> But the callback in the 2nd findInTree call may depend on the "output" of the 1st.

I was asking for an example of such a dependency. Do you have one handy?
I've implemented the performance improvements here: https://github.com/public-transport/hafas-client/pull/154
The speedup is significant and even noticeable with the tests.
Please take a very careful look. The problem is that the tree is updated in the callback; however, currently the newly created branches are not scanned. That could be implemented, though (at a small performance penalty). Do you think it is necessary?
Looking forward to your input!
> I don't want to maintain an explicit list of paths where this "reference resolving" [...] needs to happen. I'd rather like to visit every remX field in the tree, every himX, etc.
>
> Understandable. But do you not have any semantic knowledge whatsoever? Even baking a little in might help a lot.
To explain more what I meant: implicitly, there is a list of e.g. all relevant remX places: all those that the parse functions in parse/* handle.

But for two reasons, I would rather resolve all of them:

- I want to keep the coupling between parse/common.js and parse/* low. Unfortunately, a lot of parsing in hafas-client is already too coupled, across several levels of indirection. 😕
- Consumers may want to resolve remX references that the built-in parse functions in parse/* don't handle yet. This is actually not such an esoteric use case as it might sound. 😛

@derhuerst I somewhat understand. That's a bit more complicated than what I was hoping for. Let's see where the pending performance changes bring us and if more optimization is needed.
Just had another thought: I maintain another package which might be handy for this project: object-rewrite.
Basically it would allow you to abstract a lot of rewrite logic in plugins, which makes it very easy to reason about changes in isolation and keep dependencies minimal. We use this to modify huge hierarchical data structures in memory (megabytes) where processing time and correctness matter. Feel free to take a look!
> Just had another thought: I maintain another package which might be handy for this project: object-rewrite.

I see the point of it for even more generalised use cases, e.g. user-defined rewriting rules. With hafas-client, I'm not sure if it will actually make the code more maintainable.
The changes from #154 have been released as hafas-client@5.1.1! :shipit:
@fnorf Would be awesome if you could give feedback on these performance improvements!
The performance hasn't improved much, actually. Without #154:
```
parse 1200 departures x 1.78 ops/sec ±2.81% (9 runs sampled) (0.56s/run)
parse 45 departures x 34.69 ops/sec ±1.64% (61 runs sampled) (0.03s/run)
```
With #154:
```
parse 1200 departures x 2.17 ops/sec ±5.08% (10 runs sampled) (0.46s/run)
parse 45 departures x 42.08 ops/sec ±2.88% (56 runs sampled) (0.02s/run)
```
Still, thanks @simlu for the optimisation!
@derhuerst That would indicate that the bottleneck is elsewhere. I'd love to take a look myself. Am I using the branch you mentioned above? Which test am I running?
Run npm run benchmark or npm run benchmark:profile on the parse-benchmark branch.
Very interesting. I did a bit of digging but will have to continue later. What I found curious is that the first execution of the test is double (!) the speed of the other ones. So it looks like the optimization that happens actually hurts the execution speed...?! Maybe time to add more benchmark tests for object-scan...
There is a good chance the test is faulty. Haven't actually looked at it. Did you do a deep clone of the input for each test run? Otherwise the object changes between first and second execution
Edit: see here https://github.com/public-transport/hafas-client/commit/c9412025bc707a7be4da93fc18711d04129b7c84
Also had another thought for optimization: are there areas of the tree that we could explicitly exclude, i.e. areas where we know there couldn't be any matches?
Yes, the object is being mutated. My fault, sorry. https://github.com/public-transport/hafas-client/blob/c9412025bc707a7be4da93fc18711d04129b7c84/benchmark/index.js#L18
I've fixed the test setup in https://github.com/public-transport/hafas-client/pull/155/files
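The underlying pitfall, sketched: if the parse step mutates its input, every benchmark run after the first measures different work, so each run needs a fresh deep copy. (A sketch with a hypothetical mutating parse function; a JSON round-trip clone is fine for plain, JSON-shaped response data.)

```javascript
// Sketch: the parse step mutates its input, so each benchmark run must get
// a fresh deep copy -- otherwise run 2+ measures parsing already-parsed data.
const parseInPlace = (res) => {
	res.parsed = true // stand-in for hafas-client's mutating parse step
	return res
}

const rawRes = {parsed: false, jnyL: []}

const runOnce = () => {
	// JSON round-trip deep clone; OK for plain JSON-shaped response data
	const input = JSON.parse(JSON.stringify(rawRes))
	return parseInPlace(input)
}

const a = runOnce()
const b = runOnce()
// rawRes.parsed is still false; a and b are independent copies
```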
For the following test setup:

- the parseCommonData function
- the parse 1200 departures test

I now get the following results:

Before optimization changes: ~252ms
After optimization changes: ~92ms

That is a decrease of 160ms, or 63.49%. I'd say that's pretty good :tada:
> Also had another thought for optimization: are there areas of the tree that we could explicitly exclude, i.e. areas where we know there couldn't be any matches?

What I mean is: are there any objects in the hierarchy that you know won't yield any matches? E.g. if we knew that common.layerL was never traversed, we could do something like
```js
const needles = [
	"**.oprX",
	"**.icoX",
	"**.prodX",
	"**.pRefL",
	"**.locX",
	"**.ani.fLocX",
	"**.ani.tLocX",
	"**.fLocX",
	"**.tLocX",
	"**.remX",
	"**.himX",
	"**.polyG.polyXL",
	"!common.layerL.**"
];
```
The necessary changes to have that actually impact performance are currently pending here: https://github.com/blackflux/object-scan/pull/876/files
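Why such an exclusion needle can pay off: a scanner can prune the excluded subtree before descending into it, so it is never traversed at all. A generic sketch of the pruning idea (not object-scan's actual implementation; names are illustrative):

```javascript
// Sketch: skip whole subtrees that are known to contain no matches,
// so e.g. `common.layerL` is never traversed at all.
const scanWithExclusions = (tree, targetKeys, excludedPaths) => {
	const excluded = new Set(excludedPaths) // e.g. 'common.layerL'
	const hits = []
	const visit = (node, path) => {
		if (excluded.has(path)) return // prune: don't descend at all
		if (Array.isArray(node)) return node.forEach(n => visit(n, path))
		if (node === null || typeof node !== 'object') return
		for (const [key, val] of Object.entries(node)) {
			const childPath = path ? `${path}.${key}` : key
			if (targetKeys.includes(key)) hits.push(childPath)
			visit(val, childPath)
		}
	}
	visit(tree, '')
	return hits
}

const res = {
	common: {layerL: [{remX: 0}]}, // should be skipped entirely
	jnyL: [{remX: 1}],
}
scanWithExclusions(res, ['remX'], ['common.layerL'])
// -> ['jnyL.remX'] (the remX inside common.layerL is never visited)
```

The saving scales with the size of the excluded subtree, which is why it only helps if there really are large regions that are known to be match-free.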
> What I mean is: are there any objects in the hierarchy that you know won't yield any matches? E.g. if we knew that common.layerL was never traversed, we could do something like "!common.layerL.**".

Not many. This would be coupling the other way around, similar to what I explained above: I don't want parseCommon to arbitrarily restrict other code in which references are resolved.
Results including the benchmark fixes from #155:
```
parse 1200 departures x 5.56 ops/sec ±5.43% (19 runs sampled) (0.18s/run)
parse 45 departures x 86.17 ops/sec ±1.24% (73 runs sampled) (0.01s/run)

parse 1200 departures x 2.36 ops/sec ±5.51% (11 runs sampled) (0.42s/run)
parse 45 departures x 43.73 ops/sec ±1.30% (57 runs sampled) (0.02s/run)
```
This includes JSON parsing. I'd dare to say this is good enough™ for now.
The new object-scan version improved callback performance significantly. Might be worth updating to it (breaking, but a very simple change).
Here is the PR https://github.com/public-transport/hafas-client/pull/173
First of all, thank you for this awesome tool! Second of all, I have no clue about JavaScript.
I built some awful thing based on this and was surprised by the CPU usage. I had expected the bottleneck to be the time it took the APIs to answer, but actually the data munching of hafas-client into its nice format seems to take the most resources. So I looked into it:

I made a single 24 hour departures request. ~370 trips were returned. The whole process took ~1.7s.
Looking at my network logs my system spent about 0.5s talking with the remote server (reiseauskunft.bahn.de) in total.
So I looked at the process in a profiler (0x). Here is a flamegraph:
If I interpret that correctly, it looks like ~60% of the time was spent with findInTree calls in parseCommonData.

I also zoomed in to the two other blocks on the left:
Some date/time parsing and formatting.
Now, I have no idea what these do and if they are easy to make faster (or used less). Probably not... I saw some commits referencing them in https://github.com/public-transport/hafas-client/commit/2a6b0dc507bd0940ee09b16e1f097499a5e78082 and https://github.com/public-transport/hafas-client/commit/8c6a8d858edf6f75e19294f4066c252e64d11184.
Anyways, since I had looked into it and was surprised by the result, I thought I'd share this with you. There is the tiny and remote chance that you were unaware of these issues and this is useful. And even if not, this was a nice chance to say thanks again for this great project. :)