purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.67k stars 315 forks source link

Improve performance of autoedges #739

Open purpleidea opened 9 months ago

purpleidea commented 9 months ago

Description

Autoedges is not as fast as it could be. It's not blocking us anywhere, but we should think about eventually optimizing this to lower graph sync latency.

Analysis

For each new resource graph (in the stream of resource graphs) that is generated, we must compute the automatic edges. I ran this on my provisioning code base. (It will be published shortly.) Currently this is a bit slow. I added some light profiling to the graph generation steps, and I got the following:

main: new graph took: 900.937µs
main: auto edges took: 7.206085s
main: auto grouping took: 25.508671ms
main: send/recv building took: 489ns

main: new graph took: 794.224µs
main: auto edges took: 4.313729756s
main: auto grouping took: 47.970533ms
main: send/recv building took: 207.62µs

Note the units. What surprised me was that I expected autogrouping to be the slower operation, but in fact it is not. This is obviously highly dependent on the specific mcl code being run, but it gives us a good rough idea.

Improvements

There are both algorithmic improvements and plumbing improvements that we could do.

Algorithmic

@frebib has some initial POC work on making the algorithm n*log(n) IIRC. I think it's currently n^2.

Plumbing

There are likely some efficiency wins if we look at the code. This needs proper profiling (even flamegraphs?) to know where we're actually slow.

We could also consider adding a static API at resource creation, that defines which resource kind pairs are even legally able to create auto edges. For example, if we know that svc can only ever autoedge with file and pkg, this reduces our search space drastically. Keep in mind that either resource side needs to be able to opt-in.

Concurrency

Graph generation happens in parallel to execution, so this isn't blocking execution. We could double-check that it's nominally concurrent, and we could make sure we pause only after the graph is built, but this is very low-priority. Secondly, I wouldn't be surprised if we could actually implement a concurrent version of the autoedges algorithm that uses more CPU's, but I don't think this is an important optimization to do at this time.

Generics

@frebib believes the algorithmic implementation would be a lot faster with generics. If we can avoid this, that should be a goal, but it's an important thing to discuss.

purpleidea commented 9 months ago

@frebib If you could push (even a non-compiling) POC of your autoedges branch, that would be awesome!

purpleidea commented 9 months ago

A bit more data to be sure

main: new graph took: 913.653µs
main: auto edges took: 9.273807153s
main: auto grouping took: 28.690819ms
main: send/recv building took: 566ns

main: new graph took: 779.255µs
main: auto edges took: 4.03670168s
main: auto grouping took: 37.682101ms
main: send/recv building took: 121.017µs

main: new graph took: 1.157479ms
main: auto edges took: 3.794132165s
main: auto grouping took: 49.732836ms
main: send/recv building took: 95.921µs

main: new graph took: 900.937µs
main: auto edges took: 7.206085s
main: auto grouping took: 25.508671ms
main: send/recv building took: 489ns

main: new graph took: 794.224µs
main: auto edges took: 4.313729756s
main: auto grouping took: 47.970533ms
main: send/recv building took: 207.62µs

main: new graph took: 884.49µs
main: auto edges took: 7.585529786s
main: auto grouping took: 24.327938ms
main: send/recv building took: 72.741µs

main: new graph took: 774.157µs
main: auto edges took: 2.827380129s
main: auto grouping took: 28.303023ms
main: send/recv building took: 85.246µs

main: new graph took: 746.841µs
main: auto edges took: 2.775868117s
main: auto grouping took: 33.11291ms
main: send/recv building took: 104.875µs

main: new graph took: 796.445µs
main: auto edges took: 2.71556122s
main: auto grouping took: 24.03827ms
main: send/recv building took: 106.414µs

main: new graph took: 1.217452ms
main: auto edges took: 2.908416104s
main: auto grouping took: 61.175916ms
main: send/recv building took: 92.328µs

main: new graph took: 807.894µs
main: auto edges took: 3.222089261s
main: auto grouping took: 40.032629ms
main: send/recv building took: 106.49µs

main: new graph took: 986.963µs
main: auto edges took: 3.538425263s
main: auto grouping took: 30.660849ms
main: send/recv building took: 99.74µs
frebib commented 9 months ago

I haven't worked on this since the cfgmgmtcamp hacking day and I only just made it compile. No idea if it works, but have at it: https://github.com/frebib/mgmt/commits/frebib/autoedge/