dzvancuks opened this issue 3 years ago
A few other findings:
Here is the profiling for a buffered gRPC call. The CPU is still underutilized and GC is triggered very often.
Thank you for your descriptive write-up of the issue. You are on point with inefficient processing of transactions by the scheduler. Personally, I am migrating to a new project that plans to use Ligato more extensively, so I might actually do some optimizations in this area (I planned to optimize allocation of strings in the kvscheduler). However, if you think you could open some PRs at least for the low hanging fruit that could improve performance here, you are welcome to do so.
I've started a code analysis to improve performance in a separate fork: https://github.com/1NCE-GmbH/vpp-agent. I'll explain my pre-investigation and would like to discuss VPP-agent implementation details to figure out how to best optimize its workflow.
First of all, here are the traces and profiler results: vpp-agent-profile.zip. You can open them by executing the following commands:
go tool trace trace.prof
go tool pprof -http=localhost:8080 cpu.prof
go tool pprof -http=localhost:8080 mutex.prof
During the analysis it became clear that the KV Scheduler is a single point of throttling. When launching 30000 operations via the gRPC channel, this one entity blocks all other threads while performing consumeTransactions() -> processTransaction() calls.
Multi-threaded execution effectively becomes a single-threaded operation of VPP-agent.
All other gRPC calls wait on one single lock. But removing this particular mutex alone will not solve the issue, as these goroutines would then park on the channel.
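For illustration, here is a minimal, simplified Go sketch of the pattern that causes this serialization; the names (scheduler, txnQueue, txnLock, processTransaction) only mimic the shape of the real code, they are not the actual kvscheduler implementation. Many gRPC handlers feed one channel, and a single consumer processes transactions under one mutex, so the remaining CPU cores stay idle:

package main

import "sync"

type transaction struct{ key, value string }

// Simplified, hypothetical model of the bottleneck.
type scheduler struct {
	txnLock  sync.Mutex
	txnQueue chan *transaction
}

// Many gRPC handlers call this concurrently...
func (s *scheduler) PushTransaction(txn *transaction) {
	s.txnQueue <- txn // ...but they all funnel into one channel.
}

// ...while a single goroutine drains the queue, one transaction at a time.
func (s *scheduler) consumeTransactions() {
	for txn := range s.txnQueue {
		s.txnLock.Lock()          // global lock: no overlap between transactions
		s.processTransaction(txn) // all CPU-heavy work happens serially here
		s.txnLock.Unlock()
	}
}

func (s *scheduler) processTransaction(txn *transaction) {
	_ = txn // graph update, validation, execution, recording...
}

func main() {
	s := &scheduler{txnQueue: make(chan *transaction, 100)}
	s.PushTransaction(&transaction{key: "iface/loop0", value: "up"})
	close(s.txnQueue)
	s.consumeTransactions() // drains the queue serially
}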
So the main target for optimization is the KV Scheduler. I'll try to refactor this module to utilize as much CPU time as possible, but first I'd like to consult with the project owners and maintainers to make this process more robust.
My questions to the maintainers of VPP-agent:
- txnLock: would the internal Graph mutex be enough to sync data when I call graph.Write()/Read()? (See the sketch below.)
- defaultPrintTxnSummary = false: the printing lock also affects performance.

CC: @ondrej-fabry, @milanlenco
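To make the first question more concrete, here is a minimal, self-contained sketch of the direction I mean; the Graph type and its Write()/Read() methods are hypothetical stand-ins, not the real kvscheduler graph API. The idea is that if the graph's own mutex is taken inside Write()/Read(), the scheduler-wide txnLock around graph accesses may become unnecessary:

package main

import (
	"fmt"
	"sync"
)

// Hypothetical graph with its own internal lock (stand-in for the real one).
type Graph struct {
	mu    sync.RWMutex
	nodes map[string]string
}

// Write mutates the graph under the internal mutex, so callers would not
// need an external txnLock just to touch the graph.
func (g *Graph) Write(key, value string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.nodes[key] = value
}

// Read takes only the shared lock, so concurrent readers do not block each other.
func (g *Graph) Read(key string) string {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.nodes[key]
}

func main() {
	g := &Graph{nodes: map[string]string{}}
	var wg sync.WaitGroup
	// Several "transactions" touch the graph concurrently,
	// synchronized only by the graph's own mutex.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("node-%d", i)
			g.Write(key, "configured")
			_ = g.Read(key)
		}(i)
	}
	wg.Wait()
}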
After solving the KV Scheduler issue I'll create follow-up issues regarding GC and memory usage, mutexes, and the txnQueue size. There is a possibility that some of them will be solved together with the Scheduler problems, so let's postpone creating new issues.
Started experimenting with a multithreaded approach. It is possible to remove txnLock completely by introducing an RWLock in the Registry and by adding a Mutex for valStateWatchers, which would let multiple goroutines run processTransaction(). The Graph's RWLock coordinates the multi-goroutine functionality. However, in the execution phase the graphW := s.graph.Write(true, record) call locks out the other threads, which removes any benefit of multithreading processTransaction(). To overcome this lock the Graph must be refactored. It can be done by moving the mutex locking inside the API methods. It would also require decoupling Node from Graph: currently Nodes directly access the Graph during Targets re-configuration. Either the Target struct might contain direct pointers to other nodes, or the Graph itself should be responsible for Targets. Investigation is ongoing.
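As a rough illustration of what I mean by moving the mutex locking inside the API methods and letting the Graph, rather than the Nodes, own the Targets, here is a simplified, hypothetical sketch; it is not the real graph package, only the shape of the refactoring I have in mind:

package main

import (
	"fmt"
	"sync"
)

// Node carries only its own data; it holds no pointer back to the graph
// and no direct pointers to its targets (hypothetical simplification).
type Node struct {
	Key   string
	Value string
}

// Graph owns both the nodes and the target (dependency) relations,
// and every exported method locks internally.
type Graph struct {
	mu      sync.RWMutex
	nodes   map[string]*Node
	targets map[string][]string // node key -> keys of its targets
}

func NewGraph() *Graph {
	return &Graph{nodes: map[string]*Node{}, targets: map[string][]string{}}
}

// SetNode adds or updates a node; locking is hidden inside the method.
func (g *Graph) SetNode(key, value string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.nodes[key] = &Node{Key: key, Value: value}
}

// SetTargets records dependencies by key, so the graph (not the node)
// re-resolves them when the topology changes.
func (g *Graph) SetTargets(key string, targetKeys ...string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.targets[key] = targetKeys
}

// Targets resolves target nodes on demand under a shared lock.
func (g *Graph) Targets(key string) []*Node {
	g.mu.RLock()
	defer g.mu.RUnlock()
	var out []*Node
	for _, tk := range g.targets[key] {
		if n, ok := g.nodes[tk]; ok {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	g := NewGraph()
	g.SetNode("route/10.0.0.0/24", "via tap0")
	g.SetNode("interface/tap0", "up")
	g.SetTargets("route/10.0.0.0/24", "interface/tap0")
	for _, t := range g.Targets("route/10.0.0.0/24") {
		fmt.Println("target:", t.Key)
	}
}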
Any comment from the maintainers would help. CC: @ondrej-fabry, @milanlenco
Just want to note that Milan is no longer working with me, as he started working for another company and most likely will not reply here. However, before he left he was in the process of preparing a refactor of the KVScheduler. You could look into it to get some idea of the planned changes. There are currently no plans to work on this at the moment, but possibly with your help we could make these changes ready.
Here is the most recent one, which was supposed to introduce the new descriptor API v2 that replaces the key parameter with a more flexible context: https://github.com/ligato/vpp-agent/compare/master...milanlenco:kvDescriptorV2
And here is a slightly older planned change, which is more related to your discoveries and was supposed to actually add support for parallel processing of transactions: https://github.com/ligato/vpp-agent/compare/master...milanlenco:exec_engine
@ondrej-fabry, thank you for the response. I studied the provided branches.
This one optimizes GC. I'm thinking of splitting the current issue into 2 steps: one for parallel txn processing, the other for GC and memory management. In the scope of this issue I'll focus on the Scheduler's txn handling, and after that I'll create a follow-up issue for memory.
This one does relate to the performance issue. However, it doesn't solve the root cause: the exclusive Graph write locking. I'll continue investigating how to refactor the Graph handling.
To refactor the Graph, the following steps must be performed:
Small update: our team has decided to implement our own VPP transaction handler using GoVPP. In our project we need to handle many small transactions instead of a single complex transaction with many dependencies. We had no need for the recursive graph features that VPP-agent provides, and spending time on its refactoring would require much more effort.
@ondrej-fabry thanks for the feedback on my attempts to improve the Scheduler. I hope the next revisions of VPP-agent will become more performant. If you would like to continue this work, I encourage you to start with refactoring the Graph module.
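For context, such a custom handler is essentially the plain GoVPP request/reply flow. A minimal sketch is below; it assumes the govpp.Connect / NewAPIChannel / SendRequest(...).ReceiveReply(...) API, and the import paths and generated binapi message types differ between GoVPP versions, so treat them as illustrative:

package main

import (
	"log"

	"git.fd.io/govpp.git"                             // import path assumed; newer releases use go.fd.io/govpp
	"git.fd.io/govpp.git/examples/binapi/interfaces" // generated binapi package, path is version-dependent
)

func main() {
	// Connect to the local VPP instance ("" selects the default shm prefix / socket).
	conn, err := govpp.Connect("")
	if err != nil {
		log.Fatal("connecting to VPP failed:", err)
	}
	defer conn.Disconnect()

	// One API channel per worker; channels are cheap, so many small
	// transactions can be pushed from independent goroutines.
	ch, err := conn.NewAPIChannel()
	if err != nil {
		log.Fatal("creating API channel failed:", err)
	}
	defer ch.Close()

	// A single small "transaction": create a loopback interface.
	req := &interfaces.CreateLoopback{}
	reply := &interfaces.CreateLoopbackReply{}
	if err := ch.SendRequest(req).ReceiveReply(reply); err != nil {
		log.Fatal("CreateLoopback failed:", err)
	}
	log.Printf("created loopback, sw_if_index=%d", reply.SwIfIndex)
}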
After some profiling it seems that these 2 functions impact performance when pushing many operations:
The problem is that VPP-agent processes operations in a single thread with mutex/queue blocking. Instead, it should utilize cheap goroutines and sync.Pool memory.
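As a generic illustration of what I mean by cheap goroutines plus sync.Pool (this is not VPP-agent code), a small worker pool can process operations concurrently while reusing per-operation buffers, which cuts allocations and GC pressure:

package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Reusable per-operation buffers; Get/Put avoids one allocation per operation,
// which keeps the GC quieter under a high operation rate.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func processOperation(op string) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	// Stand-in for real work: build/record the operation.
	fmt.Fprintf(buf, "processed %s", op)
}

func main() {
	ops := make(chan string, 1000)
	var wg sync.WaitGroup

	// A small worker pool instead of one consumer goroutine.
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for op := range ops {
				processOperation(op)
			}
		}()
	}

	for i := 0; i < 30000; i++ {
		ops <- fmt.Sprintf("update-%d", i)
	}
	close(ops)
	wg.Wait()
}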
To collect the trace I used the pprof tooling in the main() function and pushed 30000 Update commands via the gRPC connection.
The trace shows an underutilized CPU and long wait times on the mentioned functions.
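For reproducibility, here is roughly how such profiles can be collected from main() with the standard runtime/pprof and runtime/trace packages; this is a sketch matching the file names used in the commands above, not the exact code I ran:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"runtime/trace"
)

func main() {
	// CPU profile -> cpu.prof (view with: go tool pprof -http=localhost:8080 cpu.prof)
	cpuFile, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer cpuFile.Close()
	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// Execution trace -> trace.prof (view with: go tool trace trace.prof)
	traceFile, err := os.Create("trace.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer traceFile.Close()
	if err := trace.Start(traceFile); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// Record mutex contention -> mutex.prof (written on shutdown below).
	runtime.SetMutexProfileFraction(1)
	defer func() {
		mutexFile, err := os.Create("mutex.prof")
		if err != nil {
			log.Fatal(err)
		}
		defer mutexFile.Close()
		if err := pprof.Lookup("mutex").WriteTo(mutexFile, 0); err != nil {
			log.Println("writing mutex profile failed:", err)
		}
	}()

	// ... run the agent here and push the 30000 Update commands over gRPC ...
}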