Requirement - what kind of business use case are you trying to solve?
We are implementing a custom gRPC-based storage plugin as per this doc.
Problem - what in Jaeger blocks you from solving the requirement?
Could you please clarify what the concurrency model of a gRPC storage plugin in Jaeger is? I'm getting the following confusing results from my experiments:
When loading the main page, Jaeger usually calls the GetServices and GetOperations methods on the plugin. I can see that these two calls execute in parallel, so if each call takes ~3s to fetch data from the DB, the total load time is ~3s rather than ~6s.
When several clients load the main page simultaneously (e.g. by refreshing it from many browser tabs at once), the calls to the plugin seem to be serialized: the GetServices call made while serving the first client blocks, and the GetServices call for the second client only starts executing after the first one returns.
Having observed the first behavior locally, I found the second one a big surprise. Our plugin is concurrency-safe, has no mutexes, and is able to execute all operations in parallel. I have also reproduced this behavior with GetServices/GetOperations returning pre-canned responses after a time.Sleep().
Impact: We have already implemented sophisticated caching logic in our plugin for services and operations, but this head-of-line blocking for requests that cannot be served from the cache can result in a very poor user experience when Jaeger is exposed to multiple users.
Proposal - what do you suggest to solve the problem or improve the existing situation?
Any open questions to address