hypertrace / hypertrace-service

Multiple hypertrace services combined together to form a single service.
Other
4 stars 15 forks source link

fix pinot health check configuration #51

Closed flyinprogrammer closed 3 years ago

flyinprogrammer commented 3 years ago

Description

Pinot 0.6.0 controllers now have a health check: https://github.com/apache/incubator-pinot/pull/5846 So we should just use that instead of confusing people with asking about a single pinot server health check, and making it impossible to reference a controller running on a different host.

Checklist:

Documentation

As far as I can tell this configuration isn't referenced anywhere, but perhaps I am blind.

For reference a Pinot Server: https://docs.pinot.apache.org/developers/advanced/advanced-pinot-setup#pinot-server And Pinot Controller: https://docs.pinot.apache.org/developers/advanced/advanced-pinot-setup#pinot-controller Are two different components of this distributed system. This service only relies on the Controller /brokers/tenants API and yet it was using the Server's /health API in the absence of a /health endpoint on the controller itself.

buchi-busireddy commented 3 years ago

@adriancole FYI, you might want to take a look at this since you have some context.

jcchavezs commented 3 years ago

Let me have a look at the tests not running.

On Wed, 9 Dec 2020, 19:52 Buchi Reddy Busi Reddy, notifications@github.com wrote:

@adriancole https://github.com/adriancole FYI, you might want to take a look at this since you have some context.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hypertrace/hypertrace-service/pull/51#issuecomment-741976985, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXOYAWPYZJSLPWJKI4X2CLST7BQVANCNFSM4US5KV2A .

JBAhire commented 3 years ago

Let me have a look at the tests not running. On Wed, 9 Dec 2020, 19:52 Buchi Reddy Busi Reddy, @.***> wrote: @adriancole https://github.com/adriancole FYI, you might want to take a look at this since you have some context. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#51 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXOYAWPYZJSLPWJKI4X2CLST7BQVANCNFSM4US5KV2A .

CircleCI tests were failing due to failed authentication (unauthorized user PR) and now build and snyk-scan is failing due to https://github.com/hypertrace/hypertrace/issues/132 .

jcchavezs commented 3 years ago

Leta merge this and add a check to avoid a regression.

On Thu, 10 Dec 2020, 13:07 Jayesh Bapu Ahire, notifications@github.com wrote:

@JBAhire approved this pull request.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hypertrace/hypertrace-service/pull/51#pullrequestreview-549131622, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXOYAR7FY5Q6KJBNZ3QQP3SUC2Z5ANCNFSM4US5KV2A .

kotharironak commented 3 years ago

There seems to be confusion here due to

  1. alias network name used in docker-compose file for pinot's all-in-one deployment called the service manager
  2. usage of Hypetrarace Service
    • It is only used in local deployment mode for a quick start, not in SaaS

For our docker-compose setup,

a. we are using pinot's servicemanager role (which runs all pinot's component in a single jvm). In this role as of now, it starts controller first than server and broker in parallel. Refer PR - https://github.com/apache/incubator-pinot/pull/5917/files - which is part of pinot 0.6.0 release

b. HypetraceService is dependent on pinot as ref https://github.com/hypertrace/hypertrace/blob/main/docker/docker-compose.yml#L27. However, instead of using the service_healthy construct, we are using service_started as pinot startup is heavy. On low configuration local envs, it may take more than 60secs which is a limitation (undocumented) for docker-compose for dependent containers. So, if pinot takes more than 60 secs, it will not start HypetraceService. So, we start them parallel, and checks for pinot health internally in HypertraceService.

c. controller health check doesn't tell that server and broker component has started successfully. So, we had requested for a composite health check for all components running as part of servicemanager here - https://github.com/apache/incubator-pinot/issues/5850, https://github.com/apache/incubator-pinot/issues/5950

d. In servicemanager role, as the controller starts first and server, broker components in parallel, we had noticed a few seconds delay in low config evns before they are successfully available. So, we have added checks for pinot-server and pinot-broker registration before checking for availability of trace/span in pinot` - https://github.com/hypertrace/hypertrace-service/blob/main/hypertrace-service/src/main/java/org/hypertrace/service/HypertraceUIServerTimerTask.java#L147

So, I would suggest adding aliases for all the three components - pinot-server, pinot-broker, and pinot-controller in the docker-compose file to avoid naming confusion, and reverting the checks to pinot-server for now till we have a single endpoint to check the health of all components (we may have to parse though).

Let me know if I am missing something.

cc: @flyinprogrammer @jcchavezs @laxmanchekka @buchi-busireddy

jcchavezs commented 3 years ago

Good catch @kotharironak! I think this is why it is important to have a check on this. Would you revert it and fix the naming discrepancy?

On Mon, 14 Dec 2020, 06:38 kotharironak, notifications@github.com wrote:

There seems to be confusion here due to

  1. alias network name used in docker-compose file for pinot's all-in-one deployment called the service manager

    https://github.com/hypertrace/hypertrace/blob/main/docker/docker-compose.yml#L102

  2. usage of Hypetrarace Service
    • It is only used in local deployment mode for a quick start, not in SaaS

For our docker-compose setup,

a. we are using pinot's servicemanager role (which runs all pinot's component in a single jvm). In this role as of now, it starts controller first than server and broker in parallel. Refer PR - https://github.com/apache/incubator-pinot/pull/5917/files - which is part of pinot 0.6.0 release

b. HypetraceService is dependent on pinot as ref https://github.com/hypertrace/hypertrace/blob/main/docker/docker-compose.yml#L27. However, instead of using the service_healthy construct, we are using service_started as pinot startup is heavy. On low configuration local envs, it may take more than 60secs which is a limitation (undocumented) for docker-compose for dependent containers. So, if pinot takes more than 60 secs, it will not start HypetraceService. So, we start them parallel, and checks for pinot health internally in HypertraceService.

c. controller health check doesn't tell that server and broker component has started successfully. So, we had requested for a composite health check for all components running as part of servicemanager here - apache/incubator-pinot#5850 https://github.com/apache/incubator-pinot/issues/5850, apache/incubator-pinot#5950 https://github.com/apache/incubator-pinot/issues/5950

d. In servicemanager role, as the controller starts first and server, broker components in parallel, we had noticed a few seconds delay in low config evns before they are successfully available. So, we have added checks for pinot-server and pinot-broker registration before checking for availability of trace/span in pinot` - https://github.com/hypertrace/hypertrace-service/blob/main/hypertrace-service/src/main/java/org/hypertrace/service/HypertraceUIServerTimerTask.java#L147

So, I would suggest adding aliases for all the three components - pinot-server, pinot-broker, and pinot-controller in the docker-compose file to avoid naming confusion, and reverting the checks to pinot-server for now till we have a single endpoint to check the health of all components (we may have to parse though).

Let me know if I am missing something.

cc: @flyinprogrammer https://github.com/flyinprogrammer @jcchavezs https://github.com/jcchavezs @laxmanchekka https://github.com/laxmanchekka @buchi-busireddy https://github.com/buchi-busireddy

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/hypertrace/hypertrace-service/pull/51#issuecomment-744181330, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXOYARJUDTQUWZ3GPKVVODSUWQEHANCNFSM4US5KV2A .