thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
13.05k stars 2.09k forks source link

Add query logging/tracing #1706

Open bwplotka opened 4 years ago

bwplotka commented 4 years ago

We should log queries and preserve those optionally in tracing. This is to clearly figure out which query caused unexpected behavior or hit limit!

Nice to have: middleware logging: https://github.com/grpc-ecosystem/go-grpc-middleware/pull/223 cc @adrien-f (: Although we can host something temporarily.

Related Prometheus issue: https://github.com/prometheus/prometheus/issues/6223

AC :

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

bwplotka commented 4 years ago

We need that!

On Thu, 27 Feb 2020 at 11:14, stale[bot] notifications@github.com wrote:

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/thanos-io/thanos/issues/1706?email_source=notifications&email_token=ABVA3O55SFMX6J4FJ6D4FSDRE6OCTA5CNFSM4JH2VRSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEND65SY#issuecomment-591916747, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABVA3O7MH2CTC5GVIGGAUYDRE6OCTANCNFSM4JH2VRSA .

stale[bot] commented 4 years ago

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

bwplotka commented 4 years ago

Still needed, no progress on this yet. Help wanted!

@mkabischev did something promising!

https://github.com/thanos-io/thanos/issues/2321#issuecomment-603765739

bwplotka commented 4 years ago

Tracing is done, but we might want to log more on debug.

bwplotka commented 4 years ago

This was merged: https://github.com/grpc-ecosystem/go-grpc-middleware/pull/223 so we might want to think about upgrading middleware and getting this in.

However the current version is not fully correct on middleware sides, the v2 works just started, we could do that first: https://github.com/grpc-ecosystem/go-grpc-middleware/issues/275

vaibhavgulati commented 4 years ago

valgrind-out.txt Ran valgrind over thano0.12.0 it gave few errors I don't know if its helpful for you to find out memory leaks.

obiesmans commented 4 years ago

Small feedback as a tracing user :

It's working great and it's a really useful tool to troubleshoot our Thanos architecture. Filtering sensitive data could be a neat addition (our GCS service-account key is in ENV). Also, having queryiers/stores/sidecar logging slow queries could be useful to those not willing to invest in a tracing infrastructure yet.

yashrsharma44 commented 4 years ago

Is this issue https://github.com/grpc-ecosystem/go-grpc-middleware/issues/275 a blocker for solving #1706(Query Logging)?

bwplotka commented 4 years ago

Well not blocker, just some dependency we need to work on. I am maintainer on grpc middleware, as well, so yea we might want to push something forward.

TBH the v2 is mainly done so we can just try upgrade Thanos to that (: (just some minor import tweaks I need to fix first, probably tomorrow)

VineethReddy02 commented 4 years ago

@bwplotka I don't see any queries preserved when I configured jaeger in tags, processes & logs of a particular span. Do we need to set any configurations in --tracing-config to preserve queries in traces? or we don't have such implementation in tracing yet? And the expectation of this issue is to implement that?

bwplotka commented 4 years ago

Queries are in Jeager for sure. We use that fact every day. (:

Will add clear acceptance criteria to this issue, good point (:

On Fri, 1 May 2020 at 10:16, Vineeth Pothulapati notifications@github.com wrote:

@bwplotka https://github.com/bwplotka I don't see any queries preserved when I configured jaeger in tags, processes & logs of a particular span. Do we need to set any configurations in --tracing-config to preserve queries in traces? or we don't have such implementation in tracing yet? And the expectation of this issue is to implement that?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thanos-io/thanos/issues/1706#issuecomment-622314141, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABVA3OZHYE4EGYRIPN7JGLTRPKHN3ANCNFSM4JH2VRSA .

VineethReddy02 commented 4 years ago
matchers "{name="go_memstats_frees_total"}"  
minTime 1588280584361  
maxTime 1588324084361

Ahhh... I missed this span. I can see the query select info here. Thanks for the quick info!

stale[bot] commented 4 years ago

Hello 👋 Looks like there was no activity on this issue for last 30 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 4 years ago

Closing for now as promised, let us know if you need this to be reopened! 🤗

bwplotka commented 4 years ago

@yashrsharma44 is on it

stale[bot] commented 4 years ago

Hello 👋 Looks like there was no activity on this issue for last 30 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

GiedriusS commented 4 years ago

@yashrsharma44 is still on it: https://github.com/thanos-io/thanos/pull/2849

stale[bot] commented 4 years ago

Hello 👋 Looks like there was no activity on this issue for last 30 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

kakkoyun commented 4 years ago

@yashrsharma44 is still on it for grpc part https://github.com/grpc-ecosystem/go-grpc-middleware/pull/311

Also @yashrsharma44 have we addressed the tracing part of this issue? Or do we have a plan?

yashrsharma44 commented 4 years ago

I think the tracing part of the issue referred to the tracing infrastructure that we have it now in Thanos. More info - https://github.com/thanos-io/thanos/issues/1706#issuecomment-617606301

stale[bot] commented 4 years ago

Hello 👋 Looks like there was no activity on this issue for last 30 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

ipstatic commented 4 years ago

This is still an issue.

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

ipstatic commented 3 years ago

Still an issue

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 3 years ago

Closing for now as promised, let us know if you need this to be reopened! 🤗

aymericDD commented 3 years ago

This is still an issue.

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

aymericDD commented 3 years ago

Still an issue

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

bwplotka commented 2 years ago

Let's get back to this since @aymericDD want to help in this.

Let's figure out what exactly is missing. Sounds to me like tracing is done. But logging might need more metadata/better usability.

Feel free @aymericDD to give more details on what's your current problem is.

bwplotka commented 2 years ago

Let's make sure we reuse exactly the same statistics and metadata as for tracing.

bwplotka commented 2 years ago

FYI https://thanos.io/tip/proposals-done/202005-query-logging.md/

bwplotka commented 2 years ago

Some starting point: https://github.com/thanos-io/thanos/blob/24602c428e8d55d3aa4cbccd12a1e84ec18c22af/pkg/server/grpc/grpc.go#L85

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 2 years ago

Closing for now as promised, let us know if you need this to be reopened! 🤗

aymericDD commented 2 years ago

Still an issue

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

GiedriusS commented 2 years ago

https://github.com/thanos-io/thanos/issues/5521 done