prometheus-junkyard / tsdb

The Prometheus time series database layer.
Apache License 2.0

Re-encode chunks that are still being appended to when snapshotting. #641

Closed krasi-georgiev closed 5 years ago

krasi-georgiev commented 5 years ago

Fixes: https://github.com/prometheus/tsdb/issues/518. Chunks that are still being appended to might return corrupted data when calling chunk.Bytes(), which is why incomplete chunks should never be persisted. This is relevant mainly when snapshotting with the head included.

The current snapshot test is sufficient to ensure that incomplete chunks (ones still being appended to) are ignored.

Had this idea while reviewing https://github.com/prometheus/tsdb/pull/639
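
To make the idea concrete, here is a minimal sketch (not the actual diff; the helper function, the `chunkw`/`chks` parameters, and checking `NumSamples()` against the 120-sample limit are assumptions for illustration, using this repo's chunks package):

```go
// writeCompleteChunks is a sketch: it persists only chunks that have reached
// the head's 120-sample limit; anything still being appended to is skipped,
// because calling chunk.Bytes() on a live chunk may race with the appender.
func writeCompleteChunks(chunkw chunks.Writer, chks []chunks.Meta) error {
	const samplesPerChunk = 120 // the head appender's cut threshold
	for _, c := range chks {
		if c.Chunk.NumSamples() < samplesPerChunk {
			continue // still open: do not persist it
		}
		if err := chunkw.WriteChunks(c); err != nil {
			return err
		}
	}
	return nil
}
```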

gouthamve commented 5 years ago

But this is an incomplete snapshot then? We could basically fail to persist up to 2h of ingested data with this.

krasi-georgiev commented 5 years ago

Why 2h? Chunks have a maximum of 120 samples.

https://github.com/prometheus/tsdb/blob/ba7f8025a9101663519d34020c283ed8f5ca79ff/head.go#L1695

Yes, there will be some missing data, but this guarantees that the data won't be corrupt.
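
For context, a rough paraphrase of the logic at the linked head.go line (names are approximate, see the link above for the real code):

```go
// Once the current head chunk holds samplesPerChunk (120) samples, the
// appender cuts a new one; everything before the cut becomes immutable.
if s.headChunk.chunk.NumSamples() == samplesPerChunk {
	c = s.cut(t) // start a fresh chunk for the incoming sample
}
```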

brian-brazil commented 5 years ago

It's not safe to do this, you need a clean break in the data. If a user wanted this, they could skip the head block.

krasi-georgiev commented 5 years ago

Why is it not safe? This skips only a few of the last samples, not the entire head.

I looked at the code quite a bit and otherwise chunk.Bytes() could always return corrupt data.

brian-brazil commented 5 years ago

Why is it not safe? This skips only a few of the last samples, not the entire head.

It's randomly skipping different amounts of data for different series.

krasi-georgiev commented 5 years ago

I guess we can run some additional check to ensure all chunks are cut at the same time. So that I understand the problem with the different timings, can you give an example of a query that would cause issues?

Do you have any other idea to solve this race?

bwplotka commented 5 years ago

Yes. Re-encode those moving chunks: https://github.com/prometheus/tsdb/pull/639

I thought about ignoring them, but it's indeed quite bad if it can skip up to 2h of data.
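
The core of that approach as a hedged sketch (assuming this repo's chunkenc API where Chunk exposes Appender() and a no-argument Iterator(); newer tsdb versions take an Iterator to reuse, and the real #639 code differs):

```go
// reencode copies the samples of a possibly still-open chunk into a fresh,
// immutable XOR chunk, so the snapshot never reads Bytes() of the live chunk.
func reencode(c chunkenc.Chunk) (chunkenc.Chunk, error) {
	out := chunkenc.NewXORChunk()
	app, err := out.Appender()
	if err != nil {
		return nil, err
	}
	it := c.Iterator() // signature may differ across tsdb versions
	for it.Next() {
		t, v := it.At()
		app.Append(t, v)
	}
	return out, it.Err()
}
```

The copy itself presumably still has to happen while the series is locked against appends (which is what the head's safeChunk wrapper is for), otherwise iterating races with the appender just like Bytes() does.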

krasi-georgiev commented 5 years ago

Did you see my comment on your PR? I think it still has a race there.

BTW, at the most common scrape interval of 15s and 120 samples per chunk, that is a maximum of 30min of data loss, I think (120 × 15s = 1800s = 30min).

krasi-georgiev commented 5 years ago

Maybe we can combine both PRs, which is: re-encode all incomplete chunks (head chunks).

bwplotka commented 5 years ago

I am not saying my PR is perfect. Even CI is saying it's not, but my main focus is on remote read TBH, so joining forces would be nice. I can push my changes to some tsdb branch for easier collaboration @Krasi (:

codesome commented 5 years ago

I had a look at this and #639, and I find that safely re-encoding the last chunk and using it seems to be a better option than ignoring it. Reasons would be (1) what @brian-brazil said, and (2) snapshotting as much data as possible.

#639 still needs to address the race described by @krasi-georgiev here.

krasi-georgiev commented 5 years ago

I have just merged both PRs

bwplotka commented 5 years ago

OK, I might be biased, but I think having MaxTime == max is a bit better way of saying whether a chunk is full, because it's quite natural and does not require a change in the interface. What do you think about closing this to avoid confusion and trying this approach in https://github.com/prometheus/tsdb/pull/639? Or we can start a new PR on a TSDB branch for better collaboration.
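
A sketch of what that marker could look like (chunks.Meta and math.MaxInt64 are real; the two helpers are hypothetical names for illustration):

```go
import (
	"math"

	"github.com/prometheus/tsdb/chunks"
)

// openChunkMeta describes a head chunk that is still being appended to by
// leaving MaxTime at the maximum value instead of a real timestamp.
func openChunkMeta(ref uint64, minTime int64) chunks.Meta {
	return chunks.Meta{Ref: ref, MinTime: minTime, MaxTime: math.MaxInt64}
}

// isOpen reports whether the meta refers to a chunk that is still growing and
// therefore needs re-encoding before it can be persisted.
func isOpen(m chunks.Meta) bool {
	return m.MaxTime == math.MaxInt64
}
```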

krasi-georgiev commented 5 years ago

ready for another review

bwplotka commented 5 years ago

Another idea would be to somehow pass along that it is a head block (both open and closed ones) and treat those differently in terms of chunk maxTime being outside of the desired meta.MaxTime.

brian-brazil commented 5 years ago

Another idea would be to somehow pass along that it is a head block (both open and closed ones) and treat those differently in terms of chunk maxTime being outside of the desired meta.MaxTime.

That's probably easiest at this stage.

krasi-georgiev commented 5 years ago

Another idea would be to somehow pass along that it is a head block (both open and closed ones) and treat those differently in terms of chunk maxTime being outside of the desired meta.MaxTime.

could you explain a bit more about this idea?

bwplotka commented 5 years ago

No worries. Essentially, we are now marking open chunks in a certain way and we rewrite only those. This part makes sense. But to fix the edge case we need something more. I propose, let's say:

  1. Leave the marker we have now for open blocks; rewrite those 100%.
  2. Add another way to tell whether a chunk is part of a head block. Another meta field just for the purpose of populate is rather a no-go (it will be hard, as it is part of the populateBlock and IndexReader interface), but maybe literally some flag to compact.write?

Now we will have:

krasi-georgiev commented 5 years ago

Hm, still a bit unclear, but IIUC you mean: check whether a chunk belongs to the head and rewrite it when it is open or outside meta.MaxTime, and for all other chunks just fail when outside meta.MaxTime?
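
A rough sketch of that branching (isHeadChunk, isOpenChunk and reencodeWithin are made-up names for illustration; neither PR necessarily does exactly this):

```go
switch {
case isHeadChunk && (isOpenChunk || c.MaxTime > meta.MaxTime):
	// Head chunks may still be appended to or may extend past the block
	// boundary: re-encode only the samples within [meta.MinTime, meta.MaxTime].
	newChunk, err := reencodeWithin(c.Chunk, meta.MinTime, meta.MaxTime)
	if err != nil {
		return err
	}
	c.Chunk, c.MaxTime = newChunk, meta.MaxTime
case c.MaxTime > meta.MaxTime:
	// Chunks from persisted blocks must never cross the block boundary.
	return fmt.Errorf("found chunk with maxt %d beyond block maxt %d", c.MaxTime, meta.MaxTime)
}
```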

bwplotka commented 5 years ago

Exactly (:

bwplotka commented 5 years ago

Today is release day and this PR is ready, with only one nit: the safeChunk cast used to tell whether a chunk is a head chunk or not.

I would add a TODO and otherwise merge it and do the TSDB update on the Prometheus side. Any attempt at refactoring populate will probably take days to write and review, so I would fix this nit later.
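
For reference, the nit is roughly this (a sketch; safeChunk is the head's internal wrapper type, so the cast only works inside the tsdb package):

```go
// A chunk served by the head's chunk reader is wrapped in safeChunk, so a type
// assertion can tell head chunks apart from chunks read from persisted blocks.
// TODO: replace this cast with an explicit signal once populateBlock is refactored.
if _, isHeadChunk := c.Chunk.(*safeChunk); isHeadChunk {
	// allow (and re-encode) data past meta.MaxTime for head chunks
}
```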

Thoughts? @brancz @brian-brazil @codesome @krasi-georgiev

codesome commented 5 years ago

I ran a quick benchmark which I had ready, and here are the results. Note that this doesn't include the head in compaction.

benchmark                                                                                 old ns/op       new ns/op       delta
BenchmarkCompaction/type=normal,blocks=2,series=1000000,samplesPerSeriesPerBlock=51-8     12470779450     12815518004     +2.76%

benchmark                                                                                 old allocs     new allocs     delta
BenchmarkCompaction/type=normal,blocks=2,series=1000000,samplesPerSeriesPerBlock=51-8     17818580       24843658       +39.43%

benchmark                                                                                 old bytes      new bytes      delta
BenchmarkCompaction/type=normal,blocks=2,series=1000000,samplesPerSeriesPerBlock=51-8     2112751344     2578539664     +22.05%

codesome commented 5 years ago

The increase in allocs (and also B/op) is alarming for such a small change in populateBlock.

codesome commented 5 years ago

And I confirmed that it is in populateBlock; I am investigating it.

bwplotka commented 5 years ago

Great check, @codesome, thanks. Let's remove this warning then, @krasi-georgiev.

brian-brazil commented 5 years ago

:+1: