jjneely / buckytools

Go implementation of useful tools for dealing with Graphite's Whisper DBs and Carbon hashing
Other
87 stars 21 forks source link

[bug] bucky-fill shouldn't cross-fill archives #19

Open Civil opened 7 years ago

Civil commented 7 years ago

The easiest way to reproduce the bug for me was:

  1. Get some real wsp file with AggregationMethod=sum and multiple archives (original.wsp)
  2. Make a copy of it with a new name (copy.wsp)
  3. bucky-fill original.wsp copy.wsp

copy.wsp becomes corrupted - some points from Archive1 of original.wsp got written to Archive0 of copy.wsp

It seems that this happens with gaps in archive0 and archive1, but I'm not sure. I have an example real world data, but still trying to create a test-case out of that.

Civil commented 7 years ago

I've added a test case for this bug:

vsmirnov@ml905507 ~/go/gopath_third_party/src/github.com/jjneely/buckytools/fill $ go test -run=^TestTwo
2017/08/14 17:10:49 Running whisper-fill.py...
C       D       Python  Go      Simu
======  ======  ======  ======  ======
   6.0     6.0     3.0     3.0     6.0
   0.0     0.0     NaN     NaN     0.0
   9.0     9.0     6.0     6.0     9.0
   4.0     4.0     0.0     0.0     4.0
   3.0     3.0     9.0     9.0     3.0
   1.0     1.0     4.0    14.0     1.0
   2.0     2.0     3.0     3.0     2.0
   4.0     4.0     1.0     1.0     4.0
   4.0     4.0     2.0     2.0     4.0
   0.0     0.0     4.0     4.0     0.0
   4.0     4.0     4.0    21.0     4.0
   5.0     5.0     0.0     0.0     5.0
   8.0     8.0     4.0     4.0     8.0
   0.0     0.0     5.0     5.0     0.0
   1.0     1.0     8.0     8.0     1.0
   6.0     6.0     0.0    17.0     6.0
   5.0     5.0     1.0     1.0     5.0
   5.0     5.0     6.0     6.0     5.0
   0.0     0.0     5.0     5.0     0.0
   6.0     6.0     5.0     5.0     6.0
   8.0     8.0     0.0    27.0     8.0
   7.0     7.0     6.0     6.0     7.0
   6.0     6.0     8.0     8.0     6.0
   6.0     6.0     7.0     7.0     6.0
   7.0     7.0     6.0     6.0     7.0
   4.0     4.0     6.0    17.0     4.0
   0.0     0.0     7.0     7.0     0.0
   0.0     0.0     4.0     4.0     0.0
   2.0     2.0     0.0     0.0     2.0
   4.0     4.0     0.0     0.0     4.0

--- FAIL: TestTwoArchives (0.19s)
        fill_test.go:342: Whipser point 0 is NaN but should be 3.000000

        fill_test.go:349: Whipser point 0 is NaN but should be 3.000000

FAIL
exit status 1
FAIL    github.com/jjneely/buckytools/fill      0.199s

As you can see here in Go code there are points that are actually sum of the points between them, and that shouldn't happen

I'm also not sure why fetchFromFile and C is that much different.

jjneely commented 7 years ago

Thanks for this.

Civil commented 7 years ago

I'm still not sure how to write test that it would show normal data in first two columns, but I'll try to do that tomorrow (if you won't do that instead of me).

Though I also have an idea of how to fix that: extend whisper library with ability to update archive with certain retention period (e.x. generalize current function to accept 'retention int' as a parameter and if it's not 0, skip archives that doesn't match desired retention).

Civil commented 7 years ago

Ok, I've added a proper test for this particular issue, though I've hardcoded dataset for it.

Though fill still missing tests that will check archives other than 1st one.

Civil commented 7 years ago

I've also pushed my idea of fix for that issue to original PR.

Civil commented 7 years ago

@jjneely any news on this bug? My patch works really well in my tests, so can you please have a look at it and either merge it or suggest another solution for this problem?

jjneely commented 7 years ago

Looks sane, but I want to spend some time to review.

azhiltsov commented 7 years ago

@jjneely did you have any spare cycles for review by any chance?

jjneely commented 7 years ago

I have a upcoming project at work that will afford me some time here, however, more urgent matters keep emerging. Its on the list, I'm just low on manpower.

jjneely commented 6 years ago

Apologies, I want to take the corruption issues seriously, and this is a work sponsored project -- which you would think would provide ample time to work on it. But the goal posts keep moving.

Civil commented 6 years ago

Hi!

Please have a look at the PR to solve this problem. Pros to merge it as-is even without an in-depth look:

  1. There is a test case that fails with current code
  2. After the fix it's not failing anymore
  3. No other tests fails

Assuming that there are tests for most of stuff already in place, it should be safe to merge it as-is now and investigate problem in-depth later.

Civil commented 6 years ago

ping

Civil commented 6 years ago

@jjneely can you please have a look at this issue and at my PR?

I understand you might be short of time, but this issue cause corruption during backfill. I think this is very important to fix this, otherwise it makes backfill unusable (you will get garbage in backfilled files).

towolf commented 6 years ago

@jjneely seeing that you seem to have no more time for this project and @Civil seeming to be a competent submitter, can you please just merge this?

Civil commented 6 years ago

For now we decided to maintain our own fork with patches that we want to have in buckytools (see https://github.com/go-graphite/buckytools ). And we'll think about just rewriting buckytools from scratch to implement better support of our stack and to solve some other issues we have with it (e.x. that it requires to have centralized server and push data through it).

gksinghjsr commented 6 years ago

Can this pull request be merged, as it would help move the project forward? Also, can @Civil also be given admin rights?