tdunning / t-digest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
Apache License 2.0
1.95k stars 227 forks source link

Index out of bounds in MergingDigest #107

Open death opened 6 years ago

death commented 6 years ago

Hi,

The following variant of the testSmallCountQuantile test results in a java.lang.ArrayIndexOutOfBoundsException error. Dropping the last element will have the test pass.

@Test
public void testSmallCountQuantile2() {
    List<Double> data = Lists.newArrayList(3.0, 3.1, 3.0, 3.0,
                                           3.0, 3.0, 3.0, 3.0,
                                           3.0, 3.0, 3.0, 3.0,
                                           3.5, 3.3, 3.4, 3.5,
                                           3.5, 4.0, 2.95, 3.18,
                                           180.0, 3.0, 3.0, 3.0
                                           , 3.0
                                           );
    TDigest td = new MergingDigest(1);
    for (Double datum : data) {
        td.add(datum);
    }
}
dariomx commented 5 years ago

this may be out of date; i just ran it into current master and it passes.

tdunning commented 5 years ago

Thanks for checking.

(and what a huge relief)

On Wed, Sep 5, 2018 at 4:00 PM dariomx notifications@github.com wrote:

this may be out of date; i just ran it into current master and it passes.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tdunning/t-digest/issues/107#issuecomment-418908639, or mute the thread https://github.com/notifications/unsubscribe-auth/AAPSeivTS7IIUfOyFZoN4KrkAKIenbXmks5uYFeTgaJpZM4T0JWi .

rnorth commented 5 years ago

I'm not sure I'd say this is out of date, per se. The above code snippet runs OK with the current master version. However, the latest release version (AFAICT 3.2, released August 2017) will crash with these inputs, and we've seen similar failures in production systems with different inputs.

Is there a plan to release a new version of the library to maven central? It's a really nice library, so we're keen to continue using it and would prefer not to have to run our own internal fork!

tdunning commented 5 years ago

Yes.

I will be releasing a new version before long.

There are a number of improvements to be had, particularly in terms of accuracy and predictability.

Would it help you if I released a patch release before the larger release?

On Mon, Nov 26, 2018 at 3:37 PM Richard North notifications@github.com wrote:

I'm not sure I'd say this is out of date, per se. The above code snippet runs OK with the current master version. However, the latest release version (AFAICT 3.2, released August 2017) will crash with these inputs, and we've seen similar failures in production systems with different inputs.

Is there a plan to release a new version of the library to maven central? It's a really nice library, so we're keen to continue using it and would prefer not to have to run our own internal fork!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tdunning/t-digest/issues/107#issuecomment-441682617, or mute the thread https://github.com/notifications/unsubscribe-auth/AAPSekutHVlNN2nRukJ7XZW_yO36Jag7ks5uzArBgaJpZM4T0JWi .

tdunning commented 5 years ago

A compression factor of 1 is pretty extreme, isn't it? I wouldn't understand the use of any value less than, say, 10 or 20.

rnorth commented 5 years ago

Thanks for such a quick response. Yes, a patch release would be greatly appreciated.

We have this occurring at a compression factor of 100 (and obviously an entirely different volume of input data points).

tdunning commented 5 years ago

Where is the exception happening?

The problem that I just looked at had to do with the bufferSize getting set too small so that the array copy failed on merging. Where is your problem appearing? Same place?

rnorth commented 5 years ago

In both cases (the repro case above and our error in production) both failed at line 302 of MergingDigest.java, which is the following (in v3.2):

        System.arraycopy(mean, 0, incomingMean, incomingCount, lastUsedCell);

Relevant part of the stack trace:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at com.tdunning.math.stats.MergingDigest.merge(MergingDigest.java:302)
    at com.tdunning.math.stats.MergingDigest.mergeNewValues(MergingDigest.java:291)
    at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:202)
    at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:194)
    at com.tdunning.math.stats.AbstractTDigest.add(AbstractTDigest.java:131)

I'm afraid right now I can't give much more information about the state of the fields at the time this is occurring, aside from saying that we've seen this on one of our busiest services. I shall try and gather some more useful data tomorrow, though, if it helps.

hangfei commented 5 years ago

Is this fixed?

tdunning commented 5 years ago

Should be fixed in master. Let me verify.

On Tue, May 14, 2019 at 1:55 PM Carotene notifications@github.com wrote:

Is this fixed?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tdunning/t-digest/issues/107?email_source=notifications&email_token=AAB5E6VF3ZB7DNQLN2QF4B3PVMRL7A5CNFSM4E6QSWRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVMYLCI#issuecomment-492406153, or mute the thread https://github.com/notifications/unsubscribe-auth/AAB5E6USGFRC2WK4UCYYWZDPVMRL7ANCNFSM4E6QSWRA .

tdunning commented 5 years ago

Verified.

hangfei commented 4 years ago

Is this fix released and in which version?

cnuernber commented 3 years ago

Related exception in 3.2:

ERROR in (issue-201-incorrect-result-column-count) (System.java:-2)
Uncaught exception, not in assertion.
expected: nil
  actual: java.lang.ArrayIndexOutOfBoundsException: null
 at java.lang.System.arraycopy (System.java:-2)
    com.tdunning.math.stats.MergingDigest.merge (MergingDigest.java:302)
    com.tdunning.math.stats.MergingDigest.mergeNewValues (MergingDigest.java:291)
    com.tdunning.math.stats.MergingDigest.quantile (MergingDigest.java:661)

This is after do a parallelized aggregation where TDigest.add was called to merge separate thread contexts. It happens only once in a while, not every time which is a bummer for a large aggregation.

tdunning commented 3 years ago

Chris,

It doesn't sound like you have a test case for this, but could you say which version you are using?

On Wed, Mar 24, 2021 at 6:45 AM Chris Nuernberger @.***> wrote:

Related NPE I believe:

ERROR in (issue-201-incorrect-result-column-count) (System.java:-2)Uncaught exception, not in assertion.expected: nil actual: java.lang.ArrayIndexOutOfBoundsException: null at java.lang.System.arraycopy (System.java:-2) com.tdunning.math.stats.MergingDigest.merge (MergingDigest.java:302) com.tdunning.math.stats.MergingDigest.mergeNewValues (MergingDigest.java:291) com.tdunning.math.stats.MergingDigest.quantile (MergingDigest.java:661)

This is after do a parallelized aggregation where TDigest.add was called to merge separate thread contexts. It happens only once in a while, not every time which is a bummer for a large aggregation.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tdunning/t-digest/issues/107#issuecomment-805834394, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB5E6V2TL2MNEHCXG43TGTTFHUHVANCNFSM4E6QSWRA .

cnuernber commented 3 years ago

I have a test case that does this about 3 times out of 100 if I run it in a loop. I am using version 3.2.

I will get test with master:HEAD and see if that eliminates/changes the problem.

tdunning commented 3 years ago

I would love to have the test case as well. Not useful as it stands as a unit case, but very, very useful to use to collect data.

One of the pressures for a new release is that several related issues have been fixed in the main branch. (note that the master branch got renamed not long ago)

On Wed, Mar 24, 2021 at 12:06 PM Chris Nuernberger @.***> wrote:

I have a test case that does this about 3 times out of 100 if I run it in a loop. I am using version 3.2.

I will get test with master:HEAD and see if that eliminates/changes the problem.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tdunning/t-digest/issues/107#issuecomment-806081586, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB5E6XBFS2ZCNGAPADEGCLTFIZ25ANCNFSM4E6QSWRA .

cnuernber commented 3 years ago

100% main:HEAD fixes the test. I ran it many thousands of times just now and saw zero failures.

The test case involves a Clojure datatframe library which means you would be calling Clojure in your unit tests. Are you sure you want the test case :-)? I basically to a N-core reduction across Y datasets followed by thread0_digest.add(rest-of-digests).

This library is fantastic so I am down to help you any way you like.

tdunning commented 3 years ago

OK.

This is yet another vote for a release.

Gonna have to do it even without the serialization stuff that is cooking.

Speaking of which, please weigh in on your thoughts.

https://docs.google.com/document/d/1CX9Colw1g58nLHfJ_IR4XeFHhUaiXGTOG0aKkHpTut4/edit#heading=h.oukzhuq9hijs

On Wed, Mar 24, 2021 at 12:25 PM Chris Nuernberger @.***> wrote:

100% main:HEAD fixes the test. I ran it many thousands of times just now and saw zero failures.

The test case involves a Clojure datatframe library which means you would be calling Clojure in your unit tests. Are you sure you want the test case :-)? I basically to a N-core reduction across Y datasets followed by thread0_digest.add(rest-of-digests).

This library is fantastic so I am down to help you any way you like.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tdunning/t-digest/issues/107#issuecomment-806092790, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB5E6WIXOOG24OQNGHM5ILTFI4D3ANCNFSM4E6QSWRA .

tdunning commented 3 years ago

@cnuernber I would like to take you up on your offer.

Can you look at the current main branch plus live issues and identify what you would say are blockers for a point release?

If none and if this sounds like consensus, I will start the process with what we have.

cnuernber commented 3 years ago

Yes, will do.

cnuernber commented 3 years ago

I looked through the issues and have them categorized and such. Here are my opinions- In short I think a point release is worth it now to get the fixes for mergetree.

I messaged you on linked in and we can continue offline from there or we can continue the discussion here; whichever way works for you.

tdunning commented 3 years ago

I will close this after the upcoming release.

tdunning commented 3 years ago

(Slight) progress note.

I have decoded the new behavior of gpg and interactions with the maven code signing (short answer, much better than before). Should be able to cut a release shortly.

rnataf commented 1 year ago

Hi @cnuernber, could you explain how you solved the bug ? I'm having the same one, really rarely also.

cnuernber commented 1 year ago

Well, two ways. Honestly I think this library has been superceded by apache datasketches. What we did before I started work with datasketches was I patched the library and released it on Clojars under a different name.

If you are doing work in this area of course I have to recommend our data processing system :-).

wxbty commented 1 year ago

Well, two ways. Honestly I think this library has been superceded by apache datasketches. What we did before I started work with datasketches was I patched the library and released it on Clojars under a different name.

If you are doing work in this area of course I have to recommend our data processing system :-).

hi,@cnuernber, I saw that your fork just submitted two configurations, how did you solve the current problem?(https://github.com/tdunning/t-digest/compare/main...techascent:t-digest:main)

tdunning commented 1 year ago

@cnuernber Interesting to see the work on tech.ml.dataset. It frankly looks a bit like a reinvention of Arrow, but just in Clojure, at least for the columnar in-memory format work.