Closed: lorenzo closed this issue 7 years ago
I added the size of the TDigest structure to the tdigest-simple tester output.
With `tdigest-simple -m tdigest -d normal -s <input size> +RTS -s`:

input size    TDigest size    runtime, maximum residency
   2000000    373
   4000000    388
   6000000    401
   8000000    413
  10000000    422
  20000000    439
  50000000    463
 100000000    478
 200000000    497             7 min, 496,016 max res
 400000000    523             15 min, 497,696 max res
 800000000    536             31 min, 498,536 max res (~400k samples per second)
This trend is not good (it keeps growing!). The limit is 1000, after which everything still works but is very slow.
IIRC, if one changes `q * (1 - q)` to `sqrt (q * (1 - q))`, then we trade CPU (and some precision?) for stability. But I remember seeing that in some presentation, not in the paper.
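To make the trade-off concrete, here is a small sketch of the two candidate size-bound functions. This is an illustration, not code from the tdigest package; the `4 * n * q * (1 - q) / delta` form follows the bound in the original t-digest paper, and the function names are hypothetical.

```haskell
-- q is a centroid's quantile position in [0,1], delta the compression
-- parameter, n the total sample count. The bound caps how many samples
-- a single centroid may absorb at that quantile.

-- Quadratic bound from the paper: smallest at the tails, largest at the median.
sizeBoundQuadratic :: Double -> Double -> Double -> Double
sizeBoundQuadratic delta n q = 4 * n * q * (1 - q) / delta

-- Alternative discussed above: sqrt (q * (1 - q)) shrinks more slowly
-- toward the tails, allowing larger centroids there, which can keep the
-- total centroid count lower (stability) at some cost in tail precision.
sizeBoundSqrt :: Double -> Double -> Double -> Double
sizeBoundSqrt delta n q = 4 * n * sqrt (q * (1 - q)) / delta

main :: IO ()
main = do
  -- at the median (q = 0.5): 0.25 vs sqrt 0.25 = 0.5
  print (sizeBoundQuadratic 100 1e6 0.5)  -- 10000.0
  print (sizeBoundSqrt      100 1e6 0.5)  -- 20000.0
  -- near the tail (q = 0.01) the sqrt bound is much looser
  print (sizeBoundQuadratic 100 1e6 0.01)
  print (sizeBoundSqrt      100 1e6 0.01)
```

Near q = 0.01 the quadratic bound allows only ~396 samples per centroid while the sqrt bound allows ~3980, which is why the sqrt variant tends to merge more aggressively in the tails.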
FWIW, the paper has changed from https://github.com/tdunning/t-digest/blob/41d3eb3c03947f0d1a04c66ecfc943d04ae239fb/docs/t-digest-paper/histo.pdf to https://github.com/tdunning/t-digest/blob/07b8f2ca2be8d0a9f04df2feadad5ddc1bb73c88/docs/t-digest-paper/histo.pdf and the algorithm has been tweaked considerably. I have to try the new variants.
In the meanwhile: if your app will ever estimate at most on the order of 10^8 samples during its lifetime, you should be OK. If more, let's talk.
> In the meanwhile: if your app will ever estimate at most on the order of 10^8 samples
Not at all, usually around 1 to 10 million entries, yet I can see the memory growing significantly. It hasn't been a problem at all so far; I was more wondering why I could see it increase linearly.
I'm using it for this project, BTW: https://github.com/seatgeek/wrecker
If you want to play with it, do something like `wreck https://localhost:8080 --concurrency 100` and keep an eye on the memory consumption. I noticed t-digest was the main driver of the memory going up after I replaced the statistics recorder with a no-op.
In any case, it's a non-issue for me; I was actually just curious about the reason.
That link is broken; is it a private repository? Anyhow, it's nice to see tdigest being used in the wild.
Ah crap, the public one is https://github.com/lorenzo/wrecker
@lorenzo thanks, I'll try it out! (seems like a nice tool too)
Great, thanks! I'll close this ticket, as it was more a question than an actual problem.
I have an application where I use this, and I have observed that memory usage increases linearly (slowly) as I insert more data into the t-digest.
Is this expected? I thought that, being an online stats calculator, it would keep memory constant.
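For context on that expectation: a strictly online estimator such as a running mean/variance really does use O(1) memory, while a t-digest keeps a set of centroids whose count grows toward a cap set by the compression parameter, so some slow growth early on is normal. A minimal sketch of a constant-memory online estimator (Welford's algorithm), as a generic illustration and not code from the tdigest package:

```haskell
import Data.List (foldl')

-- Welford's online mean/variance: a fixed-size accumulator, so memory
-- stays constant no matter how many samples are streamed through it.
data Moments = Moments
  { mCount :: !Int     -- number of samples seen
  , mMean  :: !Double  -- running mean
  , mM2    :: !Double  -- running sum of squared deviations
  }

emptyMoments :: Moments
emptyMoments = Moments 0 0 0

-- Fold one sample into the accumulator.
insertMoment :: Moments -> Double -> Moments
insertMoment (Moments n mean m2) x =
  let n'    = n + 1
      delta = x - mean
      mean' = mean + delta / fromIntegral n'
      m2'   = m2 + delta * (x - mean')
  in Moments n' mean' m2'

-- Sample variance from the accumulator.
variance :: Moments -> Double
variance (Moments n _ m2)
  | n < 2     = 0
  | otherwise = m2 / fromIntegral (n - 1)

main :: IO ()
main = do
  let m = foldl' insertMoment emptyMoments [1 .. 100]
  print (mMean m)
  print (variance m)
```

A t-digest cannot be this compact because it estimates arbitrary quantiles, not just fixed moments; its memory is bounded but not constant, which matches the slow growth you observed.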