Closed DanCech closed 4 years ago
Merging #817 into master will increase coverage by 0.02%. The diff coverage is 0.00%.
@@ Coverage Diff @@
## master #817 +/- ##
==========================================
+ Coverage 49.62% 49.65% +0.02%
==========================================
Files 37 37
Lines 3434 3432 -2
Branches 494 492 -2
==========================================
Hits 1704 1704
+ Misses 1621 1619 -2
Partials 109 109
Impacted Files | Coverage Δ |
---|---|
lib/carbon/writer.py | 0.00% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfda3d4...a0e55db. Read the comment docs.
Well, it should work. Not sure how we can test that properly, though. I'm working on cluster implementation on docker compose, maybe I can add it there, and test it using haggar...
Seems like a good thing to do. I was already wondering about the 1-second sleep before; it seems a bit long.
@deniszh we should figure out how to test this for 1.1.5
Yes, still working on that. The Docker image is configurable now, so I'm gonna create a docker compose setup with haggar and test how it works.
Ok, benchmarking is hard. :) I did a small test of this patch vs 1.1.4 using this setup - https://github.com/graphite-project/graphite-test-docker (MBP 2017 / 4CPU / 4G in Docker, 10 clients x 1000 metrics every 10 seconds, run for 1 hour). I also ran a second test for a couple of hours or so; some of its graphs come out higher, so 1 hour is probably too short for the parameters to stabilize. 1.1.4:
This patch:
I.e. at first sight it works fine. You can use the same repo to run your own tests. /cc @DanCech
OK, what are we gonna do with this? Go / no-go? @DanCech @piotr1212 @iksaif ?
Your graphs look different between runs but your conclusion is that it looks good? cpu, load, number of metrics in cache, memory used are all higher.
I don't expect this to have any influence on performance but don't have time to test this right now.
The second graph is after 3-4 hours and the first after the first hour. As I said, benchmarking is hard, and I'm not even trying to compare old and new code performance-wise, only to provide (quite weak) evidence that this PR doesn't blow up horribly in your face.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Heh, I'd forgotten about this one. Seems like it would still be a good idea
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have received a report of the writer thread stopping, which results in the metric cache growing and nothing being written to disk.

I'm not sure what the underlying cause is, though it's possible that accessing `reactor.running` from the background thread is part of the issue. I also noted that when the thread calls `time.sleep()` between runs it is blocking for no good reason.

This PR modifies the `writeForever` and `writeTagsForever` functions to run as regular twisted async functions, using `reactor.callLater()` to schedule their next run, and `threads.deferToThread` to call `writeCachedDataPoints` and `writeTags` in threads from the twisted threadpool. This avoids the need to access `reactor` from those threads, and leaves scheduling in the main thread.

The only behavior change this should introduce is that I set the wait time after draining the queue to 1s for `writeTagsForever` to match `writeForever`, though I do wonder whether it would make more sense to shorten that.

I have not yet done any load testing on this, and would appreciate any feedback.