AFAIK there is no way to reduce write IOPS, aside from sending fewer metrics or keeping a lower granularity.
Hi @bmhatfield @mleinart any suggestion on this issue?
In your specific configuration, you can reduce write IOPs, but it requires a bit of a trade-off. Each Whisper retention requires a fair bit of IO, which you can cut down by simplifying your policy to 2 or even 1 retention, instead of 3. I forget who said it (perhaps @mleinart), but the advice I have heard is to aim for an upper bound of 2 retention rates.
Note that you will have to resize your existing metrics in addition to changing the config to realize this value.
An example reduction in your configuration might be to change 60s:15d,5m:90d,1h:410d to 60s:60d,1h:2y.
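That resize step can be sketched with the whisper-resize.py utility that ships with Whisper (the path below is the default storage location and just an example; adjust it for your install):

```shell
# Rewrite an existing .wsp file in place with the new retention archives;
# repeat for each metric file, e.g. via find ... -name '*.wsp' -exec.
whisper-resize.py /opt/graphite/storage/whisper/some/metric.wsp 60s:60d 1h:2y
```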
Thank you for the fast answer @bmhatfield,
Isn't there any way to force the system to do IO with bigger chunks of data on each operation?
Really nothing?
I don't have anything I can offer you off the top of my head. If you decide to experiment and discover something meaningful, we'd LOVE to hear about it and get it merged in.
The main configuration knob you can fiddle with is MAX_UPDATES_PER_SECOND (SAN thrashing is actually what this setting and carbon-cache were originally written for). Reducing this will throttle writes to disk and force more points per metric to be cached. If the Hitachi doesn't like tons of small random writes, it may behave better when writes are allowed to batch up more. Keep an eye on pointsPerUpdate as well as cache.size as you reduce it. The big downside of doing this is that more data is cached in carbon and is subject to loss if you lose the VM or something.
You might also look into the vm.dirty_ratio and vm.dirty_background_ratio kernel settings to tune the page cache. My instinct would be to reduce the kernel's caching and increase carbon's (as done above), though I don't have any direct experience doing that.
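For reference (values here are purely illustrative, and untested per the caveat above), that kernel tuning would go in /etc/sysctl.conf:

```ini
# Shrink the kernel writeback cache so batching happens in carbon's
# cache instead; apply with `sysctl -p`. Typical defaults are 10/20.
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
```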
I don't think I'd recommend it in this scenario, but if you want to bypass the kernel cache entirely (as in the #535 pull you cite) you could set WHISPER_AUTOFLUSH to true. This causes a flush() to be called after every write. If you do choose to try this, I'd only do it after getting your pointsPerUpdate nice and high, as you could thrash the storage array even more without that.
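Taken together, and purely as a sketch of where these knobs live (the values are illustrative starting points, not recommendations), the relevant carbon.conf fragment would be:

```ini
[cache]
# Throttle disk writes; points then accumulate in carbon's cache, so
# each update writes more points per metric (watch the pointsPerUpdate
# and cache.size metrics as you lower this).
MAX_UPDATES_PER_SECOND = 1000
# Only consider flipping this to True once pointsPerUpdate is high:
# it calls flush() after every write, bypassing the kernel cache.
WHISPER_AUTOFLUSH = False
```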
Hope this helps!
Hi @toni-moreno, we're still using a SAN on one of our clusters, and I can only confirm what @mleinart said - tune your MAX_UPDATES_PER_SECOND, and you can also try setting WHISPER_AUTOFLUSH to true.
Hi @deniszh @mleinart, I've been doing a stress test with this tool (https://github.com/feangulo/graphite-stresser) on a testing server with the exact same infrastructure as the production servers.
We have done two tests with 86K metrics.
The first, with MAX_UPDATES_PER_SECOND=50000 (effectively no limit): Graphite behaved fine at 3K IOPS to the Hitachi array.
With MAX_UPDATES_PER_SECOND=1000 configured, Graphite's behavior was erratic.
I've tested data availability with this script (https://gist.github.com/toni-moreno/fa174cc1bb38f7178afa09305f3c5397), which makes a lot of HTTP requests for metrics loaded from a metric-list file, and logs the response time and how many nulls (unavailable data points) appear at the end of each time series (1 data point/minute).
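The core of that check - counting trailing nulls in a render-API series - can be sketched like this (function and variable names are mine, not from the gist):

```python
def trailing_nulls(datapoints):
    """Count consecutive None values at the end of a render-API series.

    `datapoints` is a list of [value, timestamp] pairs as returned by
    graphite-web's /render?format=json endpoint.
    """
    count = 0
    for value, _ts in reversed(datapoints):
        if value is not None:
            break
        count += 1
    return count

# With 1 data point per minute, trailing_nulls is roughly the number of
# minutes the stored series lags behind "now".
series = [[1.0, 100], [2.0, 160], [None, 220], [None, 280]]
print(trailing_nulls(series))  # 2
```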
As you can see below, a lot of metrics are missing data from 20-60 minutes ago. It seems the merge of disk-stored data and in-memory queued data is not working correctly.
Could this perhaps be a Graphite bug?
[ Thu Jun 2 19:01:55 2016 ] OK METRIC: STRESS.host.ip-57.com.graphite.stresser.d.m15_rate 720 nulls: 0 elapsed time: 0.0688331127167
[ Thu Jun 2 19:01:55 2016 ] OK METRIC: STRESS.host.ip-88.com.graphite.stresser.cadb.m1_rate 720 nulls: 6 elapsed time: 0.0508050918579
[ Thu Jun 2 19:01:55 2016 ] OK METRIC: STRESS.host.ip-82.com.graphite.stresser.bcda.stddev 720 nulls: 0 elapsed time: 0.0655510425568
[ Thu Jun 2 19:01:56 2016 ] OK METRIC: STRESS.host.ip-62.com.graphite.stresser.dab.p95 720 nulls: 0 elapsed time: 0.0328180789948
[ Thu Jun 2 19:01:56 2016 ] OK METRIC: STRESS.host.ip-60.com.graphite.stresser.abcd.min 720 nulls: 49 elapsed time: 0.199838161469
[ Thu Jun 2 19:01:57 2016 ] OK METRIC: STRESS.host.ip-16.com.graphite.stresser.bc.p99 720 nulls: 0 elapsed time: 0.030189037323
[ Thu Jun 2 19:01:57 2016 ] OK METRIC: STRESS.host.ip-74.com.graphite.stresser.a.min 720 nulls: 0 elapsed time: 0.0377690792084
[ Thu Jun 2 19:01:57 2016 ] OK METRIC: STRESS.host.ip-78.com.graphite.stresser.cda.m5_rate 720 nulls: 0 elapsed time: 0.0334861278534
[ Thu Jun 2 19:01:58 2016 ] OK METRIC: STRESS.host.ip-1.com.graphite.stresser.cad.m1_rate 720 nulls: 32 elapsed time: 0.0600860118866
[ Thu Jun 2 19:01:58 2016 ] OK METRIC: STRESS.host.ip-53.com.graphite.stresser.ca.min 720 nulls: 0 elapsed time: 0.0442810058594
[ Thu Jun 2 19:01:59 2016 ] OK METRIC: STRESS.host.ip-59.com.graphite.stresser.dba.max 720 nulls: 0 elapsed time: 0.0432779788971
[ Thu Jun 2 19:01:59 2016 ] OK METRIC: STRESS.host.ip-17.com.graphite.stresser.cdab.p98 720 nulls: 14 elapsed time: 0.110908031464
[ Thu Jun 2 19:01:59 2016 ] OK METRIC: STRESS.host.ip-28.com.graphite.stresser.dcb.p999 720 nulls: 31 elapsed time: 0.0806908607483
[ Thu Jun 2 19:02:00 2016 ] OK METRIC: STRESS.host.ip-48.com.graphite.stresser.dacb.m1_rate 720 nulls: 6 elapsed time: 0.217912197113
[ Thu Jun 2 19:02:00 2016 ] OK METRIC: STRESS.host.ip-49.com.graphite.stresser.abdc.stddev 720 nulls: 1 elapsed time: 0.0250878334045
[ Thu Jun 2 19:02:01 2016 ] OK METRIC: STRESS.host.ip-87.com.graphite.stresser.cdba.mean_rate 720 nulls: 19 elapsed time: 0.0460441112518
[ Thu Jun 2 19:02:01 2016 ] OK METRIC: STRESS.host.ip-56.com.graphite.stresser.bacd.min 720 nulls: 15 elapsed time: 0.0483469963074
[ Thu Jun 2 19:02:01 2016 ] OK METRIC: STRESS.host.ip-28.com.graphite.stresser.dab.stddev 720 nulls: 32 elapsed time: 0.0925581455231
[ Thu Jun 2 19:02:02 2016 ] OK METRIC: STRESS.host.ip-49.com.graphite.stresser.abc.mean 720 nulls: 1 elapsed time: 0.0342679023743
[ Thu Jun 2 19:02:02 2016 ] OK METRIC: STRESS.host.ip-50.com.graphite.stresser.bad.min 720 nulls: 19 elapsed time: 0.0319290161133
[ Thu Jun 2 19:02:03 2016 ] OK METRIC: STRESS.host.ip-4.com.graphite.stresser.ac.p98 720 nulls: 1 elapsed time: 0.0867421627045
[ Thu Jun 2 19:02:03 2016 ] OK METRIC: STRESS.host.ip-15.com.graphite.stresser.adbc.mean_rate 720 nulls: 47 elapsed time: 0.066370010376
[ Thu Jun 2 19:02:03 2016 ] OK METRIC: STRESS.host.ip-42.com.graphite.stresser.cb.p50 720 nulls: 1 elapsed time: 0.0299911499023
[ Thu Jun 2 19:02:04 2016 ] OK METRIC: STRESS.host.ip-79.com.graphite.stresser.bdc.mean 720 nulls: 1 elapsed time: 0.127055883408
[ Thu Jun 2 19:02:04 2016 ] OK METRIC: STRESS.host.ip-27.com.graphite.stresser.dacb.max 720 nulls: 1 elapsed time: 0.0695948600769
[ Thu Jun 2 19:02:05 2016 ] OK METRIC: STRESS.host.ip-82.com.graphite.stresser.d.p75 720 nulls: 1 elapsed time: 0.0412969589233
[ Thu Jun 2 19:02:05 2016 ] OK METRIC: STRESS.host.ip-48.com.graphite.stresser.abdc.p98 720 nulls: 14 elapsed time: 0.0475289821625
[ Thu Jun 2 19:02:05 2016 ] OK METRIC: STRESS.host.ip-71.com.graphite.stresser.dca.m15_rate 720 nulls: 1 elapsed time: 0.0548729896545
[ Thu Jun 2 19:02:06 2016 ] OK METRIC: STRESS.host.ip-48.com.graphite.stresser.dbac.mean 720 nulls: 1 elapsed time: 0.0398018360138
[ Thu Jun 2 19:02:06 2016 ] OK METRIC: STRESS.host.ip-89.com.graphite.stresser.cabd.m15_rate 720 nulls: 1 elapsed time: 0.11070394516
[ Thu Jun 2 19:02:07 2016 ] OK METRIC: STRESS.host.ip-70.com.graphite.stresser.dabc.count 720 nulls: 1 elapsed time: 0.0314590930939
[ Thu Jun 2 19:02:07 2016 ] OK METRIC: STRESS.host.ip-56.com.graphite.stresser.adc.stddev 720 nulls: 1 elapsed time: 0.039439201355
[ Thu Jun 2 19:02:07 2016 ] OK METRIC: STRESS.host.ip-38.com.graphite.stresser.dcab.m5_rate 720 nulls: 23 elapsed time: 0.0327999591827
[ Thu Jun 2 19:02:08 2016 ] OK METRIC: STRESS.host.ip-77.com.graphite.stresser.abc.count 720 nulls: 29 elapsed time: 0.136918783188
[ Thu Jun 2 19:02:08 2016 ] OK METRIC: STRESS.host.ip-82.com.graphite.stresser.bacd.stddev 720 nulls: 1 elapsed time: 0.0634460449219
[ Thu Jun 2 19:02:09 2016 ] OK METRIC: STRESS.host.ip-48.com.graphite.stresser.cbda.min 720 nulls: 1 elapsed time: 0.0412521362305
[ Thu Jun 2 19:02:09 2016 ] OK METRIC: STRESS.host.ip-16.com.graphite.stresser.c.count 720 nulls: 44 elapsed time: 0.0268151760101
[ Thu Jun 2 19:02:09 2016 ] OK METRIC: STRESS.host.ip-57.com.graphite.stresser.dbca.p999 720 nulls: 1 elapsed time: 0.0261521339417
[ Thu Jun 2 19:02:10 2016 ] OK METRIC: STRESS.host.ip-81.com.graphite.stresser.bdca.mean 720 nulls: 1 elapsed time: 0.0479469299316
[ Thu Jun 2 19:02:10 2016 ] OK METRIC: STRESS.host.ip-52.com.graphite.stresser.bd.max 720 nulls: 35 elapsed time: 0.0628139972687
[ Thu Jun 2 19:02:10 2016 ] OK METRIC: STRESS.host.ip-84.com.graphite.stresser.dacb.p75 720 nulls: 1 elapsed time: 0.0726640224457
[ Thu Jun 2 19:02:11 2016 ] OK METRIC: STRESS.host.ip-52.com.graphite.stresser.da.stddev 720 nulls: 16 elapsed time: 0.0722517967224
[ Thu Jun 2 19:02:11 2016 ] OK METRIC: STRESS.host.ip-71.com.graphite.stresser.ad.m15_rate 720 nulls: 53 elapsed time: 0.0792770385742
[ Thu Jun 2 19:02:12 2016 ] OK METRIC: STRESS.host.ip-33.com.graphite.stresser.adcb.min 720 nulls: 11 elapsed time: 0.0649588108063
[ Thu Jun 2 19:02:12 2016 ] OK METRIC: STRESS.host.ip-47.com.graphite.stresser.dac.max 720 nulls: 22 elapsed time: 0.042662858963
[ Thu Jun 2 19:02:12 2016 ] OK METRIC: STRESS.host.ip-11.com.graphite.stresser.bc.p75 720 nulls: 57 elapsed time: 0.066419839859
[ Thu Jun 2 19:02:13 2016 ] OK METRIC: STRESS.host.ip-39.com.graphite.stresser.dc.p99 720 nulls: 0 elapsed time: 0.0433480739594
@mleinart , @deniszh , I've been having some issues with this exact version (from git master branch last June)
GRAPHITE-WEB
commit 67e463e1efa85b8c5cf022f9abffa3d739175d1e
Merge: ec22fe2 34b223a
Author: Jeff Schroeder <jeffschroeder@computer.org>
Date: Mon Jun 15 17:40:11 2015 -0500
Merge pull request #1250 from SEJeff/fix-django18
Make 'pip install -r requirements.txt' work again
CARBON
commit b80ce915a5e420b46e6972512801491e536db1b6
Merge: 94d9f18 1003df1
Author: Jeff Schroeder <jeffschroeder@computer.org>
Date: Fri Apr 24 15:04:12 2015 -0500
Merge pull request #409 from mleinart/aggregator_buffer_tests
New tests for carbon aggregator buffers
WHISPER
commit 1e96c0cd1dc0b361177c585033cfbbb5711a191f
Merge: 75e35fd bbd37c5
Author: Jeff Schroeder <jeffschroeder@computer.org>
Date: Wed Jun 24 01:36:09 2015 -0400
Merge pull request #66 from acdha/patch-1
Simple script to find corrupt Whisper files
I'm thinking of repeating this test after updating to a newer/stable version.
Which version is most suitable for storage arrays and works with Python 2.6.6 (RHEL 6.7)?
Here are some graphs of performance during the stress test done yesterday (from 17:00 until 20:00).
While the stresser was writing data we launched the HTTP tester script (from 18:00 to 19:00); in that time we made 8763 requests, and 3478 had more than 3 nulls (more than 3 minutes of delay).
The average delay on these "bad" requests is 25 minutes.
@toni-moreno, sorry, but I've completely lost your point - what are you trying to do or prove here? Graphite is a complex system, and like any complex system, it can degrade in many strange or bizarre ways. My cluster is working fine, but if I put 10x more load on it, it will die horribly.
Hi @deniszh, and sorry for my poor English.
As I said (https://github.com/graphite-project/carbon/issues/553#issue-152966004), we need to limit write IOPS on the underlying storage.
I will run stress tests with different configurations and Graphite versions to evaluate the best way to decrease IOPS. But I also need all data to be available online.
I consider a request OK if it contains all data from any time in the past up to at least 3 minutes ago.
With the script (https://gist.github.com/toni-moreno/fa174cc1bb38f7178afa09305f3c5397) we can measure how many minutes of data are missing from each Graphite response and how many requests are not OK.
With MAX_UPDATES_PER_SECOND=inf (and always the same load) everything works fine.
In yesterday's stress test I changed the config to MAX_UPDATES_PER_SECOND=1000; with this configuration the carbon-cache daemons seemed to be queuing data points in memory.
But it seems carbon is only serving disk-stored data to the graphite-web frontend. If I'm not wrong, carbon should merge both (disk-stored data and memory-queued data), shouldn't it?
Looking at the output log in more detail, we can see (https://github.com/graphite-project/carbon/issues/553#issuecomment-223360560) a lot of requests with more than 15 minutes of delay (nulls: 32, nulls: 44, nulls: 35, etc.).
Is this behavior usual? Or is it perhaps a bug?
Either way, this behavior is completely undesirable for us.
We are planning (if needed) a graphite/carbon/whisper update, and will repeat these tests afterwards.
Which version is most suitable for storage arrays and works with Python 2.6.6 (RHEL 6.7)?
Thanks a lot for your help
Hi @toni-moreno, nah, I didn't mean your English - mine is also terrible, even with Grammarly. :) I meant that the target of your test was not clear, at least to me - and now it's clear, thanks for the explanation.
But it seems carbon is only serving disk-stored data to the graphite-web frontend. If I'm not wrong, carbon should merge both (disk-stored data and memory-queued data), shouldn't it?
Exactly, it should. That's why the carbonlink protocol exists. It's possible to serve metrics from disk only, but just to some extent, and on SSDs, of course. So, it looks like there's some problem with your setup - the cache is not being queried, and you're only able to serve metrics from disk.
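One concrete thing to check for that: graphite-web's local_settings.py must list every running carbon-cache instance's cache-query port. The values below are the stock single-instance defaults; with multiple cache instances, each instance:port pair must match carbon.conf:

```python
# local_settings.py -- graphite-web queries carbon's in-memory cache
# over the carbonlink protocol; if this list is wrong, only on-disk
# data is returned and recent points show up as nulls.
CARBONLINK_HOSTS = ["127.0.0.1:7002:a"]  # host:cache_query_port:instance
CARBONLINK_TIMEOUT = 1.0                 # seconds
```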
Which version is the most suitable for storage arrays and working with python 2.6.6 (RHEL6.7) ??
Graphite has no special version for SAN disks. The master branch switched to Python 2.7, but 0.9.x should work on 2.6. The latest release is currently 0.9.15. You can also use the 0.9.x branch from GitHub.
@genisd
I believe that they're two different processes which have nothing in common - no shared memory buffer or awareness. Some processes write and some read. As far as reading is concerned, I think the underlying filesystem cache is the only real caching for reading data. I don't think this has changed in recent versions, but I could be wrong.
Sorry, but that's not how Graphite works. Of course, graphite-web and carbon are different processes and do not use shared memory. That's why carbon not only writes metrics to disk but also stores them in a cache, and returns a seamlessly merged result to graphite-web.
@deniszh: Hi, I will try to "downgrade" to the last 0.9.x git commit, and I will repeat the test.
About the downgrade: when I did the installation (one year ago) I did it with pip/setup.py:
pip install -r requirements.txt
python ./setup.py install
Should "python ./setup.py uninstall" be enough to remove the current version of graphite-web/carbon/whisper? Any suggestion on the best way to clean out the old version and downgrade to 0.9.15?
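For what it's worth, setup.py has no uninstall command, so cleanup is manual. One rough sketch of the downgrade, assuming pip can see the installed packages (verify the actual install locations on your box first):

```shell
# setup.py has no "uninstall"; pip can remove packages it knows about.
pip uninstall whisper carbon graphite-web
# Then install the matching 0.9.x releases, whisper first:
pip install whisper==0.9.15 carbon==0.9.15 graphite-web==0.9.15
```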
We've got VMs which process over 1,500,000 metrics (one-minute interval) on an HDS SAN doing about 3k write IOPS. These systems have MAX_UPDATES_PER_SECOND=100 and run with 6 caches. The downside is that the cache size gets huge (about 30 minutes of data; see the pointsPerUpdate metric).
Cached data should be visible. I suspect an issue with your carbonlink hosts in local_settings.py, the cache-query port, or your relay config (if you are using one).
After you tune your MAX_UPDATES_PER_SECOND I expect you will see a lot of read IOPS, and they are slow on the SAN (this may depend on the disks used and the size of the array). The reads are needed for Graphite to do the aggregations to lower precision. You can add more RAM to the VM to avoid those reads: with plenty of RAM the reads will come from the Linux fs cache instead of going to the SAN. The VMs mentioned above have 128 GB.
I've elaborated on batch writes here. It should address any questions you may have around increasing batch writes to decrease write operations.
Try another project, douban/kenshin, which solves the IOPS problem. Thanks!
@luckywarrior - oh, thanks for the info! Does it work well? Could you please share some numbers (read load, write load, number of metrics, size of files on disk)?
@deniszh I tested with 6 instances per server, with a 4-core CPU and 4 GB of memory. When the load came up to almost 100%, I got the following assessment result:
150k metrics received / 10 secs / carbon-c-relay 0 relay dropped 250 iops
Hi, we have a Graphite/Carbon/Whisper box (a VM on top of VMware ESX, 8 cores, 16 GB RAM, with the ESX attached to a Hitachi disk array). We are currently receiving 250K metrics/minute.
We have no option to get separate physical machines or SSD disks, and the storage administrators have noticed issues on other servers placed on the same Hitachi disk array because of the great number of write IOPS that we are sending to it.
As you can see in the next picture, we are sending 9K IOPS.
We also have aggregation on Whisper files with the default resolution/retention.
In the past we reduced read IOPS by caching data in memory as described in the following issue (https://github.com/graphite-project/carbon/issues/497), and we have 70% of memory filled with cached data, as you can see in the next picture.
Now we need to enable some Carbon/Whisper/system configuration to also cache write data and reduce the number of write IOPS on the disk array.
Can anybody help us, please? Any ideas?