Closed: tengattack closed this issue 6 years ago
It's not safe to run workers without a mutex like that.
It seems some input/output modules need to be modified: for example, the input/file module may crash when multiple threads access it concurrently.
In my opinion, running multiple workers inside individual modules is better than multiplying workers on the main event bus. For example, multiple workers make no sense for a stateful, statistics-gathering module like output/report.
On the other hand, if we only use multiple workers inside modules, each event still passes through a single thread on the main bus instead of being processed by multiple threads. In other words, I think it may not fully utilize the CPU.
I think we could try disabling multiple workers on certain modules. What do you think?
I think the main event bus should not be the bottleneck of the whole pipeline. I'm running some stress tests.
I wrote a simple input/lorem module to stress-test the main event bus throughput.
Result: ~360k events/s
Environment:
So on my MBA, increasing the main event bus workers only helps if the event throughput exceeds ~300k/s. Try testing your own system to measure the real event throughput.
One more thing: as a workaround, you can run multiple gogstash processes... XD
[20/Sep/2018:18:24:08 +0800] Process 1838515 events
[20/Sep/2018:18:24:13 +0800] Process 1732499 events
[20/Sep/2018:18:24:18 +0800] Process 1770475 events
[20/Sep/2018:18:24:23 +0800] Process 1870155 events
input:
- type: lorem
worker: 4
duration: "20s"
empty: true
output:
- type: report
Yes, running multiple instances is a workaround.
I have tried pprof on my pipeline: redis -> filters (grok, date) -> elastic
It only saturates all CPUs if I use multiple workers on the event bus.
And got these results:
$ go tool pprof gogstash profile.output
File: gogstash
Type: cpu
Time: Sep 20, 2018 at 6:09pm (CST)
Duration: 30s, Total samples = 58.24s (194.13%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 20
Showing nodes accounting for 37.48s, 64.35% of 58.24s total
Dropped 425 nodes (cum <= 0.29s)
Showing top 20 nodes out of 184
flat flat% sum% cum cum%
11.83s 20.31% 20.31% 20.19s 34.67% runtime.scanobject
6.37s 10.94% 31.25% 6.37s 10.94% runtime.heapBitsForObject
3.19s 5.48% 36.73% 3.19s 5.48% runtime.greyobject
2.53s 4.34% 41.07% 8.04s 13.80% runtime.mallocgc
1.40s 2.40% 43.48% 1.66s 2.85% encoding/json.stateInString
1.38s 2.37% 45.84% 1.38s 2.37% runtime.heapBitsSetType
1.34s 2.30% 48.15% 2.90s 4.98% runtime.wbBufFlush1
1.06s 1.82% 49.97% 1.36s 2.34% regexp.(*machine).tryBacktrack
1.04s 1.79% 51.75% 3.12s 5.36% runtime.gcWriteBarrier
0.94s 1.61% 53.37% 0.94s 1.61% runtime.memclrNoHeapPointers
0.79s 1.36% 54.72% 0.79s 1.36% runtime.memmove
0.78s 1.34% 56.06% 1.89s 3.25% encoding/json.(*decodeState).scanWhile
0.76s 1.30% 57.37% 0.83s 1.43% sync.(*Mutex).Lock
0.68s 1.17% 58.53% 1.92s 3.30% encoding/json.compact
0.62s 1.06% 59.60% 1.52s 2.61% runtime.sweepone
0.58s 1% 60.59% 10.24s 17.58% encoding/json.(*decodeState).object
0.56s 0.96% 61.56% 1.58s 2.71% runtime.bulkBarrierPreWrite
0.55s 0.94% 62.50% 2.58s 4.43% runtime.mapassign_faststr
0.54s 0.93% 63.43% 1.38s 2.37% encoding/json.stateEndValue
0.54s 0.93% 64.35% 0.85s 1.46% runtime.gcmarknewobject
$ go tool pprof -alloc_objects gogstash heap.output
File: gogstash
Type: alloc_objects
Time: Sep 20, 2018 at 6:11pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 20
Showing nodes accounting for 561906141, 88.50% of 634914791 total
Dropped 132 nodes (cum <= 3174573)
Showing top 20 nodes out of 100
flat flat% sum% cum cum%
173901238 27.39% 27.39% 173901238 27.39% reflect.unsafe_New
116497231 18.35% 45.74% 116513647 18.35% regexp.(*Regexp).FindAllStringSubmatch
30769621 4.85% 50.58% 30769621 4.85% strings.genSplit
24585539 3.87% 54.46% 187974517 29.61% github.com/tsaikd/KDGoLib/jsonex.(*mapEncoder).encode
22921915 3.61% 58.07% 188136736 29.63% encoding/json.(*decodeState).object
21903401 3.45% 61.52% 53705325 8.46% encoding/json.(*decodeState).objectInterface
20319111 3.20% 64.72% 24997048 3.94% encoding/json.(*decodeState).literalStore
18383128 2.90% 67.61% 147819983 23.28% github.com/tsaikd/gogstash/filter/grok.(*FilterConfig).Event
18153748 2.86% 70.47% 30327341 4.78% encoding/json.(*decodeState).literalInterface
16138581 2.54% 73.01% 16138581 2.54% encoding/json.unquote
12397860 1.95% 74.97% 233909438 36.84% gopkg.in/olivere/elastic%2ev5.(*BulkIndexRequest).Source
12316433 1.94% 76.91% 12316433 1.94% reflect.mapiterinit
12212942 1.92% 78.83% 84888951 13.37% reflect.Value.MapKeys
10886423 1.71% 80.54% 13016408 2.05% github.com/tsaikd/gogstash/config/logevent.LogEvent.getJSONMap
10223772 1.61% 82.15% 33980934 5.35% reflect.cvtBytesString
10069169 1.59% 83.74% 10069169 1.59% bytes.makeSlice
9568401 1.51% 85.25% 9568401 1.51% encoding/json.stateBeginValue
7668004 1.21% 86.46% 12124520 1.91% encoding/json.(*scanner).error
6553395 1.03% 87.49% 171384869 26.99% github.com/tsaikd/gogstash/input/redis.queueMessage
6436229 1.01% 88.50% 10625838 1.67% github.com/vjeantet/grok.(*Grok).compiledParse
I think JSON decoding/encoding may be the key issue, since the lorem -> report pipeline does no heavy JSON work.
I did a simple test by adding filters, based on the latest code.
CPU: Ryzen 7 1700X 3.4GHz
Using this input lorem config:
chsize: 50000
workers: 1
debugprof: localhost:6060
input:
- type: lorem
worker: 8
duration: "20s"
format: '{{.TimeFormat "20060102-150405.000"}}|ERROR|140375654323968||28027|10.104.189.207|filebeat_config|test.py|cloud_alarm_callback|5|{{.Sentence 1 5}}'
filter:
- type: grok
match: '%{DATA:date}\|%{DATA:level}\|%{DATA:thread_id}\|%{DATA:context_id}\|%{DATA:process_id}\|%{DATA:ip}\|%{DATA:module_name}\|%{DATA:file}\|%{DATA:function_name}\|%{DATA:line_no}\|%{DATA:log_body}'
source: "message"
- type: remove_field
remove_message: true
fields: ["message", "@timestamp"]
- type: date
format: "YYYYMMdd-HHmmss.SSS"
source: date
joda: true
output:
- type: report
Result:
[26/Sep/2018:18:49:36 +0800] Process 298590 events
[26/Sep/2018:18:49:41 +0800] Process 298611 events
[26/Sep/2018:18:49:46 +0800] Process 298156 events
[26/Sep/2018:18:49:51 +0800] Process 301583 events
~300000 events/s
And if I try to increase the main event bus workers:
Result:
[26/Sep/2018:18:53:34 +0800] Process 60373 events
[26/Sep/2018:18:53:34 +0800] Process 60346 events
[26/Sep/2018:18:53:34 +0800] Process 60438 events
[26/Sep/2018:18:53:34 +0800] Process 60386 events
[26/Sep/2018:18:53:34 +0800] Process 60353 events
[26/Sep/2018:18:53:34 +0800] Process 60421 events
[26/Sep/2018:18:53:34 +0800] Process 60460 events
[26/Sep/2018:18:53:34 +0800] Process 60452 events
[26/Sep/2018:18:53:39 +0800] Process 60737 events
[26/Sep/2018:18:53:39 +0800] Process 60743 events
[26/Sep/2018:18:53:39 +0800] Process 60756 events
[26/Sep/2018:18:53:39 +0800] Process 60755 events
[26/Sep/2018:18:53:39 +0800] Process 60749 events
[26/Sep/2018:18:53:39 +0800] Process 60788 events
[26/Sep/2018:18:53:39 +0800] Process 60734 events
[26/Sep/2018:18:53:39 +0800] Process 60775 events
[26/Sep/2018:18:53:44 +0800] Process 60746 events
[26/Sep/2018:18:53:44 +0800] Process 60694 events
[26/Sep/2018:18:53:44 +0800] Process 60705 events
[26/Sep/2018:18:53:44 +0800] Process 60693 events
[26/Sep/2018:18:53:44 +0800] Process 60786 events
[26/Sep/2018:18:53:44 +0800] Process 60748 events
[26/Sep/2018:18:53:44 +0800] Process 60731 events
[26/Sep/2018:18:53:44 +0800] Process 60770 events
Total: ~480000 events/s (8 x ~60000)
And I tried 8 gogstash processes, each with a single worker:
Each process handles ~200000/s:
[26/Sep/2018:19:10:03 +0800] Process 202186 events
[26/Sep/2018:19:10:08 +0800] Process 201989 events
[26/Sep/2018:19:10:13 +0800] Process 204152 events
[26/Sep/2018:19:10:18 +0800] Process 200584 events
Total: ~1600000 events/s
And I have tested on Linux (E5-2682 v4 @ 2.50GHz):
Maybe we should implement the workers option as a fork :)
I prefer one main bus with multiple goroutines in the input/filter/output modules. Forking multiple processes can be done with a shell script.
In the end, I want to add a config option for the number of forked processes. Using syscalls, the child processes can simply share the same input/output fds and signals, which is needed in some scenarios.
If we don't need it, we just leave the option empty or set it to 1, and nothing changes from the current behavior.
Add the workers config to increase CPU usage, since there is currently only one concurrent thread for filter & output.
In practice, it can boost the Elasticsearch index rate by 2x to 3x.