Closed: tengattack closed this issue 6 years ago
It's not safe to run workers without a mutex like that.
It seems some input/output modules need to be modified: for example, the input/file module may crash when multiple threads access it concurrently.
In my opinion, running multiple workers inside individual modules is better than multiplying workers on the main event bus. For example, multiple workers make no sense for a stateful, statistics-gathering module like output/report.
On the other hand, if we only use multiple workers inside modules, each event still passes through a single thread on the main bus instead of being processed by multiple threads. In other words, I think it may not fully utilize the CPU.
I think we could try disabling multiple workers on certain modules. What do you think?
I think the main event bus should not be the bottleneck of the whole pipeline. I'm running some stress tests.
I wrote a simple input/lorem module to stress-test the main event bus throughput.
Result: ~360k events/s
Environment:
So on my MBA, increasing the main event bus workers only helps if the event throughput exceeds ~300k/s. Try testing your own system to measure the real event throughput.
One more thing: as a workaround, you can run multiple gogstash processes... XD
[20/Sep/2018:18:24:08 +0800] Process 1838515 events
[20/Sep/2018:18:24:13 +0800] Process 1732499 events
[20/Sep/2018:18:24:18 +0800] Process 1770475 events
[20/Sep/2018:18:24:23 +0800] Process 1870155 events
input:
- type: lorem
worker: 4
duration: "20s"
empty: true
output:
- type: report
Yes, running multiple instances is a workaround.
I have tried pprof on my pipeline: redis -> filters (grok, date) -> elastic
It only saturates all CPUs if I use multiple workers on the event bus.
And got these results:
$ go tool pprof gogstash profile.output
File: gogstash
Type: cpu
Time: Sep 20, 2018 at 6:09pm (CST)
Duration: 30s, Total samples = 58.24s (194.13%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 20
Showing nodes accounting for 37.48s, 64.35% of 58.24s total
Dropped 425 nodes (cum <= 0.29s)
Showing top 20 nodes out of 184
flat flat% sum% cum cum%
11.83s 20.31% 20.31% 20.19s 34.67% runtime.scanobject
6.37s 10.94% 31.25% 6.37s 10.94% runtime.heapBitsForObject
3.19s 5.48% 36.73% 3.19s 5.48% runtime.greyobject
2.53s 4.34% 41.07% 8.04s 13.80% runtime.mallocgc
1.40s 2.40% 43.48% 1.66s 2.85% encoding/json.stateInString
1.38s 2.37% 45.84% 1.38s 2.37% runtime.heapBitsSetType
1.34s 2.30% 48.15% 2.90s 4.98% runtime.wbBufFlush1
1.06s 1.82% 49.97% 1.36s 2.34% regexp.(*machine).tryBacktrack
1.04s 1.79% 51.75% 3.12s 5.36% runtime.gcWriteBarrier
0.94s 1.61% 53.37% 0.94s 1.61% runtime.memclrNoHeapPointers
0.79s 1.36% 54.72% 0.79s 1.36% runtime.memmove
0.78s 1.34% 56.06% 1.89s 3.25% encoding/json.(*decodeState).scanWhile
0.76s 1.30% 57.37% 0.83s 1.43% sync.(*Mutex).Lock
0.68s 1.17% 58.53% 1.92s 3.30% encoding/json.compact
0.62s 1.06% 59.60% 1.52s 2.61% runtime.sweepone
0.58s 1% 60.59% 10.24s 17.58% encoding/json.(*decodeState).object
0.56s 0.96% 61.56% 1.58s 2.71% runtime.bulkBarrierPreWrite
0.55s 0.94% 62.50% 2.58s 4.43% runtime.mapassign_faststr
0.54s 0.93% 63.43% 1.38s 2.37% encoding/json.stateEndValue
0.54s 0.93% 64.35% 0.85s 1.46% runtime.gcmarknewobject
$ go tool pprof -alloc_objects gogstash heap.output
File: gogstash
Type: alloc_objects
Time: Sep 20, 2018 at 6:11pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 20
Showing nodes accounting for 561906141, 88.50% of 634914791 total
Dropped 132 nodes (cum <= 3174573)
Showing top 20 nodes out of 100
flat flat% sum% cum cum%
173901238 27.39% 27.39% 173901238 27.39% reflect.unsafe_New
116497231 18.35% 45.74% 116513647 18.35% regexp.(*Regexp).FindAllStringSubmatch
30769621 4.85% 50.58% 30769621 4.85% strings.genSplit
24585539 3.87% 54.46% 187974517 29.61% github.com/tsaikd/KDGoLib/jsonex.(*mapEncoder).encode
22921915 3.61% 58.07% 188136736 29.63% encoding/json.(*decodeState).object
21903401 3.45% 61.52% 53705325 8.46% encoding/json.(*decodeState).objectInterface
20319111 3.20% 64.72% 24997048 3.94% encoding/json.(*decodeState).literalStore
18383128 2.90% 67.61% 147819983 23.28% github.com/tsaikd/gogstash/filter/grok.(*FilterConfig).Event
18153748 2.86% 70.47% 30327341 4.78% encoding/json.(*decodeState).literalInterface
16138581 2.54% 73.01% 16138581 2.54% encoding/json.unquote
12397860 1.95% 74.97% 233909438 36.84% gopkg.in/olivere/elastic%2ev5.(*BulkIndexRequest).Source
12316433 1.94% 76.91% 12316433 1.94% reflect.mapiterinit
12212942 1.92% 78.83% 84888951 13.37% reflect.Value.MapKeys
10886423 1.71% 80.54% 13016408 2.05% github.com/tsaikd/gogstash/config/logevent.LogEvent.getJSONMap
10223772 1.61% 82.15% 33980934 5.35% reflect.cvtBytesString
10069169 1.59% 83.74% 10069169 1.59% bytes.makeSlice
9568401 1.51% 85.25% 9568401 1.51% encoding/json.stateBeginValue
7668004 1.21% 86.46% 12124520 1.91% encoding/json.(*scanner).error
6553395 1.03% 87.49% 171384869 26.99% github.com/tsaikd/gogstash/input/redis.queueMessage
6436229 1.01% 88.50% 10625838 1.67% github.com/vjeantet/grok.(*Grok).compiledParse
I think JSON decoding/encoding may be the key issue, since the lorem -> report pipeline does no heavy JSON work.
I did a simple test by adding filters, based on the latest code.
CPU: Ryzen 7 1700X 3.4GHz
Using this input lorem config:
chsize: 50000
workers: 1
debugprof: localhost:6060
input:
- type: lorem
worker: 8
duration: "20s"
format: '{{.TimeFormat "20060102-150405.000"}}|ERROR|140375654323968||28027|10.104.189.207|filebeat_config|test.py|cloud_alarm_callback|5|{{.Sentence 1 5}}'
filter:
- type: grok
match: '%{DATA:date}\|%{DATA:level}\|%{DATA:thread_id}\|%{DATA:context_id}\|%{DATA:process_id}\|%{DATA:ip}\|%{DATA:module_name}\|%{DATA:file}\|%{DATA:function_name}\|%{DATA:line_no}\|%{DATA:log_body}'
source: "message"
- type: remove_field
remove_message: true
fields: ["message", "@timestamp"]
- type: date
format: "YYYYMMdd-HHmmss.SSS"
source: date
joda: true
output:
- type: report
Result:
[26/Sep/2018:18:49:36 +0800] Process 298590 events
[26/Sep/2018:18:49:41 +0800] Process 298611 events
[26/Sep/2018:18:49:46 +0800] Process 298156 events
[26/Sep/2018:18:49:51 +0800] Process 301583 events
~300000 events/s
And if I try to increase the main event bus workers:
Result:
[26/Sep/2018:18:53:34 +0800] Process 60373 events
[26/Sep/2018:18:53:34 +0800] Process 60346 events
[26/Sep/2018:18:53:34 +0800] Process 60438 events
[26/Sep/2018:18:53:34 +0800] Process 60386 events
[26/Sep/2018:18:53:34 +0800] Process 60353 events
[26/Sep/2018:18:53:34 +0800] Process 60421 events
[26/Sep/2018:18:53:34 +0800] Process 60460 events
[26/Sep/2018:18:53:34 +0800] Process 60452 events
[26/Sep/2018:18:53:39 +0800] Process 60737 events
[26/Sep/2018:18:53:39 +0800] Process 60743 events
[26/Sep/2018:18:53:39 +0800] Process 60756 events
[26/Sep/2018:18:53:39 +0800] Process 60755 events
[26/Sep/2018:18:53:39 +0800] Process 60749 events
[26/Sep/2018:18:53:39 +0800] Process 60788 events
[26/Sep/2018:18:53:39 +0800] Process 60734 events
[26/Sep/2018:18:53:39 +0800] Process 60775 events
[26/Sep/2018:18:53:44 +0800] Process 60746 events
[26/Sep/2018:18:53:44 +0800] Process 60694 events
[26/Sep/2018:18:53:44 +0800] Process 60705 events
[26/Sep/2018:18:53:44 +0800] Process 60693 events
[26/Sep/2018:18:53:44 +0800] Process 60786 events
[26/Sep/2018:18:53:44 +0800] Process 60748 events
[26/Sep/2018:18:53:44 +0800] Process 60731 events
[26/Sep/2018:18:53:44 +0800] Process 60770 events
Total: ~480000 events/s (8 x ~60000)
And I tried 8 gogstash processes, each with a single worker:
Each process handles ~200000/s:
[26/Sep/2018:19:10:03 +0800] Process 202186 events
[26/Sep/2018:19:10:08 +0800] Process 201989 events
[26/Sep/2018:19:10:13 +0800] Process 204152 events
[26/Sep/2018:19:10:18 +0800] Process 200584 events
Total: ~1600000 events/s
And I have tested on Linux (E5-2682 v4 @ 2.50GHz):
Maybe we should implement the workers option as a fork :)
I prefer one main bus with multiple goroutines in the input/filter/output modules. Forking multiple processes can be done with a shell script.
In the end, I want to add a config option for the number of forked processes. Using syscalls, the child processes can simply share the same input/output fds and signals, which is needed in some scenarios.
If we don't need it, we just leave the option empty or set it to 1, and nothing changes from the current behavior.
Add the workers config to increase CPU usage, since there is currently only one concurrent thread for filter & output.
In practice, it can boost the Elasticsearch index rate by 2x to 3x.