logstash-plugins / logstash-input-http


I have a problem with the logstash-input-http #76

Closed. Ccxlp closed this issue 6 years ago.

Ccxlp commented 6 years ago

I am using the latest version (3.0.8) of this plugin. POST requests sent from a Java HttpClient are filtered through Logstash and written to Elasticsearch, but the requests enter Logstash very slowly: about 5/s. A Fiddler packet capture shows the following:

GotRequestHeaders: 20:44:57.472
ClientDoneRequest: 20:44:57.472
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 20:44:56.272
FiddlerBeginRequest: 20:44:57.472
ServerGotRequest: 20:44:57.472
ServerBeginResponse: 20:44:57.475
GotResponseHeaders: 20:44:57.475
ServerDoneResponse: 20:44:57.675
ClientBeginResponse: 20:44:57.675
ClientDoneResponse: 20:44:57.675
Overall Elapsed: 00:00:00.2030116

Does assembling the response really cost 200ms? At first I thought it was an HttpClient problem, but when I sent messages to a Tomcat server inside the intranet, the speed was 500/s (single thread). Next I used a curl script to send messages to the HTTP input on the server where Logstash runs, and that reached about 300/s. Note that the HttpClient, Logstash, and Tomcat services are all on the same intranet.

logstash.yml:

pipeline.output.workers: 8
pipeline.workers: 10
pipeline.batch.size: 500

Who can help me explain why this happens?
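
One way to see where that 200ms goes is curl's built-in timing variables. A sketch (URL and payload are illustrative, not from the original report):

# Time a single POST and break it down into connect vs. first-byte vs. total
curl -s -o /dev/null -X POST --data '{"msg":"test"}' \
  -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  http://172.16.0.50:10001/

If connect is near zero but ttfb is around 0.2s, the delay is inside the server's request handling rather than in connection setup.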

jsvd commented 6 years ago

Can you provide your configuration?

I've done a basic test to see the speed of 3.0.8 in logstash 6.1:

/tmp/logstash-6.1.0 % bin/logstash-plugin list --verbose http
logstash-input-http (3.0.8)

On the producer side:

/tmp/logstash-6.1.0 % bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://localhost:8080' http_method => post } }" --path.data data2

On the consumer side running a couple of minutes:

/tmp/logstash-6.1.0 % bin/logstash -e "input { http {port => 8080} }" | pv -War > /dev/null
[326KiB/s] [ 204KiB/s]

204k events per second on a local machine.

Ccxlp commented 6 years ago

@jsvd I'm sorry, I might not have described it clearly. I set up an ELK cluster with X-Pack installed to monitor some key metrics. When I found that data was entering very slowly, I ran a separate test of logstash-http-input with this configuration file:

input {
  http {
    host => "0.0.0.0"
    port => 10001
    additional_codecs => { "application/json" => "json" }
    threads => 10
  }
}
output {
  stdout { codec => dots }
}

In addition, the Java HttpClient that sends the HTTP requests runs on Windows. When I noticed the abnormal speed, I pointed it at the local Tomcat server instead, which handled a satisfactory 500 events/s. To test the throughput of logstash-http-input, I also tried sending HTTP requests with curl at the same time:

function fun() {
  for (( i = 0; i < 300000; i++ )); do
  {
    line="http://172.16.0.50:10001/ear/erererer"
    isok=`curl -s -X POST --data '{"clientHost":"172.16.0.50","requestTime":"11/Aug/2017:17:21:05 +0800","method":"post","page":"/openlayers3/index3","protocol":"HTTP/1.1","stateCode":"200","bytes":"3245"}' -w %{http_code} $line -o /dev/null`
  }
  done
}
for (( j = 0; j < 5; j++ )); do
{
  fun > /dev/null &
}
done

Throughput was 1000 events/s. That speed is not satisfactory either, although curl itself may be the limiting factor. Then, inspired by your test, I also tried generator + http output to produce the data and http input + stdout { codec => dots } to consume it; the result is far from the numbers you reported. Thank you very much for your reply, and I look forward to hearing back!
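
Since spawning one curl process per request adds a fork plus a TCP handshake every time, curl itself may well be the bottleneck in the script above. One way to take that overhead out is curl's URL globbing, which sends many requests over a single kept-alive connection; a sketch (repeat count, payload, and URL are illustrative):

# One curl process, many POSTs over one kept-alive connection
curl -s -X POST --data '{"msg":"test"}' "http://172.16.0.50:10001/?n=[1-10000]" > /dev/null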

Ccxlp commented 6 years ago

@jsvd I downloaded Logstash 6.1.0 and copied your example for testing (base configuration, but with X-Pack installed).

On the producer side:

bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://localhost:8080' http_method => post } }" --path.data data2

On the consumer side, running for a couple of minutes:

bin/logstash -e "input { http {port => 8080} }" > /dev/null

Viewing throughput through Kibana: 100 events/s. At the same time, I found some HTTP connection timeout errors in the Logstash log:

[2017-12-19T17:02:53,307][ERROR][logstash.outputs.http ] [HTTP Output Failure] Could not fetch URL {:url=>"http://localhost:8080", :method=>:post, :body=>"{\"@timestamp\":\"2017-12-19T09:02:22.640Z\",\"message\":\"Hello world!\",\"@version\":\"1\",\"host\":\"site50\",\"sequence\":3707}", :headers=>{"Content-Type"=>"application/json"}, :message=>"Read timed out", :class=>"Manticore::SocketTimeout", :backtrace=>nil, :will_retry=>true}

I really don't know why there is such a gap. It's frustrating.

jsvd commented 6 years ago

This is really weird indeed. So that we can test exactly the same thing, can you install/use the pv tool like I did in my test?

producer:

bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2

consumer:

bin/logstash -e "input { http {port => 8080} }" | pv -War > /dev/null

Ccxlp commented 6 years ago

@jsvd I used the pv tool (the version I use is probably different from yours; my pv doesn't have the 'a' parameter):

producer:

bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2

consumer:

bin/logstash -e "input { http {port => 8080} }" | pv -Wr > /dev/null

Stable rate: [64.8KB/s]

Ccxlp commented 6 years ago

@jsvd @ph @colinsurprenant @elasticdog @karmi During the test, I also found several phenomena:

  1. I used version 5.6 before. When I tried to run the above example on 5.6, Logstash started and generated data for about two seconds before reporting an OOM (see the heap sketch after this list): Error: Your application used more memory than the safety cap of 4G. Specify -J-Xmx####m to increase it (#### = cap size in MB). Specify -w for full OutOfMemoryError stack trace
  2. Increasing the number of logstash-http-input threads increases the processing speed proportionally.
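
A sketch of two common ways to raise the heap cap from the OOM message above (assumes the standard Logstash 5.x/6.x launcher scripts; values are illustrative):

# Option 1: edit config/jvm.options and raise the heap, e.g.
#   -Xms4g
#   -Xmx8g
# Option 2: append JVM options for a single run via the environment
LS_JAVA_OPTS="-Xmx8g" bin/logstash -e "input { generator { threads => 4 } } output { stdout { codec => dots } }"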

After many attempts, I believe the cause of the slow speed lies in logstash-http-input itself: if I raise the Java client's send rate (to about 500 events/s) while the HTTP input uses its default configuration, it returns 429; and when the generator input produces the data, SocketTimeout is thrown constantly.

I am very sorry for the @-mentions. This question has bothered me for many days. I hope to get your help, and thanks again to jsvd for the reply!

jsvd commented 6 years ago

Please don't ping people directly, especially since I'm already working on this with you.

After some testing I'm also seeing very low throughput; my earlier numbers were misleading because I was not measuring the dots codec output. I am investigating this and will report back when I know more.

jsvd commented 6 years ago

@Ccxlp can you try starting the http input receiver bound to an IP instead of a hostname and check the speed? So instead of:

bin/logstash -e "input { http {port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

try:

bin/logstash -e "input { http {host => '127.0.0.1' port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

Ccxlp commented 6 years ago

@jsvd I just gave it a try:

producer:

bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2

consumer:

bin/logstash -e "input { http {host => '127.0.0.1' port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

Stable rate: [100B/s]. What results do you get?

jsvd commented 6 years ago

I'm seeing slow speeds as well when testing with Logstash as the event producer. However, using a tool like ab or siege I see normal speeds:

siege --content-type "application/json" 'http://127.0.0.1:8080 POST {"ids": ["1","2","3"]}'          
** SIEGE 4.0.4
** Preparing 25 concurrent users for battle.
The server is now under siege...^C
Lifting the server siege...
Transactions:              89082 hits
Availability:             100.00 %
Elapsed time:               8.42 secs
Data transferred:           0.17 MB
Response time:              0.00 secs
Transaction rate:       10579.81 trans/sec
Throughput:             0.02 MB/sec
Concurrency:                3.78
Successful transactions:       89082
Failed transactions:               0
Longest transaction:            0.02
Shortest transaction:           0.00
ab -p data.json -n 100000 -T application/json -m POST -k "http://127.0.0.1:8080/"  
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        2 bytes

Concurrency Level:      1
Time taken for tests:   11.720 seconds
Complete requests:      100000
Failed requests:        0
Keep-Alive requests:    100000
Total transferred:      9000000 bytes
Total body sent:        17400000
HTML transferred:       200000 bytes
Requests per second:    8532.58 [#/sec] (mean)
Time per request:       0.117 [ms] (mean)
Time per request:       0.117 [ms] (mean, across all concurrent requests)
Transfer rate:          749.93 [Kbytes/sec] received
                        1449.87 kb/s sent
                        2199.80 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       8
Waiting:        0    0   0.1      0       8
Total:          0    0   0.1      0       8

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      8 (longest request)

siege test: 10579.81 trans/sec
ab test: 8532.58 [#/sec] (mean)

So around 8.5-10k eps.

Ccxlp commented 6 years ago

@jsvd Thank you for your answer. However, between logstash-http-input and the Java HttpClient the interaction is so slow that processing a single message takes 200ms, and from the packet-capture software (Fiddler) you can see that the HTTP input is slow to respond. What do you suggest? The HTTP packet Fiddler caught:

GotRequestHeaders: 20:44:57.472
ClientDoneRequest: 20:44:57.472
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 20:44:56.272
FiddlerBeginRequest: 20:44:57.472
ServerGotRequest: 20:44:57.472
ServerBeginResponse: 20:44:57.475
GotResponseHeaders: 20:44:57.475
ServerDoneResponse: 20:44:57.675
ClientBeginResponse: 20:44:57.675
ClientDoneResponse: 20:44:57.675
Overall Elapsed: 00:00:00.2030116

Ccxlp commented 6 years ago

@jsvd Did you do anything else while setting up your Logstash system? I used another virtual machine, downloaded version 6.1.0, and installed JDK 1.8.0_74 (JAVA_HOME is not configured, but JAVACMD is specified in the logstash.lib.sh file). Keeping the default configuration and using the ab tool to reproduce your procedure, my speed is still 25 B/s. Am I missing something?

varun0002 commented 6 years ago

@Ccxlp jsvd: Were you able to find any solution for this?

troymevosus commented 6 years ago

I'm also experiencing this issue. Any updates?

jsvd commented 6 years ago

Can you try version 3.1.0, which replaces Puma with Netty?

pkoretic commented 6 years ago

@jsvd 3.1.0 has issues with both ab and siege: ab stalls after the first few requests, and siege uses up all the ports and fails. No issues like that with 3.0.10. Any idea?

jsvd commented 6 years ago

Interesting. It seems to be an issue with how ab does HTTP 1.0. What is the problem with siege?

jsvd commented 6 years ago

I tried siege locally and it worked:

% siege  http://localhost:8888
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...^C
Lifting the server siege...
Transactions:              73917 hits
Availability:             100.00 %
Elapsed time:              16.64 secs
Data transferred:           0.14 MB
Response time:              0.00 secs
Transaction rate:        4442.13 trans/sec
Throughput:             0.01 MB/sec
Concurrency:                9.86
Successful transactions:       73917
Failed transactions:               0
Longest transaction:            0.27
Shortest transaction:           0.00

pkoretic commented 6 years ago

I guess 16 seconds is not enough. Try it with --time=2M against both the new plugin and the old one; the new one will fail. The transaction rate is also lower in my testing, keepalive or not: Cannot assign requested address

and a bunch of connections stuck in the TIME_WAIT state
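
One way to confirm the local port exhaustion behind "Cannot assign requested address" is to count TIME_WAIT sockets; a sketch (assumes iproute2's ss and the test port from the commands above):

# Count sockets in TIME_WAIT on the benchmark port (tail skips the header line)
ss -tan state time-wait '( sport = :8888 or dport = :8888 )' | tail -n +2 | wc -l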

jsvd commented 6 years ago

my results with 3.1.0:

% siege  http://localhost:8888 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:            1297582 hits
Availability:             100.00 %
Elapsed time:             119.91 secs
Data transferred:           2.47 MB
Response time:              0.00 secs
Transaction rate:       10821.30 trans/sec
Throughput:             0.02 MB/sec
Concurrency:                9.69
Successful transactions:     1297582
Failed transactions:               0
Longest transaction:            0.19
Shortest transaction:           0.00

and with version 3.0.10:

siege  http://localhost:8889 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:            1054378 hits
Availability:             100.00 %
Elapsed time:             119.70 secs
Data transferred:           2.04 MB
Response time:              0.00 secs
Transaction rate:        8808.50 trans/sec
Throughput:             0.02 MB/sec
Concurrency:                4.76
Successful transactions:     1053448
Failed transactions:               0
Longest transaction:            0.28
Shortest transaction:           0.00

jsvd commented 6 years ago

This is strange indeed: curl works correctly, siege works too, but ab fails:

% curl --http1.0 -v localhost:8888/ 
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8888 (#0)
> GET / HTTP/1.0
> Host: localhost:8888
> User-Agent: curl/7.54.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Length: 2
< Content-Type: text/plain
< 
* Closing connection 0
ok
% ab -n 100 -c 1 -s 5 http://127.0.0.1:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)...apr_pollset_poll: The timeout specified has expired (70007)

% siege  http://localhost:8888 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:            1297582 hits
Availability:             100.00 %
Elapsed time:             119.91 secs
Data transferred:           2.47 MB
Response time:              0.00 secs
Transaction rate:       10821.30 trans/sec
Throughput:             0.02 MB/sec
Concurrency:                9.69
Successful transactions:     1297582
Failed transactions:               0
Longest transaction:            0.19
Shortest transaction:           0.00

pkoretic commented 6 years ago

Just some information, as I have been investigating this plugin for a number of weeks now: the performance is lower than expected, even with the new 3.1.0 version. I'm not sure what machine you have, but even your results are not exceptional. With just this HTTP input and an Elasticsearch output, no filters, I was not able to get more than 10k trans/sec while playing with memory, batch size, and worker count on my machine (i7-7700, 8GB RAM, Intel SSD), and Logstash was using all the CPU it could get.

Replacing the HTTP input with the generator input, I was able to get 2-3x more documents, and by benchmarking Elasticsearch with Rally I saw the same thing: Elasticsearch can process far more documents than this plugin can provide. Testing with the TCP input confirmed this again.

In the end we continued by writing a custom Node.js HTTP (and later HTTP/2) service that uses Elasticsearch's bulk API; throughput went up 3-4x on the same system with the identical data set, while also doing some light processing on the fly (parsing, modification, stringification).
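
For context, the bulk API mentioned above takes newline-delimited JSON, one action line per document. A minimal sketch of such a request (index name, type, and document are illustrative; the trailing newline is required):

# One bulk request indexing a single document (ES 6.x-era syntax assumed)
curl -s -X POST -H 'Content-Type: application/x-ndjson' 'http://localhost:9200/_bulk' \
  --data-binary $'{"index":{"_index":"logs","_type":"doc"}}\n{"message":"hello"}\n'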

I looked into implementing HTTP/2 support for this plugin, as Netty does support it, but that is beyond my scope and expertise, and I'm not sure how much of a boost it would provide here.

jsvd commented 6 years ago

thanks for the feedback @pkoretic

Replacing the HTTP input with the generator input, I was able to get 2-3x more documents

This is certainly expected: the generator performs no networking calls and no waiting; it's just a tight loop allocating Logstash events and pushing them to the queue.

by benchmarking Elasticsearch with Rally I saw the same thing: Elasticsearch can process far more documents than this plugin can provide.

I'm not sure I understand the comparison here. The http input doesn't provide events; it just accepts HTTP requests and pushes them to the Logstash queue to be processed by filters and outputs.

In the end we continued by writing a custom Node.js HTTP (and later HTTP/2) service that uses Elasticsearch's bulk API; throughput went up 3-4x on the same system with the identical data set, while also doing some light processing on the fly (parsing, modification, stringification).

Again, I don't think I understand the comparison: if you're writing data directly to Elasticsearch, then the http input is not necessary.

Is your architecture HTTP endpoint -> processing -> Elasticsearch? For that, Elasticsearch has ingest nodes.

Also, the benchmarks you're performing against this plugin have each request carrying a single event. You can send a JSON document that is an array of objects, and the http input will produce one event per object. You can see the performance boost of bulk vs. single item by running two Logstash instances:

The one receiving data and showing event count per second:

bin/logstash -e "input { http {} } output { stdout { codec => dots } }" | pv -War > /dev/null

Another instance generating events as fast as possible and pushing entire batches of events in a single request:

bin/logstash -e "input { generator {} } output { http { url => 'http://localhost:8080' format => json_batch http_method => post } }"

On my 13" MacBook Pro from 2018 I see about 75-85k e/s.
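
For reference, the array-of-objects behavior can also be exercised directly with curl; a sketch (the payload is illustrative, and it assumes the input's default application/json => json codec mapping):

# One request, three events: the http input emits one event per array element
curl -s -X POST -H 'Content-Type: application/json' \
  --data '[{"msg":"a"},{"msg":"b"},{"msg":"c"}]' \
  http://localhost:8080/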

pkoretic commented 6 years ago

Sorry if I haven't explained this properly, but we are not trying to do anything special. We did run tests with batches of events in one request, which, as expected, performs much better: since this http input is very slow, packing multiple events per request confirms and works around the problem, but it also pushes batching logic onto all clients.

We were trying to use the http input plugin for the obvious reason that we could reuse our Logstash setup, with the http input being just one input provider, while keeping all the filter and index logic that Logstash provides. The http input is, as expected, slower, but it should not be this much slower, which we easily confirmed by comparing the generator, tcp, and http inputs with stdout and Elasticsearch outputs, and then by writing a Node.js service that mimics this plugin's behavior.

On an m5.2xlarge AWS instance we had trouble getting over 30k e/s with this plugin. We use and develop a lot of HTTP services, so we quickly noticed that this one performs poorly. I would have to recheck, but I also remember Graylog's HTTP GELF input, which is somewhat equivalent to this, performing as expected.

Another issue was that this plugin returns busy if the number of clients is higher than the number of threads (try siege -c 10), which is not a proper reactor pattern; that should be solved by using Netty in 3.1.0.

jsvd commented 6 years ago

Another issue was that this plugin returns busy if the number of clients is higher than the number of threads (try siege -c 10), which is not a proper reactor pattern; that should be solved by using Netty in 3.1.0.

The previous (< 3.1.0) implementation does support more clients than threads, but it responds immediately with 429 if those threads are busy. The Netty implementation adds a queue to buffer bursts of traffic and micro-delays.
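
In other words, clients of the pre-3.1.0 input are expected to treat 429 as back-pressure and retry. A minimal sketch of such a sender (URL, payload, and delay are illustrative):

# Retry while the input answers 429 ("busy"), backing off briefly between attempts
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST --data '{"msg":"x"}' http://localhost:8080/)
  if [ "$code" = "429" ]; then sleep 0.1; else break; fi
done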

but it should not be this much slower, which we easily confirmed by comparing the generator, tcp, and http inputs with stdout and Elasticsearch outputs, and then by writing a Node.js service that mimics this plugin's behavior.

Building a dedicated service that performs the same task as the http input will always be much faster: Logstash is a general-purpose event pipeline with a lot of built-in flexibility, at the cost of performance.

Can you talk a bit more about the load you're putting on the http input? Is it high volume per connection, or many connections, or both?

As for this plugin being slow: after the Netty rewrite, and once PR #85 gets merged, we'll need Java-based codecs for it to get faster. That said, the plugin is at this point very light in what it does. It is a standard Netty HTTP processing pipeline: requests are read from the socket, transformed into HttpRequest objects, the content is decoded using the Ruby codecs (the costliest part), and then pushed to the Logstash queue.

Another option here is to support HTTP pipelining; we can use https://github.com/elastic/elasticsearch/pull/8299 as inspiration.

pkoretic commented 6 years ago

It's mostly a lot of connections, but there are spikes where there is also a big volume of events per connection; an event is generally under 100 characters. We tried the plain codec with a filter to extract the data, which gave some improvement, but not as much as we had hoped. I tried msgpack as well, but that was abandoned for other reasons. I haven't tried the plain codec with 3.1.0, though.

In the end I personally think HTTP/2 would also be a good direction, given that it has proper multiplexing support and binary data transfer, which would replace the pipelining and request-compression efforts for HTTP/1.x. It should be noticeably faster and lighter for both data ingestion and processing speed; in our service the difference between HTTP/1.x and HTTP/2 is more than visible. But I do understand this could be a big task to implement, given that Elasticsearch itself doesn't support it yet even though it also uses Netty, so I guess there is a reason for that (elastic/elasticsearch#10981). Thanks for all the work on the plugin!

jsvd commented 6 years ago

Thank you all for the great feedback. Since this issue has now diverged from the initial topic and the internal implementation has changed, I'm going to close it. If you see a performance regression or any odd behaviour, please open a new issue.