oesmith / puffing-billy

A rewriting web proxy for testing interactions between your browser and external sites. Works with ruby + rspec.
MIT License
662 stars 168 forks source link

Weird performance issue when stubbing https requests #347

Open g-pavlik opened 6 months ago

g-pavlik commented 6 months ago

Follow up after: https://github.com/oesmith/puffing-billy/issues/346 (thanks for your time there, BTW)

I've been digging and trying to understand why puffing-billy is at the same time super fast (when testing via apache benchmark) and kind of sluggish when observind requests in chrome's network console, and I've found something interesting which I'm not sure is related.

Here's my standalone billy server:

# frozen_string_literal: true

require "billy"
require "os"

Billy.configure do |c|
  # Copied from our specs
  c.cache = true
  c.non_successful_cache_disabled = true
  c.persist_cache = true
  c.record_requests = true # defaults to false
  c.cache_path = "spec/puffing-billy/req_cache/"
  c.logger = Logger.new($stdout)
  c.logger.level = Logger::ERROR
  # NOTE: in case this is not used, we experience billy working intermittently (both locally and in CI).
  c.non_whitelisted_requests_disabled = true

  # set port for easier testing
  c.proxy_port = 44443
  # un-whitelist localhost
  c.whitelist = []
end

proxy = Billy::Proxy.new
proxy.start
puts "proxy started"

# proxy.stub(/.*/, {status: 200, headers: {}, body: "foo"})
proxy
  # .stub(/^https:\/\/cdnjs\.cloudflare\.com(:443)?\/ajax\/libs\/flag-icon-css\/[0-9.]+\/flags\/4x3\/[a-z][a-z]\.svg/, method: "get")
  .stub(/.*/)
  .and_return(
    headers: {"Content-Type" => "image/svg+xml"},
    body: File.read("spec/fixtures/files/flag/pl.svg"),
  )
proxy
  .stub(/^https?:\/\/cdnjs\.cloudflare\.com(:443)?\/ajax\/libs\/flag-icon-css\/[0-9.]+\/flags\/4x3\/[a-z][a-z]\.svg/, {method: "get"}).and_return(
  # headers: ALLOW_CORS_HEADERS.dup.merge("Content-Type" => "image/svg+xml"),
  body: "" # file_fixture("flag/pl.svg").read,
)

# Add extra stubs (stubs are LIFO)
20.times do |i|
  proxy.stub(/http:\/\/#{i}\.com/).and_return(headers: {}, body: "")
end

puts "proxy stubbed, sleeping Ctrl-c to stop"

loop do
  sleep(5)
end

And first, the blazingly fast:

ab -S -n 200 -c 1 -X localhost:44443 http://foo.com/ | grep -E "^(Time|Requests per second)"
Completed 100 requests
Completed 200 requests
Finished 200 requests
Time taken for tests:   0.049 seconds
Requests per second:    4084.63 [#/sec] (mean)
Time per request:       0.245 [ms] (mean)
Time per request:       0.245 [ms] (mean, across all concurrent requests)

(finished 200 requests in 50ms - :speedboat:)

Here's the problematic one (change only http to https)

ab -S -n 200 -c 1 -X localhost:44443 https://foo.com/ 
# cut
Benchmarking foo.com [through localhost:44443] (be patient)
apr_pollset_poll: The timeout specified has expired (70007) 
# Never finished

My standalone server logs:

E, [2024-06-03T12:18:21.095051 #1028097] ERROR -- : HTTP::Parser::Error (Could not parse data entirely (0 != 204)):
E, [2024-06-03T12:18:21.095158 #1028097] ERROR -- : /home/toptal/Work/oyster/anywhere/vendor/nix/gems/ruby/3.2.0/gems/puffing-billy-4.0.0/lib/billy/proxy_connection.rb:20:in `<<'
.../gems/ruby/3.2.0/gems/puffing-billy-4.0.0/lib/billy/proxy_connection.rb:20:in `receive_data'
.../nix/gems/ruby/3.2.0/gems/eventmachine-1.2.7/lib/eventmachine.rb:195:in `run_machine'
.../nix/gems/ruby/3.2.0/gems/eventmachine-1.2.7/lib/eventmachine.rb:195:in `run'
.../nix/gems/ruby/3.2.0/gems/puffing-billy-4.0.0/lib/billy/proxy.rb:70:in `main_loop'
.../nix/gems/ruby/3.2.0/gems/puffing-billy-4.0.0/lib/billy/proxy.rb:19:in `block in start'

Side note: this made me think the error causes some kind of infinite loop, and that the problem is SSL. I did add the recommended certs (https://github.com/oesmith/puffing-billy?tab=readme-ov-file#google-chrome-headless-example) but it did not make it any faster. I think now the parsing error and the performance issues might be unrelated. I'm not sure how to determine that besides fixing the parsing issue first.

When I add puts before puffing-billy-4.0.0/lib/billy/proxy_connection.rb:20

    def receive_data(data)
      puts "DATA:", data, "\n\n"
      puts data.inspect
      puts "===="
    end

Here's what I see for http request:

DATA:
GET http://foo.com/ HTTP/1.0
Host: foo.com
User-Agent: ApacheBench/2.3
Accept: */*

====
"GET http://foo.com/ HTTP/1.0\r\nHost: foo.com\r\nUser-Agent: ApacheBench/2.3\r\nAccept: */*\r\n\r\n"
====

And here's the output for https request:

"GET http://foo.com/ HTTP/1.0\r\nHost: foo.com\r\nUser-Agent: ApacheBench/2.3\r\nAccept: */*\r\n\r\n"
====
DATA:
3d>ii4ȯ]'=8,0̨̩̪+/$(k#'g
9       3=<5/b

foo.com

*(

====
"\x16\x03\x01\x00\xC7\x01\x00\x00\xC3\x03\x03\x9C\x9F3\x15d>i\x92\x98i\xF04\x0E\xC8\xAF@\xF3\b\xF5\xF0]\xAF\x92'\xEF\x84\x89\x1E=\x9C\xA6\xA8\x00\x008\xC0,\xC00\x00\x9F\xCC\xA9\xCC\xA8\xCC\xAA\xC0+\xC0/\x00\x9E\xC0$\xC0(\x00k\xC0#\xC0'\x00g\xC0\n\xC0\x14\x009\xC0\t\xC0\x13\x003\x00\x9D\x00\x9C\x00=\x00<\x005\x00/\x00\xFF\x01\x00\x00b\x00\x00\x00\f\x00\n\x00\x00\afoo.com\x00\v\x00\x04\x03\x00\x01\x02\x00\n\x00\f\x00\n\x00\x1D\x00\x17\x00\x1E\x00\x19\x00\x18\x00#\x00\x00\x00\x16\x00\x00\x00\x17\x00\x00\x00\r\x00*\x00(\x04\x03\x05\x03\x06\x03\b\a\b\b\b\t\b\n\b\v\b\x04\b\x05\b\x06\x04\x01\x05\x01\x06\x01\x03\x03\x03\x01\x03\x02\x04\x02\x05\x02\x06\x02"
====

One can reproduce this error:

> HTTP::Parser.new << "\x16\x03\x01\x00\xC7\x01\x00\x00\xC3\x03\x03\x12V\x9B\xEB\x9F\x13\xE9\xBC\x04/_5\xCE\xA3\xA6R\\C5N\x0F\x81x\xA7\x15\x01M\b\xCA\f!\xAE\x00\x008\xC0,\xC00\x00\x9F\xCC\xA9\xCC\xA8\xCC\xAA\xC0+\xC0/\x00\x9E\xC0$\xC0(\x00k\xC0#\xC0'\x00g\xC0\n\xC0\x14\x009\xC0\t\xC0\x13\x003\x00\x9D\x00\x9C\x00=\x00<\x005\x00/\x00\xFF\x01\x00\x00b\x00\x00\x00\f\x00\n\x00\x00\afoo.com\x00\v\x00\x04\x03\x00\x01\x02\x00\n\x00\f\x00\n\x00\x1D\x00\x17\x00\x1E\x00\x19\x00\x18\x00#\x00\x00\x00\x16\x00\x00\x00\x17\x00\x00\x00\r\x00*\x00(\x04\x03\x05\x03\x06\x03\b\a\b\b\b\t\b\n\b\v\b\x04\b\x05\b\x06\x04\x01\x05\x01\x06\x01\x03\x03\x03\x01\x03\x02\x04\x02\x05\x02\x06\x02"

HTTP::Parser::Error: Could not parse data entirely (0 != 204)

But to be honest I'm stuck here, and I'd appreciate any suggestions.

Especially:

  1. How can I determine if the data that we attempt to parse is even correct, or if could be a bug in http_parser?
  2. Am I using ab correctly to test performance of puffing-billy? Maybe I'm missing other options?

cc/ @khamusa

ronwsmith commented 6 months ago

I haven't done any performance testing on the gem before, so these are interesting findings!

Could the issue be with your standalone server? You're not keeping https in the looping, and https is not in your DATA debug.

joeyjoejoejr commented 6 months ago

@pawlik I think this is an issue with ab.

Running this

time curl --parallel --insecure -x localhost:44443 https://foo.com/[0-200]

Against your scripts yields 20 requests per second. Which is a bit slow but not crazy.

image