zytedata / zyte-smartproxy-headless-proxy

A complementary proxy to help use SPM with headless browsers
MIT License

bind_port is not being respected #19

Closed joshuasmall closed 4 years ago

joshuasmall commented 4 years ago

Version: 1.2.1 (crawlera-headless-proxy-linux-amd64)
OS: Debian 4.9.168-1+deb9u4

However I set bind_port (command-line flag, environment variable, or config file), it is not respected.

It is always set to 3129.

EDIT: This seems to occur when running more than one instance of crawlera-headless-proxy.

9seconds commented 4 years ago

Sorry for the delay. I'll check

joshuasmall commented 4 years ago

Hi @9seconds

Could it be because of the proxy api?

https://github.com/scrapinghub/crawlera-headless-proxy#proxy-api

Is there a way to set the port that it listens on?

Thanks!

9seconds commented 4 years ago

@joshuasmall I've checked, and everything works as intended. Basically, this application listens on 2 different ports: one for the proxy itself (which you access with your HTTP client or browser) and another one for the stats API (where you can see some JSON with stats). The startup log below reports them as bindport and proxy-api-port respectively.

$ ./crawlera-headless-proxy -a "$CRAWLERA_MYAUTH" -d -p 8888 -w 9999
DEBU[0000] TLS checksums.                                ca-cert=dcb3f998f0991455ef93583d4d6d49d71fa91e80 priv-key=a556e5cdb31e233ed4331610196c62010b9fa7cb
DEBU[0000] Listen on 127.0.0.1:8888                      adblock-lists="[]" apikey=*** bindip=127.0.0.1 bindport=8888 concurrent-connections=0 crawlera-host=proxy.crawlera.com crawlera-port=8010 debug=true direct-access-hostpath-regexps="[]" dont-verify-crawlera-cert=false no-auto-sessions=false proxy-api-ip=127.0.0.1 proxy-api-port=9999 xheaders="map[]"
...
$ curl -x localhost:8888 -ik https://httpbin.org/headers
HTTP/1.1 200 OK
Date: Wed, 01 Apr 2020 06:12:29 GMT
Content-Length: 0

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Wed, 01 Apr 2020 06:12:33 GMT
Content-Type: application/json
Content-Length: 532
X-Crawlera-Session: 1414469652
X-Crawlera-Version: 1.43.0-a83f3a
access-control-allow-origin: *
access-control-allow-credentials: true
x-powered-by: Flask
x-processed-time: 0
x-upstream: httpbin-master_web
Proxy-Connection: close
Connection: close

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Forwarded": "for=*******;proto=https",
    "Host": "httpbin.org",
    "Proxy-Authorization": "Basic *****",
    "Referer": "https://httpbin.org/headers",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:74.0) Gecko/20100101 Firefox/74.0"
  }
}
$ curl localhost:9999/stats
{
  "requests_number": 4,
  "crawlera_requests": 5,
  "sessions_created": 1,
  "clients_connected": 0,
  "adblocked_requests": 0,
  "crawlera_errors": 2,
  "all_errors": 4,
  "overall_times": {
    "average": 3.8498335285,
    "minimal": 2.428097765,
    "maximal": 4.965745319,
    "median": 4.002745515,
    "standard_deviation": 0.9689998911970745,
    "percentiles": {}
  },
  "crawlera_times": {
    "average": 3.0794386718,
    "minimal": 0.99316086,
    "maximal": 4.96556832,
    "median": 3.532206065,
    "standard_deviation": 1.5978717257630468,
    "percentiles": {}
  },
  "uptime": 299
}

So -p sets the proxy port (bindport in the log above), and -w sets the stats API port (proxy-api-port in the log above).
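
For the multi-instance case mentioned in the EDIT of the original report, one possible explanation (a guess, not something confirmed in this thread) is that a second instance collides with the first on the stats API port when only -p is changed. A minimal sketch that prints a launch command per instance, each with its own -p and -w value, could look like this. BASE_PROXY_PORT, BASE_STATS_PORT and instance_cmd are hypothetical names chosen for illustration; only the -a, -p and -w flags come from the transcript above.

```shell
#!/bin/sh
# Sketch only: print (do not run) one launch command per instance so that
# every instance gets a distinct proxy port (-p) and stats API port (-w).
# The base ports and the helper name are assumptions made for this example.
BASE_PROXY_PORT=8888
BASE_STATS_PORT=9999

instance_cmd() {
  # $1 is the zero-based instance index; offset both ports by it.
  proxy_port=$((BASE_PROXY_PORT + $1))
  stats_port=$((BASE_STATS_PORT + $1))
  # \$CRAWLERA_MYAUTH is printed literally so the API key is never echoed.
  printf '%s\n' "./crawlera-headless-proxy -a \"\$CRAWLERA_MYAUTH\" -p $proxy_port -w $stats_port"
}

for i in 0 1; do
  instance_cmd "$i"
done
```

Each printed line can then be started in its own terminal or service unit. Whether distinct -w values actually resolve the reported behaviour would need to be checked against the startup log of each instance.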

I'm closing this ticket. Please feel free to reopen it if you have any questions.

And again, I'm really sorry for the delay in answering your questions 🙇

joshuasmall commented 4 years ago

Hi @9seconds

Thank you for checking this for me.