Closed by joshuasmall 4 years ago
Sorry for the delay. I'll check.
Hi @9seconds
Could it be because of the proxy api?
https://github.com/scrapinghub/crawlera-headless-proxy#proxy-api
Is there a way to set the port that it listens on?
Thanks!
@joshuasmall I've checked, and everything works as intended. The application listens on two different ports: one for the proxy API (which you access with your HTTP client or browser) and another for the stats API (which serves JSON with stats).
$ ./crawlera-headless-proxy -a "$CRAWLERA_MYAUTH" -d -p 8888 -w 9999
DEBU[0000] TLS checksums. ca-cert=dcb3f998f0991455ef93583d4d6d49d71fa91e80 priv-key=a556e5cdb31e233ed4331610196c62010b9fa7cb
DEBU[0000] Listen on 127.0.0.1:8888 adblock-lists="[]" apikey=*** bindip=127.0.0.1 bindport=8888 concurrent-connections=0 crawlera-host=proxy.crawlera.com crawlera-port=8010 debug=true direct-access-hostpath-regexps="[]" dont-verify-crawlera-cert=false no-auto-sessions=false proxy-api-ip=127.0.0.1 proxy-api-port=9999 xheaders="map[]"
...
$ curl -x localhost:8888 -ik https://httpbin.org/headers
HTTP/1.1 200 OK
Date: Wed, 01 Apr 2020 06:12:29 GMT
Content-Length: 0
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Wed, 01 Apr 2020 06:12:33 GMT
Content-Type: application/json
Content-Length: 532
X-Crawlera-Session: 1414469652
X-Crawlera-Version: 1.43.0-a83f3a
access-control-allow-origin: *
access-control-allow-credentials: true
x-powered-by: Flask
x-processed-time: 0
x-upstream: httpbin-master_web
Proxy-Connection: close
Connection: close
{
"headers": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Forwarded": "for=*******;proto=https",
"Host": "httpbin.org",
"Proxy-Authorization": "Basic *****",
"Referer": "https://httpbin.org/headers",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:74.0) Gecko/20100101 Firefox/74.0"
}
}
$ curl localhost:9999/stats
{
"requests_number": 4,
"crawlera_requests": 5,
"sessions_created": 1,
"clients_connected": 0,
"adblocked_requests": 0,
"crawlera_errors": 2,
"all_errors": 4,
"overall_times": {
"average": 3.8498335285,
"minimal": 2.428097765,
"maximal": 4.965745319,
"median": 4.002745515,
"standard_deviation": 0.9689998911970745,
"percentiles": {}
},
"crawlera_times": {
"average": 3.0794386718,
"minimal": 0.99316086,
"maximal": 4.96556832,
"median": 3.532206065,
"standard_deviation": 1.5978717257630468,
"percentiles": {}
},
"uptime": 299
}
So, -p sets the proxy API port (bindport=8888 in the log above) and -w sets the stats API port (proxy-api-port=9999 in the log above).
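One way to confirm which ports an instance actually bound is to pull the bindport and proxy-api-port fields out of the debug line shown above. This is a minimal sketch, assuming the log keeps the key=value layout seen in that output. log_field is a hypothetical helper, not part of crawlera-headless-proxy, and the sample line is abbreviated from the output above.

```shell
# Hypothetical helper: print the value of one key=value field
# from log text read on stdin (first matching line only).
log_field() {
  sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p" | head -n 1
}

# Sample line, abbreviated from the debug output above.
line='DEBU[0000] Listen on 127.0.0.1:8888 bindip=127.0.0.1 bindport=8888 debug=true proxy-api-ip=127.0.0.1 proxy-api-port=9999 xheaders="map[]"'

printf '%s\n' "$line" | log_field bindport        # prints 8888
printf '%s\n' "$line" | log_field proxy-api-port  # prints 9999
```

In practice the line would come from the instance's own debug output (run with -d), so the values reflect what that particular process bound.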
I'm closing this ticket. Please feel free to reopen it if you have any questions.
And again, I'm really sorry for the delay in answering your questions 🙇
Hi @9seconds
Thank you for checking this for me.
Version: 1.2.1 (crawlera-headless-proxy-linux-amd64), OS: Debian 4.9.168-1+deb9u4
Setting bind_port in any way (command-line flags, environment variables, or a config file) is not respected.
The port is always set to 3129.
EDIT: This seems to occur when running more than one instance of crawlera-headless-proxy.
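When several instances run side by side, each one needs its own pair of ports. The sketch below only prints launch commands that give every instance a distinct -p and -w value, using the flags from the example earlier in this thread. It is not a confirmed fix for the behaviour described here. instance_cmd and the base ports 8888 and 9999 are illustrative assumptions.

```shell
# Hypothetical helper: print the launch command for instance number $1,
# offsetting both ports so that no two instances share one.
instance_cmd() {
  idx=$1
  proxy_port=$((8888 + idx))   # value passed to -p
  stats_port=$((9999 + idx))   # value passed to -w
  printf './crawlera-headless-proxy -a "$CRAWLERA_MYAUTH" -d -p %d -w %d\n' \
    "$proxy_port" "$stats_port"
}

# Print the commands for two instances.
for i in 0 1; do
  instance_cmd "$i"
done
```

Running each printed command in its own shell and then checking the bindport value in the debug output shows whether the second instance honours its -p setting.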