scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

GZIP'd requests to /render.json not allowed #1009

Closed cartermckinnon closed 3 years ago

cartermckinnon commented 4 years ago

Gzip-compressed requests to /render.json fail with:

{
  "error": 400,
  "type": "BadOption",
  "description": "Incorrect HTTP API arguments",
  "info": {
    "type": "invalid_json",
    "description": "Can't decode JSON",
    "message": "'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte"
  }
}

Reproduce with:

echo '{"url": "http://google.com"}' \
    | gzip \
    | curl --data-binary @- \
      -H 'content-encoding: gzip' \
      -H 'content-type: application/json' \
      http://localhost:8050/render.json
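The error message above is consistent with the compressed body being decoded as UTF-8 without first being decompressed. The following is a minimal sketch of that failure using only the Python standard library; it does not touch Splash itself, and the variable names are illustrative.

```python
import gzip

# The same JSON payload used in the curl reproduction above.
payload = b'{"url": "http://google.com"}\n'
compressed = gzip.compress(payload)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
magic = compressed[:2]

# 0x1f is valid ASCII, but 0x8b cannot start a UTF-8 sequence,
# so decoding the raw compressed bytes fails at position 1.
try:
    compressed.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)

print(magic.hex())
print(decode_error)
```

The reported byte (0x8b) and position (1) match the second byte of the gzip header, which suggests the Content-Encoding header is being ignored when the body is parsed.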

GZIP compression is enabled by default in the HTTP client I'm using (Java's Jersey), so the lack of support in Splash is a pain.

I'm happy to open a PR for this if a more experienced Python developer isn't available.
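For illustration, a fix could decompress the body according to Content-Encoding before parsing the JSON. This is only a sketch under that assumption: `decode_json_body` is a hypothetical helper, not a function from the Splash codebase, and how it would be wired into Splash's request handling is not shown here.

```python
import gzip
import json


def decode_json_body(body: bytes, content_encoding: str = "") -> dict:
    """Hypothetical helper: undo gzip Content-Encoding, then parse JSON."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        # Decompress before any text decoding takes place.
        body = gzip.decompress(body)
    elif encoding not in ("", "identity"):
        # Other encodings are out of scope for this sketch.
        raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")
    return json.loads(body.decode("utf-8"))


# Usage: a gzip-compressed body and a plain body parse to the same arguments.
raw = b'{"url": "http://google.com"}'
from_gzip = decode_json_body(gzip.compress(raw), "gzip")
from_plain = decode_json_body(raw)
print(from_gzip == from_plain)
```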

vladiscripts commented 4 years ago

GZIP requests to 'render.html' and 'execute' don't return an error message, but they return a broken response.text, like <html><head></head><body>��k��6�(��U�?��x��`

As a workaround, change 'Accept-Encoding': 'gzip' to 'Accept-Encoding': 'deflate'. From what I found, this problem was already reported in 2016 (https://stackoverflow.com/a/38693397/6357045, https://github.com/scrapinghub/splash/issues/423) and apparently in 2014 as well (https://github.com/scrapinghub/splash/pull/102).
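A minimal sketch of that header change, using Python's standard library. It assumes the header is set on the request sent to a local Splash instance at http://localhost:8050 and that 'render.html' accepts a JSON POST like the 'render.json' example earlier in the thread; depending on the client, the header may instead need to go into the headers forwarded to the target site. `build_render_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_render_request(splash_url: str, target_url: str) -> urllib.request.Request:
    """Hypothetical helper: build a Splash request that avoids gzip responses."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        splash_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Ask for 'deflate' instead of 'gzip', as suggested above.
            "Accept-Encoding": "deflate",
        },
        method="POST",
    )


req = build_render_request("http://localhost:8050/render.html", "http://google.com")
# urllib stores header names capitalized, e.g. 'Accept-encoding'.
print(req.header_items())
```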