scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

A website traffic analytics script leaks the original User-Agent #428

Open chinaquant opened 8 years ago

chinaquant commented 8 years ago

This problem can be reproduced by uncommenting '--splash:wait(1)'.

The leaked User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) server.py Safari/538.1

function main(splash)
    splash:set_user_agent("NewUserAgent")
    splash:go("http://ibenz.org/bd/stat.htm")
    -- uncomment the next line to reproduce the leak
    --splash:wait(1)
end

With '--splash:wait(1)' uncommented, the script triggers 3 requests. Each request URL starts like this: 'http://hm.baidu.com/hm.gif?xxx'. The 'User-Agent' header of the third request leaks the original User-Agent.

The third request looks like one that should be triggered by leaving or closing the current page, but I don't know why splash:wait(1) has the same effect.
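
One way to confirm which request carries the default User-Agent is to have the script return splash:har() and scan the result. The sketch below is in Python and assumes the standard HAR 1.2 layout (log.entries[].request.headers as name/value pairs) and that the HAR includes request headers. The function name find_leaked_user_agents and the sample data are hypothetical, made up for illustration and not part of Splash.

```python
def find_leaked_user_agents(har, expected_ua):
    """Return (url, user_agent) pairs for requests whose
    User-Agent header differs from expected_ua."""
    leaks = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        for header in request.get("headers", []):
            # HTTP header names are case-insensitive.
            if header.get("name", "").lower() != "user-agent":
                continue
            if header.get("value") != expected_ua:
                leaks.append((request.get("url"), header.get("value")))
    return leaks


def _entry(url, user_agent):
    # Build a minimal HAR entry; real entries carry many more fields.
    return {"request": {"url": url,
                        "headers": [{"name": "User-Agent", "value": user_agent}]}}


# Hypothetical HAR fragment shaped like the three hm.gif requests in this report.
DEFAULT_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 "
              "(KHTML, like Gecko) server.py Safari/538.1")
sample_har = {"log": {"entries": [
    _entry("http://hm.baidu.com/hm.gif?xxx", "NewUserAgent"),
    _entry("http://hm.baidu.com/hm.gif?xxx", "NewUserAgent"),
    _entry("http://hm.baidu.com/hm.gif?xxx", DEFAULT_UA),
]}}

leaks = find_leaked_user_agents(sample_har, "NewUserAgent")
for url, user_agent in leaks:
    print(url, "->", user_agent)
```

With real data, the HAR dictionary would come from the JSON that the script returns instead of sample_har.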

kmike commented 8 years ago

What Splash version are you using?

chinaquant commented 8 years ago

Hi @kmike

I'm using Splash version 2.0.3. Does this problem not occur in your Splash?

The problem can be reproduced with Splash.server and the Splash HTTP UI. It does not occur with Splash-Jupyter.

The last request leaks the User-Agent. This seems to happen when splash:wait(1) resumes.

[image: leak_user_agent]