scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License
4.07k stars, 513 forks

If a filter list blacklists an iframe's src URL, the whole page fails to render #699

Open · Vineeth-Mohan opened this issue 6 years ago

Vineeth-Mohan commented 6 years ago

We have been observing this when we blacklist certain ad domains such as googletagmanager.com, which may appear as the src of an iframe. Instead of skipping just the iframe, the entire page render fails with an error like the one below.

2017-11-14 09:39:39.389716 [events] {"path": "/render.html", "_id": 140043492314864, "method": "POST", "load": [0.64, 0.68, 0.63], "user-agent": "Claritybot", "args": {"wait": 10, "image": 0, "timeout": 90, "uid": 140043492314864, "resource_timeout": 60, "filters": "my_fileters", "url": "https://www.xyz.com", "headers": {"Cookie": "888Cookie=Srv=EB-04&OSR=486413&RefType=Unknown&Referrer=https://www.888poker.com/&orig-lp=https://www.888poker.com/poker/poker-odds-calculator; ASP.NET_SessionId=s4i5n0nf1xrkj2pkzawr3ehi; TS010cadcf=01681f908e80412eb20550515f2ac1986d9f0035194a6b90483cffbe03709ca247851a4f3a6a09415ffae3a684830773ac43c759717e8228f6a23e02533b3ccf8b1b5b02b12a0ca85b2d6633042afb1ed075d01069", "User-Agent": "ClarityBot", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Referer": "https://www.888poker.com/", "Accept-Language": "en"}}, "timestamp": 1510652379, "status_code": 502, "maxrss": 273588, "client_ip": "172.17.0.1", "qsize": 0, "fds": 73, "active": 1, "error": {"type": "RenderError", "error": 502, "info": {"url": "", "type": "Network", "text": "Protocol \"\" is unknown", "code": 301}, "description": "Error rendering page"}, "rendertime": 16.056299209594727}
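To see which request actually triggered the failure, the `[events]` line can be parsed as JSON and its `error.info` fields pulled out. The sketch below uses two hypothetical helper functions (they are not part of Splash) and a shortened copy of the log line above. In this log the failing request has an empty `url` and the text `Protocol "" is unknown`. Code 301 appears to match Qt's `QNetworkReply::ProtocolUnknownError`, which hints that the blocked iframe ends up as a request with no URL scheme. That is an inference from the log, not a confirmed diagnosis.

```python
import json


def parse_splash_event(line: str) -> dict:
    """Return the JSON object logged after the '[events]' marker."""
    _, marker, raw = line.partition("[events]")
    if not marker:
        raise ValueError("not a Splash [events] log line")
    # json.loads tolerates the leading space before the object.
    return json.loads(raw)


def render_error_summary(event: dict) -> str:
    """Summarize the 'error' field of a failed render event."""
    error = event.get("error") or {}
    info = error.get("info") or {}
    return (
        f"{event.get('status_code')} {error.get('type')}: "
        f"{info.get('type')} error {info.get('code')} "
        f"for url={info.get('url')!r} ({info.get('text')})"
    )


# A shortened copy of the log line above (args and most metrics omitted).
SAMPLE = (
    '2017-11-14 09:39:39.389716 [events] {"path": "/render.html", '
    '"status_code": 502, "error": {"type": "RenderError", "error": 502, '
    '"info": {"url": "", "type": "Network", '
    '"text": "Protocol \\"\\" is unknown", "code": 301}, '
    '"description": "Error rendering page"}}'
)

print(render_error_summary(parse_splash_event(SAMPLE)))
# 502 RenderError: Network error 301 for url='' (Protocol "" is unknown)
```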

I feel this is a serious issue, and I cannot find a workaround. Either there should be an option to disable iframes, or this error should not prevent the rest of the page from rendering.
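One possible, untested workaround is to call the `/execute` endpoint instead of `/render.html`. In Splash's Lua API, `splash:go` reports load errors through its `ok, reason` return values, so a script can still return whatever HTML was loaded rather than letting the whole request fail. The sketch below only builds the request. It assumes a local Splash at `http://localhost:8050` (hypothetical) and that `/execute` accepts the same `filters` argument as `/render.html`. Whether the HTML returned after this particular error is complete has not been verified.

```python
import json
import urllib.request

# Assumed address of a local Splash instance (hypothetical).
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

# Lua script run by Splash. splash:go returns ok, reason on load errors,
# so the script can report the failure alongside the HTML it did get.
LUA_SOURCE = """
function main(splash, args)
  local ok, reason = splash:go(args.url)
  splash:wait(args.wait)
  return {html = splash:html(), ok = ok, reason = reason}
end
"""


def build_execute_request(target_url: str, filters: str = "my_fileters",
                          wait: float = 10) -> urllib.request.Request:
    """Build a POST /execute request that runs LUA_SOURCE against target_url."""
    payload = {
        "lua_source": LUA_SOURCE,
        "url": target_url,   # available to the script as args.url
        "wait": wait,        # available to the script as args.wait
        "timeout": 90,
        "filters": filters,  # assumed to behave as in /render.html
    }
    return urllib.request.Request(
        SPLASH_EXECUTE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request("https://www.xyz.com")
# To send it: json.load(urllib.request.urlopen(req, timeout=100))
```

If the load fails, the JSON response should carry `html` together with a `reason`. On success, `ok` is `true` and `reason` is absent.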

Gallaecio commented 4 years ago

Did you ever find a workaround?