scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Rendering error on pages with redirects #47

Closed: andresp99999 closed this issue 10 years ago

andresp99999 commented 10 years ago

The following Splash calls return "Error rendering page":

The target pages have in common that they redirect to another URL; I'm not sure whether that is the cause of the issue.

kmike commented 10 years ago

I haven't fixed this ticket yet, but some debugging shows that it is related to https://github.com/scrapinghub/splash/issues/33.

On these pages the loadFinished signal is issued twice: first with ok=False (the page before the redirect), then with ok=True (the page after the redirect). Interestingly enough, both responses can be rendered. But because the errback is called while handling the first signal, the second loadFinished (which would be issued a bit later) is not actually issued or handled, and the first response is not rendered because of the errback.

I haven't yet figured out how to check whether a successful loadFinished will be issued after an ok=False loadFinished. It could also be a good idea to render the response even when loadFinished reports ok=False, because HTML may still be available in that case.
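One possible way to handle the double signal described above is to hold back the errback for a short grace period after an ok=False loadFinished, and cancel it if an ok=True loadFinished arrives in the meantime. The following is only a minimal, framework-free sketch of that idea, not the fix that went into Splash. The class name `LoadFinishedTracker` and the methods `on_load_finished` and `on_grace_period_expired` are hypothetical; in a real Qt application the grace period would have to be driven by a timer, which is left out here.

```python
class LoadFinishedTracker:
    """Hypothetical helper: defers failure handling after loadFinished(ok=False)."""

    def __init__(self, callback, errback):
        self.callback = callback        # called once on a successful load
        self.errback = errback          # called once if no success follows a failure
        self.pending_failure = False    # True after ok=False, until resolved
        self.done = False               # True once callback or errback has fired

    def on_load_finished(self, ok):
        # Would be connected to the page's loadFinished signal.
        if self.done:
            return
        if ok:
            # A success overrides any earlier failure (e.g. the pre-redirect page).
            self.pending_failure = False
            self.done = True
            self.callback()
        else:
            # Do not fail yet: a redirect may still produce ok=True.
            self.pending_failure = True

    def on_grace_period_expired(self):
        # Would be called by a timer some time after the first ok=False.
        if self.done or not self.pending_failure:
            return
        self.done = True
        self.errback()


events = []
tracker = LoadFinishedTracker(
    callback=lambda: events.append("render"),
    errback=lambda: events.append("error"),
)
tracker.on_load_finished(False)    # page before the redirect
tracker.on_load_finished(True)     # page after the redirect
tracker.on_grace_period_expired()  # timer fires later, nothing left to do
print(events)
```

The trade-off in this sketch is latency: pages that really fail are reported only after the grace period, and choosing its length is a guess about how long a redirect takes.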

tamoyal commented 10 years ago

I'm getting a bunch of these errors, if you guys want some help debugging/testing. I'm very new to Splash, so I probably won't have time to dig into the source and fix it for weeks. Two examples below: http://tuttopronto.ca http://www.dondonizakaya.com

tamoyal commented 10 years ago

I also get a seg fault when I make the call twice: Segmentation fault: 11

kmike commented 10 years ago

@tamoyal thanks for the website examples!

I've seen segfaults when Splash is run on Mac OS X and pages use custom fonts: https://github.com/scrapinghub/splash/issues/16 - is this the same issue? Are you on a Mac?

tamoyal commented 10 years ago

@kmike I am on OS X, and I had a hunch there was also something OS X / Qt related because I'm getting crashes about every 50 scrapes - seg faults and bus errors :( Do you know if there's any way to work around this?

I also think there are JavaScript redirect issues, though. I could probably provide a bunch more examples ... http://www.rhythmspa.ca/ http://www.avlirestaurant.com

kmike commented 10 years ago

The only workaround for the segfaults that I know of is to run the Splash server in a Linux VM.