function main(splash)
    assert(splash:go(splash.args.url))
    return splash:url()
end
Today I stumbled on a bug caused by somewhat unusual behavior when redirecting to URLs containing a "#" hash sign.
I have a URL, http://groceries.asda.com/asda-webstore/landing/home.shtml#search/ibuprofen/1/relevance_desc . When you request this URL without any cookies, the site responds with a redirect to https://groceries.asda.com/asda-webstore/landing/home.shtml (the same URL but over https; note that you need to view the traffic with a mitm proxy, as dev tools seem to hide the redirect of the first URL). After the redirect, my desktop browser (Chrome/51.0.2704.84) keeps the hash part of the URL. Splash seems to discard the hash part after the redirect. This means that the site will not be rendered properly, because the hash part is missing.
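For reference, the browser behavior described above matches RFC 7231, section 7.1.2: when the `Location` value of a redirect has no fragment, the user agent processes the redirect as if it inherited the fragment of the original request URL. Below is a minimal Python sketch of that rule. The function name `resolve_redirect` is hypothetical and this is not Splash or Qt code, only an illustration of the result I would expect.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve_redirect(request_url: str, location: str) -> str:
    """Return the URL a user agent should end up on after a redirect.

    Hypothetical helper illustrating RFC 7231 section 7.1.2: a Location
    value without a fragment inherits the fragment of the request URL.
    """
    # Location may be relative, so resolve it against the request URL first.
    target = urlsplit(urljoin(request_url, location))
    if not target.fragment:
        # No fragment in Location: carry over the original one.
        target = target._replace(fragment=urlsplit(request_url).fragment)
    return urlunsplit(target)


if __name__ == "__main__":
    print(resolve_redirect(
        "http://groceries.asda.com/asda-webstore/landing/home.shtml"
        "#search/ibuprofen/1/relevance_desc",
        "https://groceries.asda.com/asda-webstore/landing/home.shtml",
    ))
```

With the URLs from this report, the sketch keeps `#search/ibuprofen/1/relevance_desc` on the https URL, which is what Chrome does and what Splash currently does not.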
I added tests on a branch, see here: https://github.com/scrapinghub/splash/commit/cdae7c490210b83a41031decae6d8312566bee57 . To reproduce, you simply need to:
Create the following Lua script (the `main` function shown above).
Visit this URL in your browser: http://localhost:8998/redirect-hash#something . You will see that the hash is still in the URL in the address bar. This confirms that the test server is running and that your browser keeps the URL with the hash even after the redirect.
Now check Splash:
http://localhost:8050/execute?lua_source=function+main%28splash%29%0A++++assert%28splash%3Ago%28splash.args.url%29%29%0A++++return+splash%3Aurl%28%29%0Aend&url=http%3A%2F%2Flocalhost%3A8998%2Fredirect-hash%23something-bad
will return http://localhost:8998/redirect-hash instead of http://localhost:8998/redirect-hash#something-bad.
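The long request URL above is just the Lua script and the target URL passed as query parameters to the `execute` endpoint. As a convenience, here is a small Python sketch that rebuilds it, so the target URL is easier to swap. The helper name `build_execute_url` is hypothetical, and the host and port are the ones used in this report.

```python
from urllib.parse import urlencode

# Endpoint and script as used in this report.
SPLASH_EXECUTE = "http://localhost:8050/execute"
LUA_SOURCE = (
    "function main(splash)\n"
    "    assert(splash:go(splash.args.url))\n"
    "    return splash:url()\n"
    "end"
)


def build_execute_url(target_url: str,
                      lua_source: str = LUA_SOURCE,
                      endpoint: str = SPLASH_EXECUTE) -> str:
    """Build the GET URL for the execute endpoint (hypothetical helper)."""
    # urlencode percent-encodes the "#" as %23, so the fragment is sent to
    # Splash as part of the `url` argument instead of being dropped.
    query = urlencode({"lua_source": lua_source, "url": target_url})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    print(build_execute_url("http://localhost:8998/redirect-hash#something-bad"))
```

Fetching the printed URL with any HTTP client should reproduce the problem, assuming Splash and the test server from the branch are both running locally.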
I found a workaround for this (luckily I can avoid the redirect by using the https URL and sending an extra cookie), but I'm opening this issue for future reference in case someone else stumbles on it. I see that in QWebPage we have an extension method, and there is an extensive comment there about redirects. I'm not sure whether or how it matters here, but maybe that's a path for investigation.