Open kmike opened 9 years ago
:+1:
I'm running an AngularJS-based website and want to programmatically take screenshots of it. The website uses local storage to store the authentication token, so it would help a lot if Splash could set different key-value pairs in local storage for each browsing session. Thanks a lot.
I tried to see if there was a way to enable it without disabling private browsing. Here is what I found (everything was tested only on the qt4 branch):
Changing the quotas via `QWebSettings::setOfflineStorageDefaultQuota()`, enabling it via `QWebSettings::enablePersistentStorage()`, or setting a per-origin rule via `QWebSecurityOrigin::setDatabaseQuota()` doesn't work. The code that checks whether it can be used in private mode is here:
https://github.com/qtproject/qtwebkit/blob/8b00fdada15a53c7764472435cffe04f22c3522f/Source/WebCore/storage/Storage.cpp#L159
Following the calls in that function, you can see that it could be enabled for any protocol by calling the WebKit function `SchemeRegistry::registerURLSchemeAsAllowingDatabaseAccessInPrivateBrowsing`; unfortunately, Qt doesn't offer an API to call it.
Trying to overwrite the `localStorage` property of `window`, either using an injected script or `QWebFrame::addToJavaScriptWindowObject`, fails silently. I can't think of a simple way of enabling localStorage without disabling private browsing.
Today I tried again to overwrite localStorage in the window and succeeded by using `__defineGetter__`.
I wrote a localStorage shim that works in splash: https://gist.github.com/Youwotma/17d9b05fddd5ee4d9aa5
The data is not persisted anywhere, but that can be done: simply add a line in the `update()` function to save the data somewhere:
document.body.setAttribute('data-local-storage', JSON.stringify(storage));
Then extract the data from the returned HTML, save it somehow and load it again in a new webpage:
splash:runjs([[
    var storageData = (<put the extracted json here somehow>);
    for (var k in storageData) {
        localStorage[k] = storageData[k];
    }
]])
EDIT: This way of overriding the localStorage object only works in Splash Qt4.
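For readers who cannot open the gist, here is a minimal sketch of what an in-memory replacement of this kind could look like. It is not the code from the gist: `createMemoryStorage`, `installStorageShim`, `restoreStorage` and the `onUpdate` hook are hypothetical names, and the hook only stands in for the `update()` idea described above.

```javascript
// Hypothetical sketch of an in-memory localStorage replacement.
// All function names here are made up for illustration.

function createMemoryStorage(onUpdate) {
  var data = {}; // backing store, kept only in memory

  function notify() {
    // Hook for persisting the data somewhere, e.g. into a DOM attribute.
    if (typeof onUpdate === 'function') {
      onUpdate(data);
    }
  }

  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function (key, value) {
      data[key] = String(value); // Web Storage keeps string values only
      notify();
    },
    removeItem: function (key) {
      delete data[key];
      notify();
    },
    clear: function () {
      data = {};
      notify();
    },
    key: function (index) {
      var keys = Object.keys(data);
      return index < keys.length ? keys[index] : null;
    },
    get length() {
      return Object.keys(data).length;
    }
  };
}

function installStorageShim(win, storage) {
  // Legacy getter definition; according to the thread, this kind of
  // override only took effect on the Qt4 build.
  win.__defineGetter__('localStorage', function () {
    return storage;
  });
}

function restoreStorage(storage, jsonText) {
  // Reload previously saved key-value pairs from a JSON string.
  var saved = JSON.parse(jsonText);
  Object.keys(saved).forEach(function (k) {
    storage.setItem(k, saved[k]);
  });
}
```

In a page, the hook could be something like `function (d) { document.body.setAttribute('data-local-storage', JSON.stringify(d)); }`. Note that this sketch only intercepts the method API (`getItem`, `setItem`, and so on); property-style writes such as `localStorage[k] = v` would simply set plain properties on the object and bypass the hook.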
Wonder if it would be interesting to have a repository or folder to keep useful splash scripts.
@Youwotma it'd be nice to have some support for https://luarocks.org/
Why is private browsing enabled in the first place? just curious, I didn't know about this
@AlexIzydorczyk I think it is enabled to prevent cookies, history, etc. from leaking between requests. I don't know how useful it is, and we should disable it to enable localStorage. To do that, we should make sure nothing leaks without private browsing (better with `--slots=1`).
@kmike, thanks, makes sense. I actually have a use case for this, so I'll try disabling private browsing and enabling local storage and see how it goes.
This ticket is important for https://github.com/scrapinghub/splash/pull/288 because it is not possible to override the `window.localStorage` object in qt5. So there is a workaround in qt4, but not in qt5.
My proposal to fix this:
The local storage path can be configured on a per-QWebPage basis, but offlineStoragePath, offlineWebApplicationCachePath and IconDatabasePath are global.
By using /tmp/ or a tmpfs we make sure that the files are cleaned up when the container/computer restarts even if splash crashes.
I'm not sure how qtwebkit behaves if two different tabs have different persistent storage paths, since in a normal browser different tabs need to share data between them (but we should prevent this in splash). I'm going to run some tests to see if data is shared between tabs and will update here.
Some of the data that we need to prevent from leaking between tabs:
Some tests by just disabling private mode and enabling localStorage:
Mode | JS Cookies | LocalStorage | SessionStorage | History |
---|---|---|---|---|
Private mode on, simultaneous sessions | Leaks | :ok: | :ok: | :ok: |
Private mode on, non simultaneous sessions | :ok: | :ok: | :ok: | :ok: |
Private mode off, simultaneous sessions | Leaks | Leaks | :ok: | Leaks[1] |
Private mode off, non simultaneous sessions | Leaks | Leaks | :ok: | Leaks[1] |
[1] - Leaks and it will make a difference when rendering (:visited styles applied to links), but it's not readable from javascript.
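The thread does not say how these checks were scripted. As a rough illustration only, a probe along the following lines could be run in each session: it reports which stores already hold a marker left by a different session, then leaves its own marker. `probeAndMark`, `MARKER_KEY` and the `env` parameter are hypothetical names; history is not covered, since, as noted in [1], it is not readable from JavaScript.

```javascript
// Hypothetical leak probe; every name here is illustrative.
var MARKER_KEY = 'splash_leak_probe';

function readCookie(doc, name) {
  // Parse "a=1; b=2" style cookie strings and return the value for name.
  var parts = (doc.cookie || '').split('; ');
  for (var i = 0; i < parts.length; i++) {
    var eq = parts[i].indexOf('=');
    if (eq !== -1 && parts[i].slice(0, eq) === name) {
      return parts[i].slice(eq + 1);
    }
  }
  return null;
}

function safeGet(storage) {
  // Storage access may throw or be unavailable (e.g. in private mode).
  try {
    return storage ? storage.getItem(MARKER_KEY) : null;
  } catch (e) {
    return null;
  }
}

function safeSet(storage, value) {
  try {
    if (storage) {
      storage.setItem(MARKER_KEY, value);
    }
  } catch (e) {
    // Ignore: the store is disabled, so nothing can leak through it.
  }
}

function probeAndMark(env, sessionId) {
  // env is assumed to expose document, localStorage and sessionStorage,
  // like a browser window does.
  var seen = {
    cookies: readCookie(env.document, MARKER_KEY),
    localStorage: safeGet(env.localStorage),
    sessionStorage: safeGet(env.sessionStorage)
  };

  // A store "leaks" if it already holds another session's marker.
  var report = {};
  Object.keys(seen).forEach(function (store) {
    report[store] = seen[store] !== null && seen[store] !== sessionId;
  });

  // Leave this session's marker for the next probe to find.
  env.document.cookie = MARKER_KEY + '=' + sessionId;
  safeSet(env.localStorage, sessionId);
  safeSet(env.sessionStorage, sessionId);
  return report;
}
```

In a real page, `env` would be `window` and each request would pass a different `sessionId`; a `true` entry in the returned report corresponds to a "Leaks" cell in the table.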
@Youwotma great analysis :+1: to clarify: did you check it with qt 5 or with qt4?
I checked with qt4
What is the status of this after the update to Qt5? I need to handle a webpage that requires local storage and actually breaks when local storage is not enabled.
There is now the `--disable-private-mode` flag, which will enable local storage, but the local storage data and other data like cookies and browser history will be kept and shared between different splash requests.
Thanks @Youwotma. Does it make sense to add an option for enabling local storage per request, or maybe per netloc? Something like splash:enable_local_storage() @kmike @Youwotma
I'm worried about disabling private mode for all requests going via Splash. I only need local storage for one website; the others don't need it and are fine with private browsing.
can we close that @kmike ? with recent splash master local storage works ok if you disable private_mode
Guys, I experienced this very same problem. I was trying to render pages in private mode, but they failed because they relied on HTML5 Local Storage. I fixed it by forcing the storage to be enabled; that is, I searched for every line where you disable it according to the settings and forced it to be enabled.
https://github.com/scrapinghub/splash/search?utf8=%E2%9C%93&q=LocalStorageEnabled
Set to: settings.setAttribute(QWebSettings.LocalStorageEnabled, True)
Then, for every request I perform, I have to send JavaScript like this: js_source = "window.localStorage.clear();"
Sending the js_source is not a problem at all, but I have to patch all my instances of splash. Could you add an option to force LocalStorageEnabled to True even in private mode?
@javierfvargas there is a flag, `private_mode_enabled`, that should do what you need: `splash.private_mode_enabled = false` will enable local storage.
@pawelmhm, yes, I know about the flag, but as stated in the documentation, "if you disable private mode then browsing data such as cookies or items kept in localStorage may persist between requests". I don't want that behaviour, but I still want Local Storage to be enabled so that the page can be rendered.
An example of such page would be this one https://www.pcuonline2.org/pawtucketcredituniononline_40/uux.aspx#/login
Thanks.
FWIW, I just encountered this problem and was able to work around it by creating a Javascript profile that includes the Modernizr localStorage shim.
@danielnaab which shim did you use? The Modernizr site lists several different shims.
@kennethkalmer Not sure what you mean - I only see one. Try this link and click "build": https://modernizr.com/download?localstorage-setclasses&q=local
It probably shouldn't matter, but I'm using version 3.3.1.
@danielnaab Modernizr only provides the test; you then need to decide which polyfill to include if `Modernizr.localstorage === false`. On the link you sent, when selecting "Local Storage", the right-hand side shows a list of 4 polyfills. I've tried the main one and a ton of variations from different gists with no luck.
What I'm testing now is something I stumbled on in 6b1033a7840 in the tests for `splash.private_mode_enabled`:
function main(splash)
    splash.private_mode_enabled = false
    assert(splash.private_mode_enabled == false)
    assert(splash:go(splash.args.url))
    assert(splash:wait(splash.args.wait))
    html = splash:html()
    splash.private_mode_enabled = true
    return html
end
It is "Good Enough", will need to do more testing to see if things are leaky though.
I was thinking it is fixed by upgrading to a more recent webkit, but it needs to be re-checked.
I still have problems with different sessions sharing local storage, which makes it impossible to scrape SPA sites concurrently. EDIT: I was running my environment with aquarium, and we were able to fix it by setting one slot per splash instance. I don't know how slots work, but it seems that they share resources somehow.
We're trying to enable it by using
but this doesn't work because earlier we enable private mode:
and when QWebKit is in private mode localStorage is disabled.