qutebrowser / qutebrowser

A keyboard-driven, vim-like browser based on Python and Qt.
https://www.qutebrowser.org/
GNU General Public License v3.0

Support URL patterns for more settings (per-domain settings) #3636

Open The-Compiler opened 6 years ago

The-Compiler commented 6 years ago

Split off from #27 - settings which would be useful per-domain:


Some more musings about which code this is going to affect and how, what's possible to set, and which kinds of URLs should be affected, prompted by this IRC conversation:

2017-10-10 16:29:18     jkamat  also, re: domain matching. If I style 'a.com' and it has an iframe or external resources linking to b.com, will those get the per-domain settings of a.com (since that's the original page)
2017-10-10 16:31:22     hola1_  no
2017-10-10 16:31:28     hola1_  cause it's not the same domain
2017-10-10 16:33:21     jkamat  that could be annoying, imo. I guess it's possible to go through and add rules for the iframe domains as well...
2017-10-10 16:42:21     hola1_  that's the intention behind per-_domain_ settings... every domain which is called gets its own settings, no matter if in an iframe or as a page (unless it's done like in uMatrix, where b.com inside a.com can have different settings than b.com inside c.com)
2017-10-10 16:44:10     jkamat  I think that from a high level (ignoring requests and all) per-domain settings make it possible to "go to something.com" and have "settings apply on that website", which wouldn't be the case if something.com was implemented with a ton of iframes or similar
2017-10-10 16:51:45     The-Compiler    jkamat: so many unsolved questions...
2017-10-10 16:52:12     The-Compiler    but yeah, I also tend to think that it wouldn't affect frames from a different origin
2017-10-10 16:53:19     The-Compiler    let's say you have stylesheets for blog.example.com, and that has an iframe from youtube.com or whatever - I don't think the styling rules for blog.example.com should be applied to YouTube
2017-10-10 16:56:25     gilbertw1       I agree with that, iframes generally should be treated as a completely separate entity (which they are)
2017-10-10 17:05:29     jkamat  hmm, I guess that's true. I'm more worried about externally hosted javascript as well (suppose a script linked from a different domain exists, it won't run on a site with javascript disabled, right?)
2017-10-10 17:09:08     gilbertw1       You mean that if you block all javascript on any pages from Domain X using per-domain settings, you don't want it loading and running scripts from Domain Y when you access a page on Domain X
2017-10-10 17:09:39     jkamat  yes, that's a clearer explanation of what I want :P
2017-10-10 17:11:07     gilbertw1       Hah, that's a good point that I don't know the answer to. It's definitely a valid use case I'd assume many would want
2017-10-10 17:13:08     hola1_  if i understand you right, you're explaining the same thing i've tried, which is what uMatrix ( https://github.com/gorhill/uMatrix ) does

We basically have three kinds of URLs:

User stylesheets

See #3854

QtWebKit

We can use QWebSettings::setUserStyleSheetUrl based on the top-level URL.

QtWebEngine

Since we use JavaScript to apply the stylesheet, it's probably easy to get the frame URL, and I think it's better to use that for the stylesheet. In other words, if you're on www.example.com and that has an iframe for www.youtube.com, the stylesheet for example.com doesn't apply in the YouTube frame.

We could also format the top-level URL into JavaScript from Python (or get it in JS directly? Probably not?) - but I think we don't need it at all.
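To make the "use the frame URL" choice concrete, here is a minimal pure-Python sketch of choosing a stylesheet per frame. The `STYLESHEETS` table and `stylesheet_for()` are hypothetical illustrations, not qutebrowser code. Matching uses simple host globs via `fnmatch`, not qutebrowser's actual URL-pattern syntax.

```python
from fnmatch import fnmatch
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host-glob -> stylesheet table (not a real qutebrowser setting).
STYLESHEETS = {
    "example.com": "example.css",
    "*.example.com": "example.css",
}


def stylesheet_for(frame_url: str) -> Optional[str]:
    """Pick a stylesheet based on the frame's own URL, not the top-level URL."""
    host = urlsplit(frame_url).hostname or ""
    for host_glob, sheet in STYLESHEETS.items():
        if fnmatch(host, host_glob):
            return sheet
    return None


# www.example.com embeds a YouTube iframe: each frame is looked up on its own URL.
print(stylesheet_for("https://www.example.com/article"))    # example.css
print(stylesheet_for("https://www.youtube.com/embed/abc"))  # None
```

Because the lookup only sees the frame URL, the example.com rules never reach the embedded YouTube frame. This matches the behavior argued for in the IRC discussion above.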

:heavy_check_mark: Cookies

QtWebKit

In QNetworkCookieJar::insertCookie we should be able to get the resource URL (?) via QNetworkCookie::domain. I'm not sure whether we can get the top-level URL in any way...

QtWebEngine

There's no cookie filter API yet (WIP Qt 5.11 change). We could probably watch QWebEngineCookieStore::cookieAdded and call deleteCookie right away.

Looks like there's no way to get the top-level URL here either?

:heavy_check_mark: User agent

QtWebKit

Override QWebPage::userAgentForUrl. We can probably get both the top-level URL (from the page) and the request URL.

QtWebEngine

We have the user agent set via QWebEngineProfile::setHttpUserAgent, and that's global. However, at least for headers (but not window.navigator.userAgent), we can use QWebEngineUrlRequestInterceptor to return a user agent based on the top-level and request URL.
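The interceptor approach can be sketched in pure Python. `FakeRequestInfo` is a hypothetical stand-in for QWebEngineUrlRequestInfo (the real class exposes `requestUrl()`, `firstPartyUrl()` and `setHttpHeader()`). `UA_OVERRIDES` and the "top-level rule wins" order are assumptions. As noted above, this only changes the HTTP header, not `navigator.userAgent`.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Dict, Optional
from urllib.parse import urlsplit

DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
# Hypothetical overrides keyed on a host glob.
UA_OVERRIDES: Dict[str, str] = {"*.slack.com": "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0"}


@dataclass
class FakeRequestInfo:
    """Stand-in for QWebEngineUrlRequestInfo with just what this sketch needs."""
    request_url: str
    first_party_url: str
    headers: Dict[bytes, bytes] = field(default_factory=dict)

    def set_http_header(self, name: bytes, value: bytes) -> None:
        self.headers[name] = value


def _override_for(url: str) -> Optional[str]:
    host = urlsplit(url).hostname or ""
    for host_glob, ua in UA_OVERRIDES.items():
        if fnmatch(host, host_glob):
            return ua
    return None


def user_agent_for(first_party_url: str, request_url: str) -> str:
    # Prefer a rule for the top-level page, then one for the request itself.
    return _override_for(first_party_url) or _override_for(request_url) or DEFAULT_UA


def intercept_request(info: FakeRequestInfo) -> None:
    """Roughly what an interceptRequest() override could do for each request."""
    ua = user_agent_for(info.first_party_url, info.request_url)
    info.set_http_header(b"User-Agent", ua.encode("ascii"))
```

With this ordering, every subresource loaded by a Slack page, including ones from a CDN, gets the same user agent as the page itself.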

:heavy_check_mark: Other headers

QtWebKit

We can set those in createRequest based on the resource URL.

QtWebEngine

We can set those with a QWebEngineUrlRequestInterceptor based on the resource and top-level URL.

Requests

QtWebKit

With QNetworkAccessManager::createRequest we can get the QNetworkRequest. However, we can't really get much information from there (e.g. the type of the request).

QtWebEngine

With a QWebEngineUrlRequestInterceptor we can get some infos about requests, allowing for something uMatrix-like. See #2626, #28.
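A uMatrix-like rule lookup could be keyed on the first-party host, the request host and the resource type. The minimal sketch below uses a hypothetical rule table and plain strings for the resource type; the real interceptor reports an enum via `resourceType()`. The "most specific key wins" order is a design assumption. The table also covers two points from the IRC log: b.com embedded in a.com can be treated differently than in c.com, and scripts from another domain can be blocked while on a given site.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Hypothetical uMatrix-style rules: (first-party host, request host, type) -> allow.
# "*" is a wildcard in any position.
RULES: Dict[Tuple[str, str, str], bool] = {
    ("*", "*", "*"): True,                                      # default: allow
    ("*", "*", "script"): False,                                # no scripts by default
    ("blog.example.com", "blog.example.com", "script"): True,  # the blog's own scripts
    ("blog.example.com", "www.youtube.com", "*"): True,        # YouTube embeds on the blog
    ("news.example.org", "www.youtube.com", "*"): False,       # but not on this site
}


def is_allowed(first_party_url: str, request_url: str, resource_type: str) -> bool:
    fp = urlsplit(first_party_url).hostname or ""
    rq = urlsplit(request_url).hostname or ""
    # Try keys from most to least specific; the first match wins.
    candidates = [
        (fp, rq, resource_type), (fp, rq, "*"),
        (fp, "*", resource_type), ("*", rq, resource_type),
        (fp, "*", "*"), ("*", rq, "*"),
        ("*", "*", resource_type), ("*", "*", "*"),
    ]
    for key in candidates:
        if key in RULES:
            return RULES[key]
    return True
```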

Bindings

Haven't looked into that yet, but the code handling the bindings can probably just look at the current URL. This one should be backend independent.

In addition to adding new bindings, it should be possible to selectively unbind bindings for a page.
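One way to support both adding and selectively unbinding keys is to merge per-site overrides on top of the global bindings. In this merge, `None` marks a key as unbound. The following is a hypothetical pure-Python sketch, not qutebrowser's keyparser code. It assumes a simple "domain and its subdomains" match instead of full URL patterns.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

GLOBAL_BINDINGS: Dict[str, str] = {"r": "reload", "d": "tab-close", "j": "scroll down"}

# Hypothetical per-domain overrides; None means "unbound on this site".
SITE_BINDINGS: Dict[str, Dict[str, Optional[str]]] = {
    "example.org": {"r": None, "gr": "reload"},
}


def _matches(host: str, domain: str) -> bool:
    """True for the domain itself and any of its subdomains."""
    return host == domain or host.endswith("." + domain)


def bindings_for(url: str) -> Dict[str, str]:
    host = urlsplit(url).hostname or ""
    merged: Dict[str, Optional[str]] = dict(GLOBAL_BINDINGS)
    for domain, overrides in SITE_BINDINGS.items():
        if _matches(host, domain):
            merged.update(overrides)
    # Drop keys that a site rule explicitly unbound.
    return {key: cmd for key, cmd in merged.items() if cmd is not None}
```

Because the lookup only needs the current URL, it stays backend independent. The "." prefix in `_matches` keeps notexample.org from matching example.org.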

:heavy_check_mark: Permissions

QtWebKit

QWebPage::featurePermissionRequested gives us the QWebFrame (and thus frame URL), we can get the top-level URL from the page.

QtWebEngine

QWebEnginePage::featurePermissionRequested gives us the securityOrigin (frame URL?) and we can get the top-level URL from the page.
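Given the security origin and the top-level URL, a permission decision could be sketched as follows. The `PERMISSIONS` table, the feature names and the `decide()` helper are hypothetical. The rule that auto-grants only apply when the requesting origin is the top-level page is an assumed policy, not something settled in this issue.

```python
from typing import Dict, Tuple, Union
from urllib.parse import urlsplit

Decision = Union[bool, str]  # True (grant), False (deny) or "ask"
DEFAULT: Decision = "ask"

# Hypothetical table: (requesting origin host, feature) -> decision.
PERMISSIONS: Dict[Tuple[str, str], Decision] = {
    ("app.slack.com", "media"): True,
    ("ads.example.net", "notifications"): False,
}


def decide(security_origin: str, top_level_url: str, feature: str) -> Decision:
    origin = urlsplit(security_origin).hostname or ""
    top = urlsplit(top_level_url).hostname or ""
    decision = PERMISSIONS.get((origin, feature), DEFAULT)
    # Sketch policy: only auto-grant when the requesting origin is the page itself,
    # so a third-party frame cannot ride on a grant meant for the top-level site.
    if decision is True and origin != top:
        return DEFAULT
    return decision
```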

Proxy

QtWebKit

We can probably just use the URL we get in QNetworkProxyFactory::queryProxy.

QtWebEngine

For some reason, the URL we get in QNetworkProxyFactory is always empty - so we might need to find some other way to do this... See #4577, #2492.

:heavy_check_mark: SSL strict

Other

Some other settings which could apply per-domain based on the current URL, but where it's questionable whether there's a use case:

Waples commented 6 years ago

Would like to see per-domain 'auto accept permissions for audio/video', for example Slack Calls from a trusted domain.

Don't know how I can help, but if I can, tell me ^^

The-Compiler commented 6 years ago

@Waples Already mentioned as "Permissions" above (also, yay, my name is Florian too :wink:)

Waples commented 6 years ago

Yeah saw it just now, sorry for the bump. (also, small world cause my last name also starts with a B :') )

m-j-r commented 6 years ago

I like the idea of downloads.location.* being site-specific: it would save me from fiddling around when, e.g., saving bank statements into a 'statement folder', and then having to remember to change it back to downloads next time.

ghost commented 6 years ago

I'd like per-domain fonts. I prefer my built in fonts for most websites, but if I disable remote fonts globally a few sites I use a lot are missing icons.

Ambrevar commented 6 years ago

If I understand this correctly, Qutebrowser 1.2.1 does not allow for per-domain stylesheets yet?

The-Compiler commented 6 years ago

@Ambrevar Correct.

lambdadog commented 6 years ago

I think it'd be worth doing what uMatrix does for cookie filtering: create the cookies, but wall them off so they can't be accessed unless cookies are enabled for the domain. That way, if you accidentally log in to something without enabling cookies, you can just enable cookies afterwards and won't need to log in again because of a simple mistake like that.
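The "store but wall off" model can be illustrated with a hypothetical in-memory jar. This sketch only shows the storage and visibility semantics. It does not show how to hook into QtWebEngine's cookie store, where, as noted later in this thread, stripping cookies from intercepted requests doesn't work.

```python
from typing import Dict, Set


class WalledCookieJar:
    """Hypothetical jar: always stores cookies, only exposes them for enabled domains."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}
        self._enabled: Set[str] = set()

    def set_cookie(self, domain: str, name: str, value: str) -> None:
        # Record the cookie even if the domain is currently blocked.
        self._store.setdefault(domain, {})[name] = value

    def cookies_for(self, domain: str) -> Dict[str, str]:
        # Blocked domains see an empty jar; the stored data is kept, not deleted.
        if domain not in self._enabled:
            return {}
        return dict(self._store.get(domain, {}))

    def enable(self, domain: str) -> None:
        self._enabled.add(domain)


jar = WalledCookieJar()
jar.set_cookie("example.com", "session", "abc123")  # logged in with cookies "off"
print(jar.cookies_for("example.com"))  # {}
jar.enable("example.com")
print(jar.cookies_for("example.com"))  # {'session': 'abc123'}
```

After `enable()`, the earlier login cookie becomes visible, so the user doesn't have to log in again.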

lambdadog commented 6 years ago

I'm not familiar with QtWebEngine, but is it possible to intercept and edit requests sent by it? That would allow you to just "wipe" the cookies it's sending with the request. You'd need to inject some javascript to block getting them via javascript too I guess, but that would probably be a "more elegant" solution than trying to fight QtWebEngine for it

The-Compiler commented 6 years ago

I'm not familiar with QtWebEngine, but is it possible to intercept and edit requests sent by it?

Yes, but deleting cookies that way doesn't work.

but that would probably be a "more elegant" solution than trying to fight QtWebEngine for it

Since the cookie filtering API landed in Qt 5.11, using that is probably the most elegant solution :wink:

ddevault commented 6 years ago

it's questionable whether there's a use case

Popping in to say I want to set per-site zoom settings.

jgkamat commented 6 years ago

@toofar already has a PR for that, #3782, although I haven't had time to review or test it yet. I'd find it useful too :)

noctuid commented 6 years ago

I think there is definitely a use case for turning off adblocking per domain (either because you want to support some site or because the site is broken with adblocking). Being able to change the download location directory would be extremely useful as well.

sid-code commented 6 years ago

I'd be willing to help contribute to per-domain bindings, but I'm not sure where to start. My initial reaction to seeing per-domain settings land was to try the following:

with config.pattern("*://example.org/*") as p:
    p.bindings.commands['r'] = 'nop'

But that didn't work. Is the plan to extend support to this? I'm willing to help in any way I can.

jgkamat commented 6 years ago

I'd be willing to help contribute to per-domain bindings but I'm not sure where to start with that.

Yes, bindings aren't supported yet. If you are willing to look into it, I'd take a look at some of the other per-domain PRs (https://github.com/qutebrowser/qutebrowser/pull/3854, https://github.com/qutebrowser/qutebrowser/pull/3782), which should provide some high-level examples. Bindings will probably be a little trickier than the other settings, as it would involve understanding how bindings are processed (I personally haven't touched that yet). Once you understand that, though, it shouldn't be too difficult to add.

The-Compiler commented 6 years ago

FWIW I picked up some low-hanging fruit for v1.4.0:

amerlyq commented 6 years ago

I'm voting for the "Proxy" settings, for the following use case:

Edit: however, on second thought, adding per-domain proxies is too complicated compared to the possible gains, because my case above is solved by independent instances of qutebrowser with different --basedir values. Therefore, per-domain proxies only make sense when you need three or more proxies at the same time (isn't that too rare to support?).

merspieler commented 6 years ago

@amerlyq you could also add per-domain proxies for .onion (Tor network) and .i2p (I2P network, and other such "TLDs") if you use those services in addition to your cases.

toofar commented 6 years ago

@The-Compiler in the section about cookies, you say you would want to get "the top-level URL". What do you mean by that, and why do you want it? Why couldn't we just do config.instance.get('content.cookies.accept', url=request.firstPartyUrl)?

The-Compiler commented 6 years ago

@toofar I meant the first-party URL with "top-level URL". Back when I wrote this, the cookie filtering API wasn't really a thing yet. Nowadays I think it's fine to just not support this before Qt 5.11 (because it'd be a hack), so we do have access to that information.

The question is whether to use firstPartyUrl or origin there.

The-Compiler commented 6 years ago

@amerlyq Using multiple basedirs is kind of cumbersome, so I'd still like to have support for per-domain proxies. It's probably not very difficult either - in queryProxy in qutebrowser/browser/network/proxy.py we already get the URL as part of query, so that'd just need to be passed to the config system.
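The URL-to-proxy lookup that queryProxy's URL could be passed into might look like the following pure-Python sketch. `PROXY_BY_SUFFIX` and `proxy_for()` are hypothetical. The addresses are the usual local defaults for Tor's SOCKS port and I2P's HTTP proxy, and may differ on your setup. The earlier caveat still applies: with QtWebEngine, the URL passed to QNetworkProxyFactory has been observed to be empty.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Common local defaults; adjust to your own setup.
TOR_PROXY = "socks5://127.0.0.1:9050"
I2P_PROXY = "http://127.0.0.1:4444"

# Hypothetical host-suffix -> proxy table for the .onion / .i2p use case.
PROXY_BY_SUFFIX: Dict[str, str] = {".onion": TOR_PROXY, ".i2p": I2P_PROXY}


def proxy_for(url: str) -> Optional[str]:
    """Return the proxy to use for this URL, or None for a direct connection."""
    host = urlsplit(url).hostname or ""
    for suffix, proxy in PROXY_BY_SUFFIX.items():
        if host.endswith(suffix):
            return proxy
    return None
```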

toofar commented 6 years ago

I think using firstPartyUrl is closer to what the setting means currently and is similar to what #4046 proposes. Using the origin URL would have different semantics (apply to these resources rather than apply when on this page). But after thinking about it for a bit using the origin URL would be more useful in every case and I think it would still be a one line change.

Also, for QtWebKit, QNetworkCookieJar.setCookiesFromUrl() appears to get passed the origin URL.
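The difference between keying on `firstPartyUrl` and keying on `origin` can be shown with a small pure-Python model. `FilterRequest` here is a hypothetical stand-in mirroring the fields of Qt's QWebEngineCookieStore::FilterRequest. The `ACCEPT` table and `accept_cookie()` are illustrative, with values modeled on content.cookies.accept.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlsplit


@dataclass
class FilterRequest:
    """Stand-in mirroring the fields of Qt's QWebEngineCookieStore::FilterRequest."""
    first_party_url: str
    origin: str
    third_party: bool


DEFAULT = "no-3rdparty"
# Hypothetical per-host values, modeled on content.cookies.accept.
ACCEPT: Dict[str, str] = {"example.com": "all", "tracker.example.net": "never"}


def _value_for(url: str) -> str:
    return ACCEPT.get(urlsplit(url).hostname or "", DEFAULT)


def accept_cookie(req: FilterRequest, key_on_origin: bool = False) -> bool:
    # The only difference between the two semantics is which URL is looked up.
    value = _value_for(req.origin if key_on_origin else req.first_party_url)
    if value == "all":
        return True
    if value == "never":
        return False
    return not req.third_party  # "no-3rdparty"


# On example.com, a third-party tracker tries to set a cookie.
req = FilterRequest("https://example.com/", "https://tracker.example.net", True)
print(accept_cookie(req))                      # True: "apply when on this page"
print(accept_cookie(req, key_on_origin=True))  # False: "apply to these resources"
```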

The-Compiler commented 6 years ago

@Yodzorah content.javascript.can_access_clipboard already supports per-domain settings. It also works fine with QtWebEngine. What you describe seems to have nothing to do with the clipboard at all, I don't really follow.

The-Compiler commented 6 years ago

FWIW websites can not override qutebrowser bindings (they don't even get to see keys bound in qutebrowser). If you're talking about hjkl on DuckDuckGo, those simulate cursor keypresses. You can rebind them to :scroll-px instead, but then scrolling will break on other pages. #3691 will hopefully help.

mschilli87 commented 6 years ago

@Yodzorah: As far as I understood, 'cursor keypress' means 'pressing cursor keys'. Hope this helps.

The-Compiler commented 6 years ago

@Yodzorah But that's not the case. hjkl scroll by sending cursor keypresses to the website (like I've explained above), and the website can handle those.

The-Compiler commented 6 years ago

I'm now hiding all the off-topic comments here. If you want to continue this discussion, let's please do so in the IRC channel.

The-Compiler commented 6 years ago

No, the issue would be invalid, because it's based on the incorrect assumption that websites can get qutebrowser keypresses. I don't know how else to explain it. Like I've said earlier already, there's #3691 which will help by not doing scrolling by emulating keypresses.

As for using IRC, there's a webchat linked there.

The-Compiler commented 6 years ago

JS code can read what keys were pressed by the user.

Only for keys which aren't bound in qutebrowser, otherwise it filters those.

I assume, when qutebrowser imitates cursor keypresses (Arrows, right?), Duckduckgo reads them as cursor keypresses, but binds unusual behaviour to these keypresses. So, if one will open Duckduckgo from Chrome without any browser keybindings at all, and use standard Down Arrow, he will meet the same unusual and js-defined behaviour.

Correct (note that you can also turn that off in DuckDuckGo's settings).

I'm asking, if there is an option to implement blocking of reading keys from JS to disallow unusual behaviour globally instead of creating countless workarounds for every case with bad side effects.

It's only a problem with scrolling, because the way scrolling is currently implemented (by imitating keypresses) is a hack. There are no other cases. The PR I've linked (twice) already will solve that for scrolling as well.

olmokramer commented 5 years ago

It would also be nice if editor.command supported URL patterns. My use case would be to automatically set the editor's filetype to markdown on github.com and similar domains.

The-Compiler commented 5 years ago

@olmokramer I'd rather do that via the filename instead of different commands so it works with all editors - see #2727.

olmokramer commented 5 years ago

@The-Compiler Oh that's much better! Never mind then.

wernerb commented 5 years ago

My use case is single-page apps like Slack, Todoist and Pocket.

I use my window manager to start specific pages on different workspaces at startup. Editing the title format per domain would let the window manager filter on it and add transparency, disable borders, start fullscreen, etc.
Additionally I would like to be able to disable tabs and statusbar for the same aesthetic reasons.

The example below treats slack.com as a separate app. If this were implemented, new windows would get the right settings and be started on different workspaces by the window manager.

with config.pattern('https://*.slack.com/') as p:
       p.new_instance_open_target = 'window'
       p.window.title_format = '{perc}{title}{title_sep} - WORKSPACE9 - qutebrowser'
       p.tabs.show = 'never'
       p.statusbar.hide = True

Is this something that could be possible?

jgkamat commented 5 years ago

Werner Buck writes:

I use my window manager to start specific pages at start on different workspaces. Editing the title-format per domain would enable me to do lots of advanced workflows, just like hiding tabs and statusbar for specific pages. The use-case is single page apps like slack,todoist,pocket.

The below is an example of treating slack.com as a separate app.

with config.pattern('https://*.slack.com/') as p:
       p.new_instance_open_target = 'window'
       p.window.title_format = '{perc}{title}{title_sep} - WORKSPACE9 - qutebrowser'
       p.tabs.show = 'never'
       p.statusbar.hide = True

I don't think this is a good idea at the moment, because the config system is still rather slow: that config lookup (for window.title_format) is currently holding back zooming speed and could get a lot worse if it becomes per-domain. The same goes for things like statusbar.hide and tabs.show: Qt calls us a LOT for tab sizing, and CPython just isn't fast enough to do that much work without lagging noticeably (as it did in previous versions). Tab size hints are still bottlenecking us in some cases, even though we do very little work per call.

https://github.com/qutebrowser/qutebrowser/issues/4628 https://github.com/qutebrowser/qutebrowser/issues/4409

I also don't think that's a good idea implementation-wise: a website can (possibly) break your script by playing with the title. It seems like a hack to me.

It sounds like you should use multiple basedirs or WM_CLASS, although, the --qt-name argument seems to be gone now. Maybe there's an API for setting WM_CLASS or some other window-specific variable in qt now.

https://github.com/qutebrowser/qutebrowser/issues/514

ninewise commented 5 years ago

It sounds like you should use multiple basedirs or WM_CLASS, although, the --qt-name argument seems to be gone now. Maybe there's an API for setting WM_CLASS or some other window-specific variable in qt now.

#514

--qt-arg name foo works now; it happened to be discussed in IRC earlier. Basically, you create new configuration directories and add some aliases of the form alias slackbrowser=qutebrowser --basedir some/config/diff --qt-arg name slackbrowser

Do note that these other browsers won't have your regular bookmarks synced (but you could copy those files).

wernerb commented 5 years ago

I'm using different basedirs already, but didn't know about --qt-arg name! Thanks, that helps a lot!

One other thing for me: even with separate basedirs, if I open a new link, I'd actually want it to open using a different basedir/instance of qutebrowser. Is there a way to do this? Open the link in an external program, maybe? :)

The-Compiler commented 5 years ago

@wernerb I agree per-domain settings are probably the wrong solution to this - also see #3012 and #2314. You can't tell qutebrowser to open links in a new instance (and I'm not sure whether it's possible to implement). All those things are quite off-topic for this issue, though! You might want to take a look at "Getting help" in the docs if you have more questions.

The-Compiler commented 4 years ago

FWIW per-domain content.cookies.accept is in now, via #4395.

Ambrevar commented 4 years ago

Fantastic!

erazemk commented 4 years ago

FWIW per-domain content.cookies.accept is in now, via #4395.

Will this be included in the next version of qutebrowser?

The-Compiler commented 4 years ago

It will be in v1.12.0.

dt098 commented 4 years ago

Was it implemented? Can't find it if so.

The-Compiler commented 4 years ago

@dt098 "it" as in per-domain support for content.cookies.accept? Yes, see the comment above yours.

mavaa commented 4 years ago

Being able to have a whitelist of domains where cookies are allowed is great, thank you a thousand times for this! :heart:

The-Compiler commented 4 years ago

@mavaa Let me forward those thanks to @lufte! :+1:

maxigit commented 4 years ago

I second downloads.location.directory, as well as input.insert_mode.auto_enter (so I can set it to false unless I am on DuckDuckGo, which is my home page).

maxigit commented 4 years ago

As this seems complicated to implement, wouldn't it be easier to have an autocmd mechanism instead? Once a page is loaded, some commands are executed depending on its URL, and the user can use that to set specific options or bindings. OK, I realized that this will only work if we can have buffer-specific bindings and settings...
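The autocmd idea could be sketched as a hypothetical registry that runs commands when a page whose URL matches a glob finishes loading. `AutoCommands`, `on_load_finished()` and the injected `run` callback are illustrative names, not qutebrowser APIs. As the comment itself points out, a plain `:set` issued this way would still change the setting globally. So this only shows the hook mechanism, not buffer-specific settings.

```python
from fnmatch import fnmatch
from typing import Callable, List, Tuple


class AutoCommands:
    """Hypothetical registry: run commands after a page with a matching URL loads."""

    def __init__(self, run: Callable[[str], None]) -> None:
        self._run = run  # e.g. something that executes a qutebrowser command string
        self._hooks: List[Tuple[str, str]] = []

    def add(self, url_glob: str, command: str) -> None:
        self._hooks.append((url_glob, command))

    def on_load_finished(self, url: str) -> None:
        # Run every command whose glob matches the loaded URL, in registration order.
        for url_glob, command in self._hooks:
            if fnmatch(url, url_glob):
                self._run(command)


executed: List[str] = []
autocmds = AutoCommands(executed.append)
autocmds.add("https://duckduckgo.com/*", "set input.insert_mode.auto_enter true")
autocmds.on_load_finished("https://duckduckgo.com/?q=qutebrowser")
autocmds.on_load_finished("https://example.com/")
print(executed)  # ['set input.insert_mode.auto_enter true']
```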

crater2150 commented 4 years ago

A use case for zoom.default patterns:

I recently stumbled upon several websites that set a CSS min-width in pixels, which, in combination with HiDPI and vertical tabs, causes parts of the page on the right to be cut off and requires horizontal zoom (a prominent example is Amazon). It probably also occurs without HiDPI if you don't use the full width of your screen. Some other pages switch to a mobile view if the pixel width is too small.

Most of those pages work fine when setting zoom to 90 or 80%, so saving that zoom value per URL would be great.

arjan-s commented 4 years ago

It would be awesome to have this for content.cookies.store. My use case is to only keep cookies for some sites I trust, and clean up everything else after quitting the browser.

mavaa commented 4 years ago

@arjan-s I haven't tried cookies.store, but content.cookies.accept is supposed to be working per domain from v1.12.0 according to comments above.