scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

migrate from qtwebkit to qtwebengine #349

Open pawelmhm opened 8 years ago

pawelmhm commented 8 years ago

According to http://blog.qt.io/blog/2014/12/10/qt-5-4-released/

> Qt 5.4 also still contains the older Qt WebKit module. Qt WebKit is still supported, but as of Qt 5.4 we consider it done, so no new functionality will be added to it. We are also planning to deprecate Qt WebKit in future releases, as the new Qt WebEngine provides what is needed. In most use cases, migrating from Qt WebKit to Qt WebEngine is rather straightforward. If you are starting a new project that requires web capabilities, we advise that you already start using Qt WebEngine.

so we should start working on migration to QtWebEngine?

kmike commented 8 years ago

Yes, we should. I believe not all features will be available though. It is improved in 5.5, but still many features are missing. It could make sense to support both engines because they have their advantages - QWebKit allows a deeper customization, QWebEngine is more modern.

pawelmhm commented 8 years ago

one thing that is a little worrying so far is this note:

> unlike Qt WebKit, Qt WebEngine has its own HTTP implementation and cannot go through a QNetworkAccessManager. The signals and methods of QNetworkAccessManager that are still supported were moved to the QWebEnginePage class.

source

this is pretty bad for us: we do lots of things in the network access manager (NAT), and the relation between the web page and the NAT is very important for us.

On the other hand, the release notes for the newest version, Qt 5.6, say:

> We’ve also added a new Qt WebEngineCore module for new low-level APIs. This includes features such as (..) intercepting and blocking network requests and for tracking and blocking cookies.

source

so maybe future direction will allow for easier integration with NAT? But I wonder if this is going to offer similar capabilities to what we have now.

kmike commented 8 years ago

> But I wonder if this is going to offer similar capabilities to what we have now.

Likely not, or at least not soon. Well, I think most things will be supported, but not all. That's why I think we should support both engines, not just migrate.

trickysam44 commented 7 years ago

Hi @kmike, sorry for being so blunt, but is there any progress on adding support for QWebEngine? It's just that I really love Splash, but I need to use some Chrome extensions inside it, and it seems QWebEngine allows that. I could possibly do some hacks, modify the extensions and inject them directly into the loaded pages using Lua, but that won't survive extension updates and such.

How much work is needed to provide this option? I might try to implement this, I'm just not sure what state it is currently in.

kmike commented 7 years ago

Hey @trickysam44,

No progress so far, nobody worked on it yet :(

Porting to QWebEngine is likely much more work than updating to a more recent WebKit (https://github.com/scrapinghub/splash/issues/541). I think to do that we need to provide another BrowserTab object, based on QWebEngine, and allow choosing between the two. We tried to keep WebKit-specific parts in BrowserTab, but it is likely there are some WebKit APIs "leaking" to other parts of Splash; this should be fixed as well.

Some APIs won't be available (i.e. it is not possible to implement all Splash features using QWebEngine), and some APIs could change (an operation which was not async could become async), but that should be fine.
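To illustrate the sync-to-async point: in QtWebKit, `QWebFrame.toHtml()` returns the markup directly, while in QtWebEngine `QWebEnginePage.toHtml()` hands the result to a callback. Below is a minimal pure-Python sketch of what that difference means for calling code. It does not import Qt; `FakeWebKitFrame`, `FakeWebEnginePage`, `process_events`, `html_sync` and `html_async` are hypothetical stand-ins invented for this illustration, not Splash or PyQt APIs.

```python
from typing import Callable, List


class FakeWebKitFrame:
    """Hypothetical stand-in for a synchronous, QtWebKit-style frame."""

    def __init__(self, html: str) -> None:
        self._html = html

    def to_html(self) -> str:
        # Synchronous: the caller gets the markup as the return value.
        return self._html


class FakeWebEnginePage:
    """Hypothetical stand-in for an asynchronous, QtWebEngine-style page."""

    def __init__(self, html: str) -> None:
        self._html = html
        self._pending: List[Callable[[], None]] = []

    def to_html(self, callback: Callable[[str], None]) -> None:
        # Asynchronous: nothing is returned; the callback is queued
        # and only fires once the event loop gets to it.
        self._pending.append(lambda: callback(self._html))

    def process_events(self) -> None:
        # Plays the role of the event loop delivering queued results.
        pending, self._pending = self._pending, []
        for deliver in pending:
            deliver()


def html_sync(frame: FakeWebKitFrame) -> str:
    # WebKit-style caller: a plain function call.
    return frame.to_html()


def html_async(page: FakeWebEnginePage, on_done: Callable[[str], None]) -> None:
    # WebEngine-style caller: the result arrives later via on_done.
    page.to_html(on_done)
```

Any caller that currently relies on a return value would have to be restructured around a callback (or a deferred) when backed by the asynchronous engine.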

trickysam44 commented 7 years ago

Thanks for the prompt response @kmike! I think I'll try to research the QWebEngine porting, and see if I could help with that, because just updating WebKit won't help me, as from what I understand I can't utilize it to load extensions.

kmike commented 7 years ago

Thanks @trickysam44!

If you'll start researching it, questions or updates are very welcome!

Even if the research turns out to be unsuccessful, it is great to know what to look at, and what the roadblocks are.

vedantrathore commented 7 years ago

@kmike I would love to get started on this, I was reading this but are there any python bindings for this?

kmike commented 7 years ago

Hey @vedantrathore,

Splash is implemented using PyQt; PyQt has QtWebEngine bindings.

vedantrathore commented 7 years ago

@kmike Okay, I got it. So how do you think I should start tackling this issue?

kmike commented 7 years ago

@vedantrathore I think the way to go is to implement another BrowserTab object which has the same API as the existing BrowserTab object, but uses QtWebEngine internally. Providing exactly the same API won't be possible for many reasons, but having e.g. the render.html and render.png endpoints working with this alternative object (without supporting many of the render.html/png options) could be a big step forward.

I think that it makes sense to do some basics first - figure out how to run tests, figure out what happens when a user calls the /render.html endpoint (which classes and functions get called and why), what happens in the case of the /execute endpoint, etc. - get familiar with the source code and project structure.

If I were starting this project, I'd create a super-basic version of this new WebEngineBrowserTab, then figure out which parts of it can be shared with the existing BrowserTab, then figure out how to run existing tests against this new WebEngineBrowserTab, and then start fixing tests and changing APIs if necessary.

Note that we shouldn't drop QtWebKit support; QtWebEngine should be an addition, not a replacement.
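The "second BrowserTab with the same API, selectable per engine" idea could look roughly like the sketch below. It is only an outline under stated assumptions: `BaseBrowserTab`, `WebKitBrowserTab`, `TAB_CLASSES`, `make_tab` and `render_html` are hypothetical names, the method signatures are simplified and are not Splash's actual BrowserTab API, and the tabs return placeholder strings instead of rendering anything. `WebEngineBrowserTab` is the name used in the comment above; its body here is invented.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class BaseBrowserTab(ABC):
    """Hypothetical shared interface; not Splash's real BrowserTab API."""

    engine = "base"

    def __init__(self) -> None:
        self.url: Optional[str] = None

    def go(self, url: str) -> None:
        # A real tab would start loading the page here.
        self.url = url

    @abstractmethod
    def html(self) -> str:
        """Return the markup of the loaded page."""

    def png(self) -> bytes:
        # An engine may leave some features unimplemented.
        raise NotImplementedError(f"png is not supported by {self.engine}")


class WebKitBrowserTab(BaseBrowserTab):
    engine = "webkit"

    def html(self) -> str:
        return f"<!-- webkit placeholder for {self.url} -->"

    def png(self) -> bytes:
        return b"fake-png-bytes"  # placeholder, not a real image


class WebEngineBrowserTab(BaseBrowserTab):
    engine = "webengine"

    def html(self) -> str:
        return f"<!-- webengine placeholder for {self.url} -->"


# Registry used to pick an implementation by name.
TAB_CLASSES: Dict[str, Type[BaseBrowserTab]] = {
    cls.engine: cls for cls in (WebKitBrowserTab, WebEngineBrowserTab)
}


def make_tab(engine: str = "webkit") -> BaseBrowserTab:
    # WebKit stays the default, so the new engine is an addition.
    try:
        return TAB_CLASSES[engine]()
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None


def render_html(url: str, engine: str = "webkit") -> str:
    # Endpoint-like helper that works with either tab implementation.
    tab = make_tab(engine)
    tab.go(url)
    return tab.html()
```

With a shared base class, existing tests could in principle be parametrized over `TAB_CLASSES`, and features the new engine lacks would surface as `NotImplementedError` rather than as silent differences.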

vedantrathore commented 7 years ago

@kmike Thanks! I'll get onto it

leobuskin commented 7 years ago

@vedantrathore @kmike hey, I've just checked Splash w/ PyQt 5.8 [SIP 4.19.1] + Qt 5.8 + QtWebKit TP5 for Qt 5.8 (from https://github.com/annulen/webkit) and Ubuntu 16.04.

9 failed, 1173 passed, 12 skipped, 26 xfailed, 2 pytest-warnings - w/o patching Splash, just a lot of package modifications and some workarounds and I'm pretty sure these 9 failures were mine, heh.

So, I suggest a two-step upgrade:

  1. Splash switches from Qt 5.5 to Qt 5.8 + QtWebKit by @annulen (looks like a stable repo and a good maintainer) - we will get the latest versions of WebKit w/ new features and would keep community interest;
  2. Add BrowserTab-based support for a Chromium engine (it could be QtWebEngine or smth CEF-based).

If this looks good and you want to adopt my solution for the first step, what would be the best way to contribute?

kmike commented 7 years ago

@leobuskin sounds awesome!

I'm fine with switching to Qt 5.8 and QtWebKit fork and dropping "official" qtwebkit from 5.5.1. To do it we need to switch Splash Docker image to Qt 5.8 and QtWebKit fork (and Ubuntu 16.04) and make it run on Travis. See https://github.com/scrapinghub/splash/blob/master/Dockerfile, https://github.com/scrapinghub/splash/blob/master/dockerfiles/splash/provision.sh and https://github.com/scrapinghub/splash/blob/master/.travis.yml.

If you can help with that (make a pull request with these changes) it'd be perfect. Not all tests have to pass at this stage; having the Docker and Travis setup in place would be a huge step forward.

leobuskin commented 7 years ago

@kmike good news, let's do it.

Do you think it'll be a good idea to install qt58webkit in /opt like other qt packages? qt58webkit depends on a few qt58* packages from ppa:beineri/opt-qt58-xenial, is it ok to make such a dependency, from one ppa to another?

I have an ugly method right now that needs to be replaced asap: provision.sh downloads a QtWebKit release from the repo and unpacks it into /opt/qt58, heh.

kmike commented 7 years ago

> Do you think it'll be a good idea to install qt58webkit in /opt like other qt packages? qt58webkit depends on a few qt58* packages from ppa:beineri/opt-qt58-xenial, is it ok to make such a dependency, from one ppa to another?

I don't have a strong opinion here; it sounds fine.

> I have an ugly method right now that needs to be replaced asap: provision.sh downloads a QtWebKit release from the repo and unpacks it into /opt/qt58, heh.

How do you want to solve it?

Glennvd commented 7 years ago

Any updates on this? @leobuskin could you commit your progress so far somewhere? I'd be interested in iterating further on this.

shi-cong commented 6 years ago

[image]

iorlas commented 3 years ago

Well, guys, we are unable to use Splash at all because the QtWebKit package is unavailable. Basically, it is impossible to install it w/o quirks. If this is not a critical issue, I don't know what is...