Open phiresky opened 2 years ago
Using --profile=... with a profile created manually by installing the extension doesn't seem to work either; I'm not sure why. What worked for me was running
docker run -e CHROME_FLAGS="--disable-extensions-except=/crawls/ublock --load-extension=/crawls/ublock"
and putting the extracted (unpacked) extension in /crawls/ublock.
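A minimal sketch of the workaround above, for anyone who wants to script it. The helper names `is_unpacked_extension` and `build_chrome_flags` are made up for illustration and are not part of the crawler; the only things taken from this thread are the two Chromium flags and the /crawls/ublock path. The manifest check relies on the fact that --load-extension expects an unpacked extension directory with a manifest.json at its top level, and it would be run against the host directory that gets mounted at /crawls/ublock.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of the crawler.

# Succeeds if the directory looks like an unpacked extension
# (Chromium's --load-extension expects a directory containing manifest.json).
is_unpacked_extension() {
  [ -f "$1/manifest.json" ]
}

# Prints the CHROME_FLAGS value from the workaround for one extension directory.
build_chrome_flags() {
  printf '%s\n' "--disable-extensions-except=$1 --load-extension=$1"
}

# Print the value to pass as: docker run -e CHROME_FLAGS="..."
build_chrome_flags /crawls/ublock
```

The printed string is what goes into `-e CHROME_FLAGS=...`; the rest of the docker run invocation stays whatever you already use.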
Yes, I think it's a good idea, but we should probably figure out a way to have it installed by default, without requiring a custom profile, perhaps via --load-extension? Do you have time to work on a PR by any chance? It would be greatly appreciated!
Maybe a more high-level flag, like --enable-ad-block, might be fine to start.
Do you have an idea why loading a profile that has an extension installed doesn't work? Maybe I did something wrong, but without explicitly specifying the extension with disable-extensions-except+load-extension it seemed to ignore it. Probably should figure that out before being able to implement extension loading in code... What I did:
chromium --user-data-dir tmpdir
# install the extension manually from the store
cd tmpdir && tar cf ublock-profile.tar *
docker run ....... --profile /.../ublock-profile.tar
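One way to narrow this down might be to check whether the extension files made it into the tarball at all. The sketch below assumes the usual Chromium layout, where store-installed extensions live under Default/Extensions/<extension-id>/ inside the user data dir; if the profile directory is not named Default, the pattern would need adjusting. `list_profile_extensions` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: assumes extensions are stored under Default/Extensions/<id>/
# in the user data dir that was tarred up.

# Prints the unique extension IDs found in a profile tarball.
list_profile_extensions() {
  tar tf "$1" \
    | sed -n 's|^\(\./\)\{0,1\}Default/Extensions/\([^/][^/]*\)/.*|\2|p' \
    | sort -u
}

# Only run against the tarball from the steps above if it exists here.
if [ -f ublock-profile.tar ]; then
  list_profile_extensions ublock-profile.tar
fi
```

If this prints nothing, the extension never landed in the profile; if it prints an ID, the files are there and the problem is more likely in how the browser is launched with that profile.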
I scraped this URL by the way to test whether or not uBlock was installed: https://blockads.fivefilters.org/
I'll create a PR that at least adds documentation for the environment flag(s).
There is indeed some odd behavior when installing a browser extension during manual profile creation. The browser never shows the extension as installed:
https://user-images.githubusercontent.com/571494/161566516-0ecd7a43-c536-4f91-b610-5adcc9945e73.mp4
The containerized Chromium doesn't seem to have any extensions enabled, at least according to chrome://extensions/.
Do you have an idea why loading a profile that has an extension installed doesn't work? Maybe I did something wrong, but without explicitly specifying the extension with disable-extensions-except+load-extension it seemed to ignore it.
Did you ever figure out why this was?
Having extensions enabled on a profile, and having the crawler actually use them, would be pretty significant, especially for ad blocking. I appreciate having the DNS adblock list integrated, but it misses a lot of things that something like uBlock would be able to block.
Adding an ad blocker seems to make crawling much easier. On the two sites I've tested, the number of requests without an ad blocker is an order of magnitude higher than with one, and on one of the two sites the crawler can't even tell when the page load is done because the page keeps loading more ads.
I guess it's possible to do this manually by creating a profile ourselves, but it's pretty cumbersome (and it doesn't work with the interactive profile creation tool either).
I'm thinking of something like this in crawl-config.yaml: