scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License
4.05k stars, 509 forks

Unable to get Splash to find Adblock Plus List #474

Open jim256 opened 8 years ago

jim256 commented 8 years ago

Hi,

I’ve been following the docs on how to use Adblock Plus to speed up the rendering of the pages I’m hitting. I’m using it inside Docker (Docker Toolbox 1.11.1b) on Windows 10.

I’m unsure what is meant by:

To activate request filtering support start splash with --filters-path option:

python -m splash.server --filters-path=/etc/splash/filters

I start the Splash server each time from within Docker, as described here: https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/

If I try to run the above command in PowerShell, I get: C:\Python27\python.exe: No module named splash

When I try to simply append the filters option to my docker command, I get:

$ docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash --disable-private-mode --filters-path=C:/Users/jared/OneDrive/Documents/Projects/AutoSearch/Cars
2016-06-25 03:58:59+0000 [-] Log opened.
2016-06-25 03:58:59.366010 [-] Splash version: 2.1
2016-06-25 03:58:59.367296 [-] Qt 5.5.1, PyQt 5.5.1, WebKit 538.1, sip 4.17, Twisted 16.1.1, Lua 5.2
2016-06-25 03:58:59.367550 [-] Python 3.4.3 (default, Oct 14 2015, 20:28:29) [GCC 4.8.4]
2016-06-25 03:58:59.367818 [-] Open files limit: 1048576
2016-06-25 03:58:59.367965 [-] Can't bump open files limit
2016-06-25 03:58:59.472978 [-] Xvfb is started: ['Xvfb', ':1', '-screen', '0', '1024x768x24']
2016-06-25 03:58:59.530356 [-] Traceback (most recent call last):
2016-06-25 03:58:59.530946 [-] File "/app/bin/splash", line 4, in <module>
2016-06-25 03:58:59.531670 [-] main()
2016-06-25 03:58:59.532374 [-] File "/app/splash/server.py", line 372, in main
2016-06-25 03:58:59.533003 [-] server_factory=server_factory,
2016-06-25 03:58:59.533353 [-] File "/app/splash/server.py", line 273, in default_splash_server
2016-06-25 03:58:59.534131 [-] allowed_schemes=allowed_schemes,
2016-06-25 03:58:59.534420 [-] File "/app/splash/network_manager.py", line 55, in __init__
2016-06-25 03:58:59.535163 [-] self.adblock_rules = AdblockRulesRegistry(filters_path, verbosity=verbosity)
2016-06-25 03:58:59.536427 [-] File "/app/splash/request_middleware.py", line 147, in __init__
2016-06-25 03:58:59.537104 [-] self._load(path)
2016-06-25 03:58:59.537451 [-] File "/app/splash/request_middleware.py", line 171, in _load
2016-06-25 03:58:59.538119 [-] for fname in os.listdir(path):
2016-06-25 03:58:59.538542 [-] FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/jared/OneDrive/Documents/Projects/AutoSearch/Cars'

I have a file named default.txt in the specified directory.

Should I be placing it in a directory inside the vm? Is that where it is looking? If so, would someone mind explaining how I'd do that?

Would someone mind letting me know how I’m supposed to activate Adblock filtering, or what I’m doing wrong?

kmike commented 8 years ago

To use AdBlock filters in Docker you need to mount a folder with filters into the Docker container. It looks like there is a bug in the docs: the link to the relevant section has the wrong title. If you follow 'Splash versions' you will find some information about how to do that (http://splash.readthedocs.io/en/stable/install.html#customizing-dockerized-splash). There is also a warning there about volume mounting problems on Windows; maybe they are already resolved in Docker or in Docker beta, I'm not sure.

jim256 commented 8 years ago

Thanks for your quick response Mike.

From what I understood, my command should then be something like:

docker run -p 8050:8050 -v <C:\Users\jared\OneDrive\Documents\Projects\AutoSearch\Cars\SplashFilters>:/etc/splash/filters scrapinghub/splash --disable-private-mode --filters-path=/etc/splash/filters

Does that sound about right? Unfortunately, that's still giving me:

bash: C:UsersjaredOneDriveDocumentsProjectsAutoSearchCarsSplashFilters>:/etc/splash/filters: No such file or directory

kmike commented 8 years ago

<parameter> is a common way to indicate that a parameter is required and should be replaced with a real value ([parameter] means that the parameter is optional). Also, in shells \ is an escape character, so \jared is read as an escaped j followed by ared; the \ itself is not kept. To avoid escaping, use single quotes around the parameter.
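To illustrate the escaping point, here is a minimal Python sketch. It uses the standard library's shlex module, which follows POSIX shell quoting rules; it is only a stand-in for what bash does, and the path is shortened for readability.

```python
import shlex

# Unquoted: each backslash is treated as an escape character and dropped,
# so the path separators disappear from the argument.
unquoted = shlex.split(r"C:\Users\jared\SplashFilters:/etc/splash/filters")

# Single-quoted: everything between the quotes is kept literally,
# backslashes included.
quoted = shlex.split(r"'C:\Users\jared\SplashFilters':/etc/splash/filters")

print(unquoted)
print(quoted)
```

The unquoted form loses its backslashes in the same way as the path in the bash error above.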

This gives the following command - could you try it?

docker run -p 8050:8050 -v 'C:\Users\jared\OneDrive\Documents\Projects\AutoSearch\Cars\SplashFilters':/etc/splash/filters scrapinghub/splash --disable-private-mode --filters-path=/etc/splash/filters

jim256 commented 8 years ago

I appreciate your pointing out what should probably be obvious.

I thought it might be something like that and tried escaping the \ by using \\ (C:\\users\\jared...), but got the same error that running the command in your post gave me, which was:

C:\Program Files\Docker Toolbox\docker.exe: Error response from daemon: Invalid bind mount spec "C:\\Users\\jared\\OneDrive\\Documents\\Projects\\AutoSearch\\Cars\\SplashFilters:/etc/splash/filters": invalid mode: /etc/splash/filters.
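A likely reading of the "invalid mode" message is that the -v value is split into colon-separated fields (source, destination, optional mode), so the colon after the drive letter adds an extra field. The sketch below uses a naive colon split to show this. It is an assumption for illustration, not Docker's actual parser, and the path is shortened.

```python
# Bind mount spec with a Windows drive letter (shortened path).
spec = r"C:\Users\jared\SplashFilters:/etc/splash/filters"

# Naive split on ":"; the drive-letter colon produces a third field.
fields = spec.split(":")
source, destination, mode = fields

print(source)       # the bare drive letter
print(destination)  # the rest of the Windows path
print(mode)         # the intended container path, now in the mode slot
```

Under that reading, the container path ends up in the mode position, which matches the value named in the error.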

jim256 commented 8 years ago

I just thought I'd post to say that I finally got it working with:

docker run -p 8050:8050 -v /c/Users/jared/OneDrive/Documents/Projects/AutoSearch/Cars/SplashFilters:/etc/splash/filters scrapinghub/splash --disable-private-mode --filters-path=/etc/splash/filters
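For anyone scripting this, the path rewrite can be sketched in Python. The helper name to_toolbox_path is hypothetical, and the /c/... form is simply the one that worked in this thread with Docker Toolbox; other Docker setups on Windows may expect a different form.

```python
from pathlib import PureWindowsPath


def to_toolbox_path(windows_path: str) -> str:
    """Convert e.g. C:\\Users\\me to /c/Users/me (hypothetical helper)."""
    path = PureWindowsPath(windows_path)
    if not path.drive:
        raise ValueError("expected an absolute path with a drive letter")
    drive = path.drive.rstrip(":").lower()  # "C:" -> "c"
    # parts[0] is the anchor ("C:\\"); the rest are the folder names.
    return "/" + "/".join([drive, *path.parts[1:]])


host_dir = to_toolbox_path(
    r"C:\Users\jared\OneDrive\Documents\Projects\AutoSearch\Cars\SplashFilters"
)

# Same arguments as the working command above, built as a list.
command = [
    "docker", "run", "-p", "8050:8050",
    "-v", f"{host_dir}:/etc/splash/filters",
    "scrapinghub/splash",
    "--disable-private-mode",
    "--filters-path=/etc/splash/filters",
]
print(" ".join(command))
```

The list can be passed to subprocess.run(command) to avoid shell quoting altogether.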

I got the idea from here: https://github.com/docker/docker/issues/12751