maaaaz / webscreenshot

A simple script to screenshot a list of websites
GNU Lesser General Public License v3.0

How to use webscreenshot from inside a python script? #19

Open own3mall opened 5 years ago

own3mall commented 5 years ago

The documentation states:

pip install webscreenshot and then directly use webscreenshot

How does one directly use webscreenshot?

My python script contains:

import webscreenshot

Now, how do I call webscreenshot directly from the script? The documentation doesn't provide any examples. It covers calling the script from the command line and passing arguments, but I want to call it directly from inside my Python script.

webscreenshot.take_screenshot(list_of_urls) doesn't seem to work.

maaaaz commented 5 years ago

Hello,

You do indeed need to call that function. Before that, however, you need a proper options variable with the parameters set inside it: launch the tool with the -vv option and you will see the structure of that variable.

Cheers.

maaaaz commented 5 years ago

Hello,

Here is a more precise answer:

import argparse
from webscreenshot.webscreenshot import *

# url list to screenshot
url_list = ['http://google.fr', 'http://google.com']

# defining options manually
options = argparse.Namespace(URL=None, cookie=None, header=None, http_password=None, http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='/tmp/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

# actually launching the function
take_screenshot(url_list, options)

I admit that this use case deserves a better approach.

Cheers

maaaaz commented 4 years ago

For reference, I maintain an up-to-date version of the working code in the FAQ.

ss2sfcollege commented 4 years ago

I get this error when using the above code snippet:

    [+] 2 URLs to be screenshot
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.8/site-packages/webscreenshot/webscreenshot.py", line 421, in craft_cmd
        output_format = options.format if options.renderer == 'phantomjs' else 'png'
    AttributeError: 'Namespace' object has no attribute 'format'
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/aditya/GIT/Web/test.py", line 11, in <module>
        take_screenshot(url_list, options)
      File "/usr/lib/python3.8/site-packages/webscreenshot/webscreenshot.py", line 525, in take_screenshot
        taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
      File "/usr/lib/python3.8/site-packages/webscreenshot/webscreenshot.py", line 525, in <listcomp>
        taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
      File "/usr/lib/python3.8/multiprocessing/pool.py", line 865, in next
        raise value
    AttributeError: 'Namespace' object has no attribute 'format'

maaaaz commented 4 years ago

@ss2sfcollege, have you followed the instructions? If so, that is odd, as the format option is declared in the code sample.
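
For readers comparing against the snippet posted earlier in this thread: the traceback shows `craft_cmd` reading `options.format`, which that Namespace does not define. Below is a minimal sketch of an options object that includes it. `build_options` is a hypothetical helper, not part of webscreenshot, and the value `'png'` is an assumption (the traceback only shows it as the fallback for non-phantomjs renderers); the other attribute names are copied from the earlier snippet and may differ between versions.

```python
import argparse


def build_options(**overrides):
    """Return an options Namespace (hypothetical helper, not a webscreenshot API)."""
    defaults = dict(
        URL=None, cookie=None, header=None,
        http_password=None, http_username=None, input_file=None,
        log_level='DEBUG', multiprotocol=False, no_xserver=False,
        output_directory='/tmp/screenshots', port=None,
        proxy=None, proxy_auth=None, proxy_type=None,
        renderer='phantomjs', renderer_binary=None, ssl=False,
        timeout=30, verbosity=2, window_size='1200,800', workers=4,
        # Assumption: 'png' is an accepted value. The traceback only proves
        # that the attribute itself must exist.
        format='png',
    )
    defaults.update(overrides)
    return argparse.Namespace(**defaults)


options = build_options(output_directory='./screenshots')
# With webscreenshot installed, the call from the earlier snippet would be:
# take_screenshot(url_list, options)
```

Running the tool with -vv, as suggested earlier in the thread, remains the way to check which attributes your installed version actually expects.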

poornasandeep commented 4 years ago

Hello,

I get the following error when I run the above program:

    C:\Users\sandeep\PycharmProjects\sparkflow_validation\venv\Scripts\python.exe C:/Users/sandeep/PycharmProjects/sparkflow_validation/take_screenshot.py
    [+] 2 URLs to be screenshot
    [+] 2 URLs to be screenshot
    [+] 2 URLs to be screenshot
    [+] 2 URLs to be screenshot
    [+] 2 URLs to be screenshot
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 262, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 95, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\sandeep\PycharmProjects\sparkflow_validation\take_screenshot.py", line 11, in <module>
        take_screenshot(url_list, options)
      File "C:\Users\sandeep\PycharmProjects\sparkflow_validation\venv\lib\site-packages\webscreenshot\webscreenshot.py", line 523, in take_screenshot
        pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\context.py", line 119, in Pool
        return Pool(processes, initializer, initargs, maxtasksperchild,
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 212, in __init__
        self._repopulate_pool()
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
        return self._repopulate_pool_static(self._ctx, self.Process,
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
        w.start()
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\context.py", line 326, in _Popen
        return Popen(process_obj)
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\Users\sandeep\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

The same error repeats in a loop and the program never terminates.

maaaaz commented 4 years ago

@poornasandeep, can you paste the code you are using to call webscreenshot here?

YusufRoshdy commented 4 years ago

@maaaaz I am getting the same error as @poornasandeep. Here is the code I am using (taken from the FAQ):

import argparse
from webscreenshot.webscreenshot import *

url_list = ['http://google.com']

options = argparse.Namespace(URL=None, cookie=None, header=None, http_password=None, http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='./screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

take_screenshot(url_list, options)

It does not terminate and it keeps printing [+] 1 URLs to be screenshot forever.

I am using Python 3.8.3 on Windows 10 (2004 update), with version 2.92 of the webscreenshot package.

Here is the error stack:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Program Files\Python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Program Files\Python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\upwork\Nikhil Parekh\SMTP\mail with html\utilities.py", line 11, in <module>
    take_screenshot(url_list, options)
  File "C:\Users\yusuf\AppData\Roaming\Python\Python38\site-packages\webscreenshot\webscreenshot.py", line 535, in take_screenshot
    pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "C:\Program Files\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python38\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Program Files\Python38\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
```

maaaaz commented 4 years ago

Thanks for reporting, it seems related to the way Python 3.8 now behaves with multiprocessing.

I think the pool creation should be moved into the main() function, as suggested in similar cases.

In the meantime, try to execute your code with Python 3.7 and not 3.8.

maaaaz commented 4 years ago

I confirm the bug. I tried to fix it but have unfortunately failed so far in the face of this madness.

I do understand the technical reasons, but I regret that users calling webscreenshot from their own scripts will have to handle multiprocessing themselves instead of webscreenshot doing it on its own.
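
For readers who land on this RuntimeError, here is a minimal sketch of the idiom the error message itself names: anything that starts worker processes runs only under `if __name__ == '__main__':`. It uses a plain `multiprocessing.Pool` with a stand-in `square` function rather than webscreenshot, so it illustrates the pattern only. Whether wrapping a `take_screenshot(url_list, options)` call in the same way is enough for a given webscreenshot version is an assumption that has not been verified here.

```python
import multiprocessing


def square(n):
    """Stand-in worker; it must live at module level so child processes can import it."""
    return n * n


def main():
    # The pool is created inside a function that only the guarded block calls,
    # so a child process re-importing this module never creates a pool itself.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == '__main__':
    # freeze_support() is only needed when the script is frozen into an executable.
    multiprocessing.freeze_support()
    print(main())  # prints [1, 4, 9]
```

Without the guard, each spawned child re-runs the module's top-level code and tries to create its own pool, which matches the repeated "[+] 2 URLs to be screenshot" lines reported above.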

Concept211 commented 3 years ago

An alternative would be to run it as a subprocess which seems to be working fine for me on Python 3.8:

import subprocess

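# Pass the arguments as a list so the call also works outside Windows without shell=True
subprocess.run(['webscreenshot', 'google.com', '--window-size', '800,600'])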
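
Building on the subprocess approach, a small sketch for handling several URLs. `build_command` and `screenshot_all` are hypothetical helper names; the only command-line syntax used is what appears in this thread (`webscreenshot <url> --window-size <w,h>`), and the sketch assumes the `webscreenshot` command is on PATH.

```python
import subprocess


def build_command(url, window_size='800,600'):
    """Build the argument list for one webscreenshot invocation."""
    return ['webscreenshot', url, '--window-size', window_size]


def screenshot_all(urls, window_size='800,600'):
    """Run webscreenshot once per URL and return the exit code of each run."""
    return [
        subprocess.run(build_command(url, window_size)).returncode
        for url in urls
    ]


# Usage, assuming webscreenshot is installed and on PATH:
# exit_codes = screenshot_all(['google.com', 'google.fr'])
```

Because each URL runs in a separate process started by the operating system, the calling script does not create a multiprocessing pool of its own.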