maaaaz / webscreenshot

A simple script to screenshot a list of websites
GNU Lesser General Public License v3.0

Using tool with external python script on a Jupyter Notebook #36

Closed: rukasuri closed this issue 4 years ago

rukasuri commented 4 years ago

Good day, I have installed webscreenshot. I tried running it from Python, but I ran into this error:

[+] 2 URLs to be screenshot

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\z0014071\webscreenshot.py", line 361, in craft_cmd
    output_format = options.format if options.renderer == 'phantomjs' else 'png'
AttributeError: 'Namespace' object has no attribute 'format'
"""

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-74b6b93cdc5a> in <module>()
      9 
     10 # actually launching the function
---> 11 take_screenshot(url_list, options)

~\webscreenshot.py in take_screenshot(url_list, options)
    437     pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
    438 
--> 439     taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
    440 
    441     screenshots_error_url = [url for retval, url in taken_screenshots if retval == SHELL_EXECUTION_ERROR]

~\webscreenshot.py in <listcomp>(.0)
    437     pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
    438 
--> 439     taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
    440 
    441     screenshots_error_url = [url for retval, url in taken_screenshots if retval == SHELL_EXECUTION_ERROR]

C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in next(self, timeout)
    746         if success:
    747             return value
--> 748         raise value
    749 
    750     __next__ = next                    # XXX

AttributeError: 'Namespace' object has no attribute 'format'

Here is my code:

import argparse
from webscreenshot import *

# url list to screenshot
url_list = ['http://google.de', 'http://google.com']

# defining options manually
options = argparse.Namespace(
    URL=None, cookie=None, header=None, http_password=None, http_username=None,
    input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False,
    output_directory='/tmp/screenshots', port=None, proxy=None, proxy_auth=None,
    proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False,
    timeout=30, verbosity=2, window_size='1200,800', workers=4)

# actually launching the function
take_screenshot(url_list, options)

(Taken from https://github.com/maaaaz/webscreenshot/issues/19#issuecomment-511097306)

maaaaz commented 4 years ago

Hello @rukasuri, good catch: I hadn't updated the documentation after making some changes to webscreenshot. Take a look at this; it should work.

Cheers.
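The traceback shows why the original snippet failed: an argparse.Namespace only has the attributes that were passed to it, and craft_cmd reads options.format, which the snippet never set. The corrected Namespace later in this thread adds format='png' and quality=75. As a defensive alternative, here is a minimal sketch of a hypothetical fill_defaults helper (not part of webscreenshot) that adds missing options without overriding the ones you set. The default values are copied from that corrected Namespace and may not match every webscreenshot version.

```python
import argparse

# Assumed defaults, copied from the corrected Namespace in this thread;
# other webscreenshot versions may expect more or different options.
EXTRA_DEFAULTS = {"format": "png", "quality": 75}


def fill_defaults(options, defaults):
    """Hypothetical helper: return a new Namespace in which every attribute of
    `options` is kept and any attribute missing from it is taken from `defaults`."""
    merged = dict(defaults)
    merged.update(vars(options))  # values the user set explicitly win
    return argparse.Namespace(**merged)


# Same shape as the failing snippet: no `format` attribute.
old_options = argparse.Namespace(renderer="phantomjs", workers=4)
try:
    old_options.format  # the attribute access that fails inside craft_cmd
except AttributeError as exc:
    print("before:", exc)

options = fill_defaults(old_options, EXTRA_DEFAULTS)
print("after:", options.format, options.quality, options.renderer)
```

Passing fill_defaults(options, EXTRA_DEFAULTS) to take_screenshot would avoid this particular AttributeError, but any other option the script reads would still have to be present.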

rukasuri commented 4 years ago

Thank you, it seems to work now; the output is:

[+] 2 URLs to be screenshot
[+] 2 actual URLs screenshot
[+] 0 error(s)

But I don't know where the files are stored. I am using Jupyter Notebook; is there any way to open a debugger, from Python code, to see where my files are stored?

I already have the latest version of PhantomJS.

rukasuri commented 4 years ago

I managed to run it with -v in the Anaconda Prompt, which yielded:

[INFO][General] 'google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[+] 1 URLs to be screenshot
[ERROR][http://google.fr:80] renderer binary could not have been found in your current PATH environment variable, exiting
[+] 1 actual URLs screenshot
[+] 0 error(s)

But I can't use

$ which phantomjs

from https://github.com/maaaaz/webscreenshot/issues/35#issuecomment-552353199 to check whether my PhantomJS setup is causing the problem, because I am not on a Linux machine. If my PhantomJS path is the problem, is there any documentation showing how to set up the PhantomJS path properly?

PS: it would be nice to be able to debug from the Python script itself.
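Windows has no which command; the closest built-in equivalent is where phantomjs in a Command Prompt. From Python, shutil.which performs the same PATH lookup on any OS (on Windows it also tries the extensions listed in PATHEXT, such as .exe), so it can be run inside the notebook to see exactly what the kernel sees. A minimal sketch; locate_renderer is a hypothetical helper name, not part of webscreenshot:

```python
import os
import shutil


def locate_renderer(name="phantomjs"):
    """Hypothetical helper: print and return where `name` resolves on the PATH
    of the current process (the notebook kernel), or None if it is not found."""
    path = shutil.which(name)
    if path is None:
        print(f"{name} not found; PATH entries searched:")
        for entry in os.environ.get("PATH", "").split(os.pathsep):
            print("   ", entry)
    else:
        print(f"{name} found at {path}")
    return path


locate_renderer("phantomjs")
```

The kernel's PATH can differ from the one in the Anaconda Prompt, so running this in the notebook itself is the more reliable check.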

maaaaz commented 4 years ago

In the code detailed here, the output path is /tmp/screenshots (the output_directory option), and you can change it:

options = argparse.Namespace(
    URL=None, cookie=None, format='png', header=None, http_password=None,
    http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False,
    no_xserver=False, output_directory='/tmp/screenshots', port=None, proxy=None,
    proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs',
    renderer_binary=None, ssl=False, timeout=30, verbosity=2,
    window_size='1200,800', workers=4)

For the other issue, see this.
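On Windows, a path such as /tmp/screenshots has no drive letter, so it is resolved against the root of the current drive (for example C:\tmp\screenshots when the notebook runs from drive C:). A minimal sketch that prints the resolved directory and lists the PNG files in it, assuming the screenshots are written directly into output_directory with a .png extension (format='png'); the thread does not show the file naming scheme, and list_screenshots is a hypothetical helper:

```python
import os
from pathlib import Path


def list_screenshots(output_directory, pattern="*.png"):
    """Hypothetical helper: resolve `output_directory` to an absolute path the
    same way the OS does and return the files in it that match `pattern`."""
    out = Path(os.path.abspath(output_directory))
    print("screenshots directory:", out)
    if not out.is_dir():
        print("directory does not exist")
        return []
    return sorted(p for p in out.glob(pattern) if p.is_file())


for shot in list_screenshots("/tmp/screenshots"):
    print(shot.name)
```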

rukasuri commented 4 years ago

The problem was that I am using Jupyter Notebook and do not have direct access to the system PATH environment variable.

All I did was copy phantomjs.exe into my Jupyter Notebook (Anaconda) root folder, and now it works fine.

Thank you very much for your help.
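Copying phantomjs.exe into the notebook's folder works, but the search path can also be adjusted from inside the notebook. Changes made through os.environ apply to the kernel process and are inherited by processes it starts afterwards, which should include webscreenshot's worker pool and its renderer calls. A minimal sketch; prepend_to_path is a hypothetical helper and the directory is a hypothetical install location. The Namespace in this thread also carries a renderer_binary option (left as None); judging by its name it is meant to point at the renderer executable directly, but that is an assumption not confirmed here.

```python
import os
import shutil


def prepend_to_path(directory, binary="phantomjs"):
    """Hypothetical helper: add `directory` to the front of PATH for this process
    (and any child process started afterwards) unless it is already listed,
    then return where `binary` resolves, or None."""
    directory = os.path.abspath(directory)
    current = os.environ.get("PATH", "")
    if directory not in current.split(os.pathsep):
        os.environ["PATH"] = directory + os.pathsep + current if current else directory
    return shutil.which(binary)


# Hypothetical location; replace with the folder that actually holds phantomjs.exe.
print(prepend_to_path(r"C:\phantomjs\bin"))
```

Calling it once before take_screenshot(url_list, options) should be enough for the rest of the notebook session.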