rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/

Cannot download a website if it has invalid SSL certificate #109

Closed · fumbles closed this 1 year ago

fumbles commented 1 year ago

I do not see a way to account for the invalid certificate.

"name": "SSLError",
    "message": "HTTPSConnectionPool(host='$host, port=443): Max retries exceeded with url: /tape/tapetec.nsf/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)')))",
    "stack"
rajatomar788 commented 1 year ago

You can do it as below:

from pywebcopy.configs import get_config

# url, project_folder, project_name, bypass_robots, debug, delay and
# threaded are placeholders for your own values
config = get_config(url, project_folder, project_name, bypass_robots, debug, delay, threaded)
webpage = config.create_page()

# here change the SSL verification setting on the underlying requests session
webpage.session.verify = False
webpage.get(url)

webpage.save_complete(pop=open_in_browser)
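
If you have the certificate of the internal CA that signed the host's certificate, the same attribute can point at a CA bundle instead of disabling verification entirely. This assumes the session behaves like a standard requests.Session; the bundle path below is a placeholder:

# safer alternative: trust the specific CA instead of skipping verification
webpage.session.verify = "/path/to/internal-ca-bundle.pem"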
fumbles commented 1 year ago

Thanks! I got it to work with the crawler to traverse the whole site. I'm getting an error when it encounters the three Shockwave scripts that are in there, but otherwise it's great; sorting that out is on my list.

LocationParseError: clsid:D27CDB6E-AE6D-11cf-96B8-444553540000

import warnings

import urllib3

from pywebcopy.configs import get_config

# suppress the InsecureRequestWarning that requests emits once verification is disabled
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = url                   # placeholder: target site URL
project_folder = path       # placeholder: local folder to save the copy into
project_name = name         # placeholder: name of the project sub-folder
bypass_robots = True
debug = False
open_in_browser = True
delay = None
threaded = False

config = get_config(url, project_folder, project_name, bypass_robots, debug, delay, threaded)
crawler = config.create_crawler()
crawler.session.verify = False  # disable SSL certificate verification
crawler.get(url)
if threaded:
    warnings.warn(
        "Opening in browser is not supported when threading is enabled!")
    open_in_browser = False
crawler.save_complete(pop=open_in_browser)
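
The LocationParseError above comes from the legacy <object> embeds used for the Shockwave/Flash plugin: classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" is a COM class identifier, not a downloadable URL, so the URL parser rejects it. As a generic workaround (plain stdlib, not a pywebcopy API), such references could be filtered out before fetching:

from urllib.parse import urlsplit

def is_fetchable(url):
    # only http(s) resources can actually be downloaded;
    # schemes such as clsid: from <object> tags are skipped
    return urlsplit(url).scheme in ("http", "https")

print(is_fetchable("https://example.com/app.js"))                  # True
print(is_fetchable("clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"))  # False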
rajatomar788 commented 1 year ago

You should set debug=True to get a log of the actual events; that will make it easier to pin down where the error comes from.
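
Following that suggestion, a minimal sketch reusing the same variables as the script above (the logging call is an assumption about how the debug output is surfaced, not something pywebcopy requires):

import logging

logging.basicConfig(level=logging.DEBUG)  # print debug-level messages to the console

debug = True  # same positional call as above, just with the debug flag turned on
config = get_config(url, project_folder, project_name, bypass_robots, debug, delay, threaded)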