rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/

Testing current full web crawling functionality #114

Open BradKML opened 1 year ago

BradKML commented 1 year ago

I am currently testing whether PyWebCopy can download an entire website (subdomain) rather than just a single webpage. Unfortunately, it did not work as intended; save_webpage and save_website should behave differently.

# Directory setup borrowed from https://stackoverflow.com/a/14125914
import os

relative_path = r'book_test'
current_directory = os.getcwd()
final_directory = os.path.join(current_directory, relative_path)
if not os.path.exists(final_directory):
    os.makedirs(final_directory)

from pywebcopy import save_website

save_website(
    url='https://www.nateliason.com/notes',
    project_folder=final_directory,
    project_name="test_site",
    bypass_robots=True,
    debug=True,
    open_in_browser=False,
    delay=None,
    threaded=False,
)
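
As an aside, the directory setup above can be collapsed into a single standard-library call; a minimal sketch (using a temporary directory as a stand-in for os.getcwd(), and omitting the pywebcopy call itself since it needs network access):

```python
import os
import tempfile

# exist_ok=True makes the explicit os.path.exists() check unnecessary.
base = tempfile.mkdtemp()  # stand-in for os.getcwd() in the report above
final_directory = os.path.join(base, 'book_test')
os.makedirs(final_directory, exist_ok=True)

print(os.path.isdir(final_directory))  # → True
```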

In the debug logs, none of the URL requests go beyond the main URL; the crawler never descends into the linked sub-pages. What could be the cause of this?
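
Besides reading the debug logs, one way to check whether the crawl descended past the entry page is to count the HTML files written under the project folder. A sketch, with a fabricated folder layout standing in for real crawl output:

```python
import os
import tempfile

def count_html_files(root):
    """Count .html files under root, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.endswith('.html'))
    return total

# Illustrative layout: a crawl that saved only the entry page would yield 1;
# anything greater means sub-pages were fetched too.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'notes'), exist_ok=True)
open(os.path.join(root, 'index.html'), 'w').close()
open(os.path.join(root, 'notes', 'page1.html'), 'w').close()

print(count_html_files(root))  # → 2
```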