rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/
Other
520 stars 105 forks

"Blocked resource" error #118

danyaljj closed this issue 7 months ago

danyaljj commented 1 year ago

When I run the script against well-known websites such as Google, I get this error:

ERROR    - pywebcopy.schedulers.Scheduler:134 - Blocked resource on external domain: https://www.google.com/
ERROR    - pywebcopy.schedulers.Scheduler:157 - Discarding invalid resource: <WebPage: https://www.google.com/>

Is this expected? Is there a way to bypass these errors?

Here is my script:

> python3 -m pywebcopy -p --url https://google.com --location=/Users/danielk/PycharmProjects/

rajatomar788 commented 1 year ago

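Hey, this issue can arise in two cases: the URL you passed is on a different subdomain from the one the site actually serves (your log shows https://www.google.com/ being treated as an external domain), or you don't have permission to scrape the site.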

So try 'https://www.google.com/' instead of the 'https://google.com' you used. Also try setting the bypass_robots option to True.
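
To make the two causes above concrete, here is a minimal standard-library sketch. It is not pywebcopy's actual implementation; the helper names `same_host` and `allowed_by_robots` and the `EXAMPLE_ROBOTS` rules are hypothetical and only illustrate how a strict hostname comparison and a robots.txt check could each lead to a resource being rejected.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def same_host(seed_url, resource_url):
    """Return True only when both URLs have exactly the same hostname.

    Assumption: a strict comparison like this is what makes
    'www.google.com' look external to a crawl seeded at 'google.com'.
    """
    return urlsplit(seed_url).hostname == urlsplit(resource_url).hostname


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; not any real site's robots.txt.
EXAMPLE_ROBOTS = ["User-agent: *", "Disallow: /"]

if __name__ == "__main__":
    # Seed without 'www' vs. the URL reported in the log: hostnames differ.
    print(same_host("https://google.com", "https://www.google.com/"))       # False
    # Seeding with the 'www' URL makes the hostnames match.
    print(same_host("https://www.google.com/", "https://www.google.com/"))  # True
    # A blanket Disallow rule rejects the page unless robots.txt is bypassed.
    print(allowed_by_robots(EXAMPLE_ROBOTS, "https://www.google.com/"))     # False
```

If the first check is what trips the scheduler, passing the 'www' URL is enough; if it is the second, that is the case the bypass_robots option is meant for.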