Open marshonhuckleberry opened 4 years ago
What code are you using? I also need to see the log file, if you can find it.
On Thu, Jan 23, 2020, 1:07 PM marshonhuckleberry notifications@github.com wrote:
It works on some websites but fails on others. I looked through the issues for any solution to the "permission error" and found one: ignoring robots.txt. It still gets the permission error, though there is one small difference — with the robots.txt bypass it downloads one more page than before. No luck with this site: http://mathworld.wolfram.com/
the code:
import pywebcopy
import requests
from pywebcopy import save_webpage

pywebcopy.SESSION.headers['User-Agent'] = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/79.0.3945.130 Safari/537.36'
)
kwargs = {'project_name': 'new'}

save_webpage(
    url='http://mathworld.wolfram.com/topics/',
    project_folder='path',
    bypass_robots=True,
    debug=True,
    **kwargs,
)
the log file:
pywebcopy_log.log
Try setting up the user-agent in pywebcopy.config instead, so that it is applied across the whole project.
import pywebcopy
pywebcopy.config['http_headers']['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
pywebcopy.config.setup_config("http://mathworld.wolfram.com/", "path", project_name="new", bypass_robots=True)
pywebcopy.save_webpage("http://mathworld.wolfram.com/", "path")
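For reference, pywebcopy's permission error usually means its robots.txt check refused the URL. Independently of pywebcopy, you can inspect what a robots.txt policy allows with Python's standard urllib.robotparser — a minimal diagnostic sketch using a made-up sample policy, not MathWorld's actual robots.txt:

```python
import urllib.robotparser

# Illustrative sample policy only, NOT MathWorld's real robots.txt.
sample = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(sample.splitlines())

# Under this policy, /topics/ is fetchable but /private/ is not.
print(rp.can_fetch("Mozilla/5.0", "http://mathworld.wolfram.com/topics/"))
print(rp.can_fetch("Mozilla/5.0", "http://mathworld.wolfram.com/private/x"))
```

To check the real site, point the parser at the live file with rp.set_url(...) and rp.read() instead of parsing a string.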
Error! Still the same permission error with http://mathworld.wolfram.com/ — the robots.txt bypass only downloads one more page than before.
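If the error persists even with bypass_robots=True, it may help to first confirm the custom User-Agent is actually attached to outgoing requests before blaming the server. A minimal sketch using requests directly (which the code above already imports); the session here is a standalone one for illustration, not pywebcopy's own SESSION:

```python
import requests

UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
      'AppleWebKit/537.36 (KHTML, like Gecko) '
      'Chrome/79.0.3945.130 Safari/537.36')

session = requests.Session()
session.headers['User-Agent'] = UA

# prepare_request applies the session headers without sending anything,
# so the header can be checked offline before crawling.
prepared = session.prepare_request(
    requests.Request('GET', 'http://mathworld.wolfram.com/topics/')
)
print(prepared.headers['User-Agent'])
```

If the printed header is correct but the site still refuses the crawl, the block is server-side and no pywebcopy setting will change it.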