rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/
Other
520 stars 105 forks

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 8: ordinal not in range(128) #106

Open Artucuno opened 1 year ago

Artucuno commented 1 year ago

Using the example gives me this error:

Traceback (most recent call last):
  File "C:\Users\user\Desktop\icc\download.py", line 2, in <module>
    save_website(
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\__init__.py", line 164, in save_website
    crawler.save_complete(pop=open_in_browser)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\core.py", line 218, in save_complete
    self.scheduler.handle_resource(self)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 156, in handle_resource
    return self._handle_resource(resource)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 191, in _handle_resource
    resource.retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 368, in retrieve
    return self._retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 456, in _retrieve
    context = self.extract_children(self.parse())
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 439, in extract_children
    self.scheduler.handle_resource(ans)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 156, in handle_resource
    return self._handle_resource(resource)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\schedulers.py", line 191, in _handle_resource
    resource.retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 368, in retrieve
    return self._retrieve()
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 612, in _retrieve
    self.extract_children(self.parse()),
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 591, in extract_children
    source = re.sub(
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pywebcopy\elements.py", line 560, in repl
    url, _ = unquote_match(match.group(1).decode(encoding), match.start(1))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 8: ordinal not in range(128)

rajatomar788 commented 1 year ago

Please attach the version you are using, the code you are running, and whether the error occurs only on this particular site or on other sites as well.

ckhordiasma commented 1 year ago

I have the same issue. I'm not sure how many sites it fails on, but here is some example code:

from pywebcopy import save_webpage

kwargs = {'project_name': 'test'}

save_webpage(
    # url of the website
    url='https://www.wix.com',

    # folder where the copy will be saved
    project_folder='./temp',
    **kwargs
)

Artucuno commented 1 year ago

I got the error while I was trying to download a Wix site too.

I forgot that I had opened this issue.
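
For anyone hitting this, the failing call in the traceback is `match.group(1).decode(encoding)`, and the message says the codec in use was 'ascii'. The sketch below reproduces that kind of failure in isolation and shows one possible fallback. It is not pywebcopy's code: `decode_lenient` and the sample bytes are made up for illustration, and falling back to UTF-8 with replacement characters is an assumption about what would be acceptable here, not a confirmed fix.

```python
def decode_lenient(raw: bytes, encoding: str = "ascii") -> str:
    """Hypothetical helper: try the declared encoding first, then fall
    back to UTF-8, replacing any bytes that still cannot be decoded."""
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an unknown or misspelled encoding name.
        return raw.decode("utf-8", errors="replace")


# Made-up URL bytes with a non-ASCII byte (0xd1) at index 8,
# similar in shape to what the traceback reports.
sample = b"/assets/\xd1\x84.png"

try:
    sample.decode("ascii")
except UnicodeDecodeError as err:
    # Same class of error as in the issue title.
    print("strict decode failed:", err)

print("lenient decode:", decode_lenient(sample))
```

If the stylesheet is actually served in some other encoding, the fallback would need to use that encoding instead of UTF-8.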