rajatomar788 / pywebcopy

Locally saves webpages to your hard disk with images, css, js & links as is.
https://rajatomar788.github.io/pywebcopy/

AssertionError: A file like object with read method is required! #42

Closed renMarkHan closed 4 years ago

renMarkHan commented 4 years ago

I always get an AssertionError when I run the program. Below is my code, which uses pywebcopy to save a page before the zip file is uploaded to S3. Could someone help me? Thanks! (macOS, pywebcopy==6.2.0)

from pywebcopy import save_webpage


class Webp:

    def __init__(self, downloadpath):
        self.downloadpath = downloadpath
        self.file = downloadpath

    def websaving(self, url, projectName):
        # self.file is the path of the downloaded zip file
        self.file = self.downloadpath + '/' + projectName + '.zip'
        download_folder = self.downloadpath

        kwargs = {'bypass_robots': True, 'project_name': projectName}

        save_webpage(url, download_folder, **kwargs)


link = "a web page link"
path = 'path to save them locally'
name = 'Test'

web = Webp(path)
web.websaving(link, name)

rajatomar788 commented 4 years ago

Could you provide logs or more info? From this code I can't seem to identify the problem.

renMarkHan commented 4 years ago

Yes.

hello
pywebcopy.configs - INFO - Got response 200 from https://docs.python-guide.org/robots.txt
/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/webpage.py:84: UserWarning: Global Configuration is not setup. You can ignore this if you are going manual.This is just one time warning regarding some unexpected behavior.
  "Global Configuration is not setup. You can ignore this if you are going manual."
pywebcopy.configs - INFO - Got response 200 from https://docs.python-guide.org/writing/tests/
webpage - INFO - Starting save_complete Action on url: ['https://docs.python-guide.org/writing/tests/']
parsers - INFO - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x7f53855fd080>> encoding and parser <<lxml.etree.HTMLParser object at 0x7f53855eb2a8>>
webpage - INFO - Starting save_assets Action on url: 'https://docs.python-guide.org/writing/tests/'
webpage - Level 100 - Queueing download of <46> asset files.
webpage - INFO - Starting save_html Action on url: 'https://docs.python-guide.org/writing/tests/'
webpage - INFO - WebPage saved successfully to /home/justin/tmp/download_path/FirstTest/docs.python-guide.org/writing/tests/index.html
pywebcopy.configs - INFO - Got response 200 from https://d33wubrfki0l68.cloudfront.net/7b1098a979f88fe7168ab273be04a96c85dd7702/2c5e5/_static/guide-book-cover.jpg
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/7b1098a979f88fe7168ab273be04a96c85dd7702/2c5e5/_static/0e3679cd__guide-book-cover.jpg
pywebcopy.configs - INFO - Got response 200 from https://d33wubrfki0l68.cloudfront.net/bundles/ac5dab34117b78f6e32b6931d79662a7066dfc8f.css
elements - INFO - [1] CSS linked files are found in file [/home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/bundles/3ffee08a__ac5dab34117b78f6e32b6931d79662a7066dfc8f.css]
Exception in thread <Element(LinkTag, https://d33wubrfki0l68.cloudfront.net/bundles/ac5dab34117b78f6e32b6931d79662a7066dfc8f.css)>:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/elements.py", line 338, in run
    self.write_file(contents)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/elements.py", line 177, in write_file
    assert hasattr(file_like_object, 'read'), "A file like object with read method is required!"
AssertionError: A file like object with read method is required!

pywebcopy.configs - INFO - Got response 200 from https://d33wubrfki0l68.cloudfront.net/js/cb055be95f47a965c9dfdea84fcfab8929a2c170/_static/documentation_options.js
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/js/cb055be95f47a965c9dfdea84fcfab8929a2c170/_static/653d1690documentation_options.js
elements - INFO - File of type .js written successfully to /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/js/cb055be95f47a965c9dfdea84fcfab8929a2c170/_static/653d1690documentation_options.js
pywebcopy.configs - INFO - Got response 200 from https://d33wubrfki0l68.cloudfront.net/bundles/3ff29e285c6a8fb3cdf7d41dc7d6c9cff6fc0be4.js
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/bundles/e81be8523ff29e285c6a8fb3cdf7d41dc7d6c9cff6fc0be4.js
elements - INFO - File of type .js written successfully to /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/bundles/e81be8523ff29e285c6a8fb3cdf7d41dc7d6c9cff6fc0be4.js
elements - INFO - File of type .jpg written successfully to /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/7b1098a979f88fe7168ab273be04a96c85dd7702/2c5e5/_static/0e3679cd__guide-book-cover.jpg
pywebcopy.configs - INFO - Got response 200 from https://d33wubrfki0l68.cloudfront.net/bbebbb195447ac7d9e8c23c015435b4e9e011e2e/edd02/_static/python-guide-logo.png
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/bbebbb195447ac7d9e8c23c015435b4e9e011e2e/edd02/_static/d935d616__python-guide-logo.png
pywebcopy.configs - INFO - Got response 200 from https://docs.python-guide.org/genindex/
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/docs.python-guide.org/genindex/file_1faee585.pwc
elements - INFO - File of type .htm written successfully to /home/justin/tmp/download_path/FirstTest/docs.python-guide.org/genindex/file_1faee585.pwc
pywebcopy.configs - INFO - Got response 200 from https://d33wubrfki0l68.cloudfront.net/467452f32ed03aac2095f737b3c5d92a9b3d37bd/b2542/_images/34435687940_8f73fc1fa6_k_d.jpg
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/467452f32ed03aac2095f737b3c5d92a9b3d37bd/b2542/_images/4c21045c34435687940_8f73fc1fa6_k_d.jpg
elements - INFO - File of type .jpg written successfully to /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/467452f32ed03aac2095f737b3c5d92a9b3d37bd/b2542/_images/4c21045c34435687940_8f73fc1fa6_k_d.jpg
elements - INFO - File of type .png written successfully to /home/justin/tmp/download_path/FirstTest/d33wubrfki0l68.cloudfront.net/bbebbb195447ac7d9e8c23c015435b4e9e011e2e/edd02/_static/d935d616__python-guide-logo.png
pywebcopy.configs - INFO - Got response 200 from https://docs.python-guide.org/genindex/
elements - INFO - [0] CSS linked files are found in file [/home/justin/tmp/download_path/FirstTest/docs.python-guide.org/genindex/file_1faee585.pwc]
Exception in thread <Element(LinkTag, https://docs.python-guide.org/genindex/)>:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/elements.py", line 338, in run
    self.write_file(contents)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/elements.py", line 177, in write_file
    assert hasattr(file_like_object, 'read'), "A file like object with read method is required!"
AssertionError: A file like object with read method is required!

pywebcopy.configs - INFO - Got response 200 from https://d31vxm9ubutrmw.cloudfront.net/static/js/2169.js
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/d31vxm9ubutrmw.cloudfront.net/static/js/c899a00d__2169.js
elements - INFO - File of type .js written successfully to /home/justin/tmp/download_path/FirstTest/d31vxm9ubutrmw.cloudfront.net/static/js/c899a00d__2169.js
pywebcopy.configs - INFO - Got response 200 from https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
elements - INFO - [2] CSS linked files are found in file [/home/justin/tmp/download_path/FirstTest/cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/1884e9e1docsearch.min.css]
Exception in thread <Element(LinkTag, https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css)>:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/elements.py", line 338, in run
    self.write_file(contents)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/elements.py", line 177, in write_file
    assert hasattr(file_like_object, 'read'), "A file like object with read method is required!"
AssertionError: A file like object with read method is required!

pywebcopy.configs - INFO - Got response 200 from https://ghbtns.com/github-btn.html?user=realpython&repo=python-guide&type=watch&count=true&size=large
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/ghbtns.com/b4d3e88a__github-btn.html
elements - INFO - File of type .html written successfully to /home/justin/tmp/download_path/FirstTest/ghbtns.com/b4d3e88a__github-btn.html
pywebcopy.configs - INFO - Got response 200 from https://srv.realpython.net/tag.js
elements - INFO - Writing file at location /home/justin/tmp/download_path/FirstTest/srv.realpython.net/c1965979tag.js
elements - INFO - File of type .js written successfully to /home/justin/tmp/download_path/FirstTest/srv.realpython.net/c1965979tag.js
Traceback (most recent call last):
  File "/home/justin/PycharmProjects/py-web-preserve/tests/test_web_preserve.py", line 24, in <module>
    unittest.main()
  File "/usr/lib/python3.6/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/lib/python3.6/unittest/main.py", line 256, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/lib/python3.6/unittest/runner.py", line 176, in run
    test(result)
  File "/usr/lib/python3.6/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.6/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/lib/python3.6/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.6/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/lib/python3.6/unittest/case.py", line 653, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/home/justin/PycharmProjects/py-web-preserve/tests/test_web_preserve.py", line 17, in test_write_defaults
    web.websaving(link, name)
  File "/home/justin/PycharmProjects/py-web-preserve/gray_webp/webpreserve.py", line 46, in websaving
    save_webpage(url, download_folder, **kwargs)
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/api.py", line 94, in save_webpage
    zip_project(config['join_timeout'])
  File "/home/justin/PycharmProjects/py-web-preserve/venv/lib/python3.6/site-packages/pywebcopy/core.py", line 39, in zip_project
    thread.join(timeout=timeout)
  File "/usr/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

rajatomar788 commented 4 years ago

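Found it. It's a bug. After the CSS URLs are replaced, the resulting string must be wrapped in an io buffered (file-like) type before it is written, because write_file asserts that its argument has a read method.

It will be fixed in the next release.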
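
For readers who hit the same assertion, here is a minimal sketch of the general idea, not pywebcopy's actual fix. It assumes the failure comes from a plain string reaching a writer that checks for a read method. The names write_file (a simplified stand-in, not the library's method) and as_file_like are hypothetical and exist only for this illustration.

```python
import io


def write_file(file_like_object, sink):
    """Simplified stand-in for a writer that insists on a file-like source."""
    assert hasattr(file_like_object, 'read'), \
        "A file like object with read method is required!"
    sink.write(file_like_object.read())


def as_file_like(contents):
    """Hypothetical helper: wrap str or bytes in an in-memory buffer."""
    if hasattr(contents, 'read'):
        return contents  # already file-like, pass it through unchanged
    if isinstance(contents, str):
        contents = contents.encode('utf-8')
    return io.BytesIO(contents)


# Rewriting URLs with str.replace yields a plain str, which has no read().
css = "body { background: url('img/bg.png'); }"
rewritten = css.replace("img/bg.png", "local/bg.png")

sink = io.BytesIO()
write_file(as_file_like(rewritten), sink)
print(sink.getvalue().decode('utf-8'))
```

Passing rewritten directly to write_file would trip the same assertion seen in the logs above; wrapping it in io.BytesIO gives the writer the read method it checks for.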

rajatomar788 commented 4 years ago

Until then, you can install it from the git repo or install a previous version. Or, best of all, try the new rewrite, which is in beta: https://github.com/rajatomar788/pywebcopy7

renMarkHan commented 4 years ago

Thank you very much! That solved the problem.