torchbox / wagtail-wordpress-import

A package for Wagtail CMS to import WordPress blog content from an XML file into Wagtail
MIT License

Header images seen error #85

Closed: nickmoreton closed this issue 2 years ago

nickmoreton commented 2 years ago

A requests timeout occurs when fetching a header image. This happens at the final stage of the import. The function in question

File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/wagtail-wordpress-import/wagtail_wordpress_import/block_builder_defaults.py", line 229, in get_or_save_image

seems to behave OK while the pages are being imported and the rich_text images are being linked up.

The biggest issue is that the import hooks step fails early on, so it never completes.
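One way to keep a single failing page from stopping the whole hooks stage is to catch errors per hook call and carry on. The sketch below only illustrates that idea; `run_hooks_tolerantly`, `hooks`, and `pages` are hypothetical names and do not reflect the package's real hook signature or processing loop.

```python
import logging

logger = logging.getLogger(__name__)


def run_hooks_tolerantly(hooks, pages):
    """Run every hook against every page, recording failures instead of aborting.

    `hooks` is an iterable of callables that each take one page. These names
    are assumptions for illustration, not the package's actual API.
    """
    failures = []
    for page in pages:
        for hook in hooks:
            try:
                hook(page)
            except Exception as exc:  # deliberately broad: one bad page should not stop the run
                name = getattr(hook, "__name__", repr(hook))
                logger.warning("Hook %s failed for %r: %s", name, page, exc)
                failures.append((page, exc))
    return failures
```

The returned list of `(page, exception)` pairs could be reported at the end of the run, so pages with missing header images can be revisited later.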

Full Trace

Traceback (most recent call last):
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/Users/nickm/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 1344, in getresponse
    response.begin()
  File "/Users/nickm/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/Users/nickm/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Users/nickm/.pyenv/versions/3.8.10/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/Users/nickm/.pyenv/versions/3.8.10/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/nickm/.pyenv/versions/3.8.10/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.budgetsaresexy.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/wagtail-wordpress-import/wagtail_wordpress_import/management/commands/import_xml.py", line 70, in handle
    importer.run(
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 176, in run
    self.items_cache.process(
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/wagtail-wordpress-import/wagtail_wordpress_import/importers/import_hooks.py", line 76, in process
    import_string(func)(page, data, items_cache)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/example/pages/import_hooks.py", line 19, in header_image
    image = get_or_save_image(image_url)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/wagtail-wordpress-import/wagtail_wordpress_import/block_builder_defaults.py", line 229, in get_or_save_image
    response, valid, type = fetch_url(src)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/wagtail-wordpress-import/wagtail_wordpress_import/block_builder_defaults.py", line 249, in fetch_url
    r = requests.get(src, **conf_get_requests_settings())
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/nickm/MotleyFool/wagtail-xmlimport-develop/venv/lib/python3.8/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.budgetsaresexy.com', port=443): Read timed out. (read timeout=5)
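The trace shows the `ReadTimeout` travelling uncaught from `fetch_url` up to the management command. A possible mitigation is to retry the download a few times and return `None` if it keeps failing. This is a minimal sketch under assumptions: `fetch_with_retries` and its parameters are hypothetical, and `fetch` stands in for whatever callable performs the download. It relies on `requests.exceptions.RequestException` being a subclass of `IOError` (`OSError`), so `ReadTimeout` is covered by the handler.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url) up to `attempts` times; return None if every try raises OSError.

    Hypothetical helper, not part of the package. requests' RequestException
    (including ReadTimeout) subclasses IOError/OSError, so a requests-based
    `fetch` is handled here as well as plain socket errors.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt < attempts:
                time.sleep(delay * attempt)  # simple linear back-off between tries
    return None
```

With requests installed, a call might look like `fetch_with_retries(lambda u: requests.get(u, timeout=5), image_url)`. The caller then has to treat `None` as "image unavailable" rather than assume a response object.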

The page it was operating on

<item>
  <title>I'm harboring a lot of anger right now, so avert if you're happy.</title>
  <link>https://www.budgetsaresexy.com/?p=299</link>
  <pubDate>Thu, 06 Nov 2008 21:51:00 +0000</pubDate>
  <dc:creator>jMoney</dc:creator>
  <guid isPermaLink="false">https://www.budgetsaresexy.com/2008/11/im-harboring-a-lot-of-anger-right-now-so-avert-if-youre-happy/</guid>
  <description />
  <content:encoded>&lt;a href="https://www.budgetsaresexy.com/images/my_face.gif"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" alt="my face." src="https://www.budgetsaresexy.com/images/my_face.gif" border="0" /&gt;&lt;/a&gt;&lt;strong&gt;&lt;span style="font-size:130%;"&gt;I seriously just don't get how people can treat other people so horribly?!&lt;/span&gt;&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;What is wrong w/ this world? I'm sorry to say, but this is probably not gonna be the happiest of posts today - just keeping things honest up in here ;)&lt;br /&gt;&lt;br /&gt;It's not accurate if i only post all the happy happy,joy joy stuff right? This is life, and it's crazy sometimes...i'm also selfishly hoping to feel better when i finish here.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;span style="font-size:130%;"&gt;Order #1 on the docket&lt;/span&gt;&lt;/strong&gt; - That &lt;a href="https://www.budgetsaresexy.com/2008/10/wedding-pics-update-its-on-like-donkey.html"&gt;d-head of a photographer &lt;/a&gt;wrote back after 4 weeks with another sob story and demanding us to stop contacting him as his life sucks. He said he'll send the Proofs (no raw images, prints, nothing - just the proofs) by end of December. Are you on crack, son? I'll agree with him that he sucks @ life, but sorry buddy, i've lost all patience and respect for you. You do not have the right to ask for a thing anymore.&lt;br /&gt;&lt;br /&gt;I left him the following note on facebook (his preferred method apparently), email, voicemail, and text:&lt;br /&gt;&lt;blockquote&gt;"D-head &lt;em&gt;(edited)&lt;/em&gt;,&lt;br /&gt;&lt;br /&gt;Our condolences for your losses. December is not going to work. We will give you until next Friday, November 14th, to simply mail us the RAW images and you will not be bothered again. If we do not receive these RAW images by Friday, November 14th, we will see you in court. 
Apparently filing a BBB complaint was not enough.&lt;br /&gt;&lt;br /&gt;Mr. and Mrs. Rockstar &lt;em&gt;(edited)&lt;/em&gt;"&lt;/blockquote&gt;Needless to say, i've basically lost my $hit now w/ this turdbucket. But fingers crossed.&lt;br /&gt;&lt;br /&gt;****************&lt;br /&gt;&lt;em&gt;&lt;strong&gt;&lt;span style="color: rgb(204, 0, 0);"&gt;*UPDATE:&lt;/span&gt;&lt;/strong&gt; Got the following back within 30 mins (pretty good for someone who has no access to any communication, huh? &lt;strong&gt;"As stated. I will not be able to make that deadline."&lt;/strong&gt; I basically told him that if he does not meet the deadline, the court will decide further action. HAH! F'er.&lt;/em&gt;&lt;br /&gt;****************&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;span style="font-size:130%;"&gt;Order #2 on the docket&lt;/span&gt;&lt;/strong&gt;: My paycheck just bounced, and i am now scared this economy has finally affected us. There's a whole back story here, but unfortunately i cannot risk saying anything more for fear of blowing my anonymous cover. That, and i actually LOVE my job despite all of this - so i'd rather not depart prematurely. Still aggravating as hell. &lt;em&gt;&lt;strong&gt;&lt;span style="color: rgb(204, 0, 0);"&gt;UPDATE&lt;/span&gt;&lt;/strong&gt;: Will be getting a wire transfer within 24 hours...crisis averted.&lt;br /&gt;&lt;/em&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;strong&gt;Order #3 on the docket:&lt;/strong&gt;&lt;/span&gt; The Mrs. is sick :( Made 3-4 trips to dr's, hospitals, you name it. We're going on 3 days now, and no diagnosis has been found - which is both good and bad really. This has nothing to do with personal finance, unless i talk about how i saved $18 remembering to bring my insurance card this time for the drugs!, but it still upsets me :( this is a rant on life moreso than against a particular person. 
&lt;em&gt;&lt;strong&gt;&lt;span style="color: rgb(204, 0, 0);"&gt;UPDATE:&lt;/span&gt;&lt;/strong&gt; Wifey still sick, seeing doctor in one hour! &lt;/em&gt;&lt;em&gt;&lt;strong&gt;&lt;span style="color: rgb(204, 0, 0);"&gt;UPDATE #2:&lt;/span&gt;&lt;/strong&gt; Wifey went back into E.R., and is now staying over night! yikesy mama - prob. have to take her gallbladder out, but we'll know more in the mornin'.  time to go join her and have a "fun" sleep!  &lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Okay, that is all. I'm not feeling 100% better now, but it def. helped! Thanks for listening, and PLEASE please PLEASE &lt;u&gt;holler back with some happy stuff&lt;/u&gt; if you've got anything. Some of you have to be having a blessed day, right?!</content:encoded>
  <excerpt:encoded />
  <wp:post_id>299</wp:post_id>
  <wp:post_date>2008-11-06 16:51:00</wp:post_date>
  <wp:post_date_gmt>2008-11-06 21:51:00</wp:post_date_gmt>
  <wp:post_modified>2014-11-09 17:50:31</wp:post_modified>
  <wp:post_modified_gmt>2014-11-09 22:50:31</wp:post_modified_gmt>
  <wp:comment_status>open</wp:comment_status>
  <wp:ping_status>open</wp:ping_status>
  <wp:post_name>im-harboring-lot-of-anger-right-now-so</wp:post_name>
  <wp:status>draft</wp:status>
  <wp:post_parent>0</wp:post_parent>
  <wp:menu_order>0</wp:menu_order>
  <wp:post_type>post</wp:post_type>
  <wp:post_password />
  <wp:is_sticky>0</wp:is_sticky>
  <category domain="category" nicename="career">Career</category>
  <category domain="category" nicename="rant">Rant</category>
  <wp:postmeta>
    <wp:meta_key>blogger_blog</wp:meta_key>
    <wp:meta_value>budgetsaresexy.blogspot.com</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>blogger_permalink</wp:meta_key>
    <wp:meta_value>/2008/11/im-harboring-lot-of-anger-right-now-so.html</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_sexybookmarks_shortUrl</wp:meta_key>
    <wp:meta_value>http://b2l.me/x68z8</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_sexybookmarks_permaHash</wp:meta_key>
    <wp:meta_value>b8a4df8889c10c4451a71207f81bd47a</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_thumbnail_id</wp:meta_key>
    <wp:meta_value>15355</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_edit_last</wp:meta_key>
    <wp:meta_value>3</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>thesis_title</wp:meta_key>
    <wp:meta_value>Angry with lost photography, lost paycheck, and now the Mrs. is sick :(</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>_totes</wp:meta_key>
    <wp:meta_value>0</wp:meta_value>
  </wp:postmeta>
  <wp:postmeta>
    <wp:meta_key>swp_cache_timestamp</wp:meta_key>
    <wp:meta_value>411567</wp:meta_value>
  </wp:postmeta>
</item>

The image it was trying to link

<item>
  <title />
  <link>https://www.budgetsaresexy.com/?attachment_id=15355</link>
  <pubDate>Mon, 10 Oct 2011 00:32:22 +0000</pubDate>
  <dc:creator>jMoney</dc:creator>
  <guid isPermaLink="false">https://www.budgetsaresexy.com/images/my_face2.gif</guid>
  <description />
  <content:encoded />
  <excerpt:encoded />
  <wp:post_id>15355</wp:post_id>
  <wp:post_date>2011-10-09 20:32:22</wp:post_date>
  <wp:post_date_gmt>2011-10-10 00:32:22</wp:post_date_gmt>
  <wp:post_modified>2011-10-09 20:32:22</wp:post_modified>
  <wp:post_modified_gmt>2011-10-10 00:32:22</wp:post_modified_gmt>
  <wp:comment_status>open</wp:comment_status>
  <wp:ping_status>open</wp:ping_status>
  <wp:post_name>15355</wp:post_name>
  <wp:status>inherit</wp:status>
  <wp:post_parent>299</wp:post_parent>
  <wp:menu_order>0</wp:menu_order>
  <wp:post_type>attachment</wp:post_type>
  <wp:post_password />
  <wp:is_sticky>0</wp:is_sticky>
  <wp:postmeta>
    <wp:meta_key>_thumbnail_id</wp:meta_key>
    <wp:meta_value>43124</wp:meta_value>
  </wp:postmeta>
</item>

nickmoreton commented 2 years ago

I'm closing this issue because I think the problem was with the original XML data we were using. The Theme Unit Test XML file imports OK: https://codex.wordpress.org/Theme_Unit_Test