some-programs / exitwp

Exitwp is a tool primarily aimed at making migration from one or more WordPress blogs to the Jekyll blog engine as easy as possible.

ValueError: invalid literal for int() with base 10: 'No Content Found' #80

Closed by GenieTim 5 years ago

GenieTim commented 5 years ago

Thank you very much for this awesome tool. Unfortunately, I am having issues with the newest WordPress export (I could not determine whether the export format changed). For one, I get "Wrong date in {my title}" errors from line 218, because all the dates are formatted like Mon, 17 Nov 2014 18:58:32 +0000. That is not as bad as the crash I get right after this warning, with the following traceback:

Traceback (most recent call last):
  File "exitwp.py", line 382, in <module>
    write_jekyll(data, target_format)
  File "exitwp.py", line 306, in write_jekyll
    'wordpress_id': int(i['wp_id']),
ValueError: invalid literal for int() with base 10: 'No Content Found'

Unfortunately, I could not find a problem in my XML: the corresponding <wp:post_id> tag is present, and xmllint reports no issues. Can you give me a hint what to look out for? Thanks.
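For reference, the date format quoted above (Mon, 17 Nov 2014 18:58:32 +0000) is the RFC 2822 style that RSS feeds use for pubDate. Below is a minimal sketch of parsing it with the standard library, assuming Python 3. The helper name parse_export_date is hypothetical, and this is not necessarily what exitwp.py does around line 218.

```python
from email.utils import parsedate_to_datetime


def parse_export_date(raw):
    """Parse an RFC 2822 style date string into a datetime.

    Hypothetical helper for illustration; not part of exitwp.
    Returns None when the string cannot be parsed.
    """
    try:
        return parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):
        # Older Python 3 releases raise TypeError, newer ones ValueError.
        return None


parsed = parse_export_date("Mon, 17 Nov 2014 18:58:32 +0000")
print(parsed.isoformat())
```

Returning None for unparseable input is a design choice of this sketch; a caller could log the "Wrong date" message in that case and fall back to another format.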

piavgh commented 5 years ago

I have this exact issue at the moment.

piavgh commented 5 years ago

Another person had a similar issue just 2 days ago: https://github.com/thomasf/exitwp/issues/79

GenieTim commented 5 years ago

Indeed, sorry. I focused on a completely different aspect than the other person, so I did not consider mine a duplicate, but on closer inspection you are right: this is a duplicate. Shall I close it?

Anyway, their idea about the namespace is a good point. While trying to find the error, I got to line 127, where we can add:

result = 'No Content Found For ' + ns[namespace] + tag
print(result)
print(vars(i))

This results in the unfortunate output:

No Content Found For {http://wordpress.org/export/1.2/}post_id
{'text': '\n\t\t\t', 'attrib': {}, 'tag': 'item', 'tail': '\n\t', '_children': [<Element '{http://wordpress.org/export/1.2/}post_id' at 0x1101dc050>, ...]}

Unfortunate because I cannot explain the result: the selector matches the tag of the child element shown in the dump, so it should apparently be the correct one. I fear a problem in the underlying xml.etree.ElementTree.
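To make the selector question concrete, here is a small sketch of the two ways xml.etree.ElementTree can look up a namespaced child, assuming Python 3. The sample XML is a hypothetical, heavily trimmed stand-in for one item of an export, and the names WP_NS and SAMPLE are made up for illustration.

```python
import xml.etree.ElementTree as ET

WP_NS = "http://wordpress.org/export/1.2/"

# Hypothetical, trimmed stand-in for one <item> of a WordPress export.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello</title>
      <wp:post_id>42</wp:post_id>
    </item>
  </channel>
</rss>"""

item = ET.fromstring(SAMPLE).find("channel/item")

# Style 1: fully qualified "{uri}tag" name, as printed in the debug output.
by_uri = item.find("{%s}post_id" % WP_NS)

# Style 2: prefixed name plus an explicit prefix-to-URI mapping.
by_prefix = item.find("wp:post_id", {"wp": WP_NS})

# Both lookups locate the same child here.
post_id = int(by_uri.text.strip())
```

Note that the mapping passed to find() holds the bare URI. Judging by the debug output, ns[namespace] already includes the surrounding braces, so passing that same dictionary as the second argument would probably not match anything.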

GenieTim commented 5 years ago

Okay, got it: changing line 125 to result = (i.find(q, ns) or i.find(tag) or i.find(ns[namespace] + tag)).text.strip() apparently solved it. I am not exactly sure why this is better, but what works, works. @piavgh and @mettsal, did this solve the issue for you too? If so, I could open a PR.
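One caveat about chaining find() calls with or: in the Python versions I am aware of, an Element without child elements evaluates as false, so a successful match on a leaf element such as post_id is skipped and the chain ends up returning its last operand. Below is a hedged sketch of the same fallback idea with explicit None checks, assuming Python 3. The helper find_text, its default argument, and the sample XML are hypothetical and not taken from exitwp.py.

```python
import xml.etree.ElementTree as ET

WP_URI = "http://wordpress.org/export/1.2/"


def find_text(item, prefixed_tag, uri_by_prefix, default="No Content Found"):
    """Return the stripped text of the first matching child, else default."""
    prefix, _, tag = prefixed_tag.rpartition(":")
    # Try the namespaced name first (if the prefix is known), then the bare tag.
    candidates = [tag]
    if prefix in uri_by_prefix:
        candidates.insert(0, "{%s}%s" % (uri_by_prefix[prefix], tag))
    for name in candidates:
        node = item.find(name)
        # Compare against None explicitly instead of relying on truthiness.
        if node is not None and node.text is not None:
            return node.text.strip()
    return default


item = ET.fromstring(
    '<item xmlns:wp="%s"><wp:post_id> 42 </wp:post_id></item>' % WP_URI
)
wp_id = find_text(item, "wp:post_id", {"wp": WP_URI})
missing = find_text(item, "wp:post_name", {"wp": WP_URI})
```

With this shape, the caller can check whether the returned value is the placeholder before passing it to int(), instead of crashing as in the traceback above.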

mettsal commented 5 years ago

@GenieTim it worked for me! Build directory is now complete! Thanks a lot!

piavgh commented 5 years ago

@GenieTim: It works for me as well. Thank you for your time and effort.