vladris / tinkerer

Python blogging engine
https://vladris.com/tinkerer

Exception while building the blog #42

Closed hargup closed 10 years ago

hargup commented 10 years ago
[hargup  blog  (gh-pages) ]$tinker --build
Making output directory...
Running Sphinx v1.2
loading pickled environment... not yet created
building [html]: targets for 11 source files that are out of date
updating environment: 11 added, 0 changed, 0 removed
reading sources... [100%] master                                                
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] master                                                 
writing additional files...
Exception occurred:
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
ExpatError: junk after document element: line 8, column 0
The full traceback has been saved in /tmp/sphinx-err-NEFQgD.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
Either send bugs to the mailing list at <http://groups.google.com/group/sphinx-users/>,
or report them in the tracker at <http://bitbucket.org/birkenfeld/sphinx/issues/>. Thanks!

Here's the log file

# Sphinx version: 1.2
# Python version: 2.7.3
# Docutils version: 0.11 release
# Jinja2 version: 2.7.1
# Loaded extensions:
#   tinkerer.ext.disqus from /usr/local/lib/python2.7/dist-packages/tinkerer/ext/disqus.pyc
#   tinkerer.ext.blog from /usr/local/lib/python2.7/dist-packages/tinkerer/ext/blog.pyc
#   sphinx.ext.oldcmarkup from /usr/local/lib/python2.7/dist-packages/sphinx/ext/oldcmarkup.pyc
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/sphinx/cmdline.py", line 246, in main
    app.build(force_all, filenames)
  File "/usr/local/lib/python2.7/dist-packages/sphinx/application.py", line 212, in build
    self.builder.build_update()
  File "/usr/local/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 214, in build_update
    'out of date' % len(to_build))
  File "/usr/local/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 273, in build
    self.finish()
  File "/usr/local/lib/python2.7/dist-packages/sphinx/builders/html.py", line 453, in finish
    for pagename, context, template in pagelist:
  File "/usr/local/lib/python2.7/dist-packages/tinkerer/ext/blog.py", line 107, in html_collect_pages
    for name, context, template in collect_additional_pages(app):
  File "/usr/local/lib/python2.7/dist-packages/tinkerer/ext/blog.py", line 86, in collect_additional_pages
    for name, context, template in rss.generate_feed(app):
  File "/usr/local/lib/python2.7/dist-packages/tinkerer/ext/rss.py", line 56, in generate_feed
    replace_read_more_link=not app.config.rss_generate_full_posts)),
  File "/usr/local/lib/python2.7/dist-packages/tinkerer/ext/patch.py", line 106, in patch_links
    doc = xml.dom.minidom.parseString(in_str)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1930, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
ExpatError: junk after document element: line 8, column 0
vladris commented 10 years ago

Could you please let me know which version of Tinkerer you are using and which input is causing this error?

hargup commented 10 years ago

1.3.0

[hargup  blog  (gh-pages) ]$tinker --version
Tinkerer version 1.3.0
vladris commented 10 years ago

What about the input causing the exception?

obonaventure commented 10 years ago

I had the same problem and found the solution. In my case, the problem was due to an external hyperlink: `Multipath TCP in the Linux kernel <http://www.multipath-tcp.org>`_,

If the comma comes immediately after the closing _ character, the bug appears. If there is a space after that character, everything works well.

vladris commented 10 years ago

Oh, I see. Thanks for letting me know! I'll take a look.

vladris commented 10 years ago

I tried adding an RST link followed by a comma as you suggested, but I couldn't repro this. I tried Python 2.7 and 3.3 under Windows and Linux. The build seemed to work fine for a test like:

`Multipath TCP in the Linux kernel <http://www.multipath-tcp.org>`_, bla bla.

I'm not very surprised to see this issue because patch.py does use minidom to parse HTML as XML, which requires some hacks, but unfortunately I can't repro this particular issue. Any more details you could provide? Maybe a full RST file that I can use?
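
For readers following along, here is a minimal sketch of why parsing HTML as XML with minidom is fragile. It is not the code in patch.py; the `try_parse` helper and both sample fragments are made up for illustration. An XML document may have only one root element, so any markup that follows the first closed top-level element is rejected with the same "junk after document element" message seen in the traceback above.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def try_parse(fragment):
    """Hypothetical helper: return None on success, or the parser's
    error message if the fragment is not well-formed XML."""
    try:
        xml.dom.minidom.parseString(fragment)
        return None
    except ExpatError as err:
        return str(err)


# A single root element is well-formed XML and parses cleanly.
well_formed = "<div><p>One root element parses fine.</p></div>"

# Valid as an HTML fragment, but XML sees a second top-level element
# after the document element has already been closed.
two_roots = "<div><p>first</p></div>\n<p>second top-level element</p>"

print(try_parse(well_formed))
print(try_parse(two_roots))
```

Other kinds of malformed markup are reported by expat with different messages, so this covers only one way a post can trip the parser.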

beberlei commented 10 years ago

I have this error in one post where someone hid an easter egg, putting an </h3> somewhere in the text.

It should just skip the generation, not completely abort. The error message also doesn't show which post causes the error, and the reporting is based on the HTML lines. I only fixed this by hacking into expatbuilder.py and dumping the string in the parse function to a file.
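
A rough sketch of the behavior asked for here, assuming nothing about how Tinkerer is actually structured: catch the parser error, say which post triggered it, show the offending HTML line, and let the caller skip that post instead of aborting the build. The `parse_post_html` function and the post names are hypothetical.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_post_html(post_name, html):
    """Hypothetical wrapper: return (document, None) on success or
    (None, message) when the post's HTML is not well-formed XML."""
    try:
        return xml.dom.minidom.parseString(html), None
    except ExpatError as err:
        # ExpatError carries the 1-based line number of the failure,
        # which lets us quote the offending line back to the user.
        lines = html.splitlines()
        bad_line = lines[err.lineno - 1] if 0 < err.lineno <= len(lines) else ""
        message = "skipping %s: %s (offending line: %r)" % (post_name, err, bad_line)
        return None, message


# Made-up posts: the second has markup after its root element.
posts = {
    "2014/01/01/good_post": "<div><p>fine</p></div>",
    "2014/01/02/broken_post": "<div><p>fine</p></div>\n<p>stray</p>",
}

for name, html in sorted(posts.items()):
    doc, error = parse_post_html(name, html)
    if error is not None:
        print(error)  # report the post by name and carry on with the rest
```

Returning the message rather than printing inside the helper keeps the decision about logging or warning with the caller.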

vladris commented 10 years ago

That would definitely do it. But yeah, I should at least handle exceptions a bit better.

vladris commented 10 years ago

Should be fixed by the patch.py rewrite using PyQuery.

vladris commented 10 years ago

Looks like lxml is a pain to set up so I'm not 100% sure I want to take a dependency on it (via PyQuery) at this point. Reopening the issue for now so I don't forget about it if I revert the PyQuery change.

ceeram commented 10 years ago

With the pip-installed Tinkerer I get this error as well: https://gist.github.com/ceeram/ccd07a2dde141a3fcb3d

When using master I get another error: https://gist.github.com/ceeram/bc63506ae9f55425e48c

I'm running tinker --build on ~1100 files. A manual sphinx-build on the same files completes without exceptions, just many warnings and errors.

ceeram commented 10 years ago

https://github.com/vladris/tinkerer/blob/master/tinkerer/ext/patch.py#L106 Adding a check that anchor.get('class') is not None made the build succeed for me.
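
To illustrate the kind of guard described here, a small sketch using the standard library's ElementTree as a stand-in. The actual patch.py code on master is not shown in this thread, so the `has_class` helper and the sample markup are assumptions. The point is only that `get('class')` returns None for an anchor without a class attribute, and a membership test on None raises a TypeError unless it is checked first.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """Hypothetical helper: True if `name` is one of the element's
    CSS classes, False if it is absent or there is no class attribute."""
    classes = element.get("class")  # None when the attribute is missing
    if classes is None:
        return False
    return name in classes.split()


# Made-up fragment: the second anchor has no class attribute at all.
fragment = ET.fromstring(
    '<p>'
    '<a class="reference external" href="http://example.com">ok</a>'
    '<a href="#x">no class</a>'
    '</p>'
)

external = [a.get("href") for a in fragment.iter("a") if has_class(a, "external")]
print(external)
```

Without the None check, the second anchor would make `"external" in classes` fail, which is consistent with a single malformed link breaking the whole build.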

vladris commented 10 years ago

Thanks for the pointer! Do you by any chance have an input that can cause this error? I'd like to add a unit test and make sure I add the right check.

ceeram commented 10 years ago

I will need to dig through the files. I will get back to this once I have it isolated to a single file causing this.

ceeram commented 10 years ago

Was able to pin down the issue:

https://gist.github.com/ceeram/70e0ed74301e4391bd44

The build exception occurs when the file has an invalid link with a newline, as shown in the gist.

ceeram commented 10 years ago

If you like, I can try to provide a test and fix, if I am able to. I'm just a salesman doing some open source PHP development in my free time, so I might need some guidance if I get stuck.

vladris commented 10 years ago

Hey, thanks a bunch for the detailed steps! It's OK, I'll implement a fix for this later today. Should be easy now that I have a repro case. Thanks again!