phanson / Medius

Your all-in-one application for turning a blog into a book!
http://philiphanson.org/medius

Import from WordPress eXtended RSS (WXR) #3

Closed phanson closed 13 years ago

phanson commented 13 years ago

We need the ability to import all posts from a blog that has been exported as a WXR file. The import must preserve author information and publish date, but need not include comments.
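As a rough illustration of the requirement above, here is a minimal Python sketch of such an import. It is not Medius code. The function name `import_posts` and the shape of the returned dictionaries are hypothetical, and it assumes the usual WXR layout: RSS `item` elements carrying `dc:creator`, `content:encoded`, `wp:post_date`, and `wp:post_type`. The `wp:` namespace URI carries a version suffix that differs between exports, so the sketch matches it by prefix.

```python
import xml.etree.ElementTree as ET

# Namespaced tags as they commonly appear in WXR exports (assumed layout).
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"
CONTENT = "{http://purl.org/rss/1.0/modules/content/}encoded"
WP_PREFIX = "{http://wordpress.org/export/"  # version suffix varies per export


def _wp_field(item, name):
    """Return the text of a direct wp:<name> child, whatever the wp version."""
    for child in item:
        if child.tag.startswith(WP_PREFIX) and child.tag.endswith("}" + name):
            return child.text or ""
    return ""


def import_posts(wxr_text):
    """Hypothetical importer: returns one dict per post, ignoring comments."""
    root = ET.fromstring(wxr_text)
    posts = []
    for item in root.iter("item"):
        # Skip pages, attachments, and anything else that is not a post.
        if _wp_field(item, "post_type") != "post":
            continue
        posts.append({
            "title": item.findtext("title", default=""),
            "author": item.findtext(DC_CREATOR, default=""),
            "published": _wp_field(item, "post_date"),
            "body": item.findtext(CONTENT, default=""),
        })
    return posts
```

Only direct children of each `item` are read, so nested `wp:comment` elements are never touched, which matches the "need not include comments" requirement.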

phanson commented 13 years ago

This is a bit more complicated than my original concept: we will also need to perform the same text processing that WordPress does to convert post text to full HTML. Ideally there is a pre-existing library for that.

phanson commented 13 years ago

On second thought, the initial implementation may only need the equivalent of s/\n\n+/<\/p><p>/g. The rest is generally HTML anyway. Corner cases can be handled by post editing for now.
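For reference, the substitution above could look roughly like this in Python. This is only a sketch of the simple approach, not WordPress's actual paragraph filter. The function name `paragraphs_to_html` is hypothetical, and two details are assumptions beyond the regex itself: line endings are normalized first, and the result is wrapped in an outer `<p>...</p>` so the inserted tags are balanced.

```python
import re


def paragraphs_to_html(text):
    """Turn runs of blank lines into paragraph breaks (hypothetical helper)."""
    # Assumption: normalize Windows line endings so "\n\n+" matches them too.
    text = text.replace("\r\n", "\n").strip()
    if not text:
        return ""
    # Equivalent of s/\n\n+/<\/p><p>/g, applied to every run of blank lines.
    body = re.sub(r"\n\n+", "</p><p>", text)
    # Assumption: wrap the whole post so every inserted tag has a partner.
    return "<p>" + body + "</p>"
```

Single newlines and any HTML already in the post are left untouched, so corner cases such as block-level elements ending up inside a `<p>` would still need editing by hand.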

phanson commented 13 years ago

This issue is delaying Beta because I am having so much trouble with the XSLT transformation engine.

phanson commented 13 years ago

Taking this out of the Beta milestone because the XSLT is workable (and it is the way I made my test files), but the details of running the transform programmatically are proving too difficult for me. We need to move forward.

I will come back to this later.

phanson commented 13 years ago

It turns out the issues I was having before were the result of trying to read the XSLT from a Stream while still using a normal XmlUrlResolver. I wrote a dumb little stream resolver (it uses more memory than necessary) that makes it workable.

A better approach would be to use something like the XmlResourceResolver found in this article: http://msdn.microsoft.com/en-us/library/aa302284.aspx