textile / python-textile

A Python port of Textile, a humane web text generator

Problems with inline HTML (<pre>) #28

Closed rcarmo closed 8 years ago

rcarmo commented 8 years ago

Since upgrading from 2.2.x to 2.3.1, all the inline code blocks on my site are mis-processed:

buffer = """
So here I am porting my ancient "newspipe":newspipe "front-end":blog/2006/09/30/0950 to "Snakelets":Snakelets and "Python":Python, and I've just trimmed down over 20 lines of "PHP":PHP down to essentially one line of "BeautifulSoup":BeautifulSoup retrieval:

<pre>
def parseWapProfile(self, url):
  result = fetch.fetchURL(url)
  soup = BeautifulStoneSoup(result['data'], convertEntities=BeautifulStoneSoup.HTML_ENTITIES)
  try:
    width, height = soup('prf:screensize')[0].contents[0].split('x')
  except:
    width = height = None
  return {"width": width, "height": height}
</pre>

Of course there's a lot more error handling to do (and useful data to glean off the "XML":XML), but being able to cut through all the usual parsing crap is immensely gratifying.
"""

... turns into:

<p>So here I am porting my ancient <a href="newspipe">newspipe</a> <a href="blog/2006/09/30/0950">front-end</a> to <a href="Snakelets">Snakelets</a> and <a href="Python">Python</a>, and I&#8217;ve just trimmed down over 20 lines of <a href="PHP"><span class="caps">PHP</span></a> down to essentially one line of <a href="BeautifulSoup">BeautifulSoup</a> retrieval:</p>\n\n<pre syntax="python">\ndef parseWapProfile(self, url):\n  result = fetch.fetchURL(url)\n  soup = BeautifulStoneSoup(result[&#8216;data&#8217;], convertEntities=BeautifulStoneSoup.HTML_ENTITIES)\n  try:\n    width, height = soup(&#8216;prf:screensize&#8217;)<sup class="footnote" id="fnrev363c2af623094ac3a94774d87ffe2583-1"><a href="#fn363c2af623094ac3a94774d87ffe2583-1">0</a></sup>.contents<sup class="footnote"><a href="#fn363c2af623094ac3a94774d87ffe2583-1">0</a></sup>.split(&#8216;x&#8217;)\n  except:\n    width = height = None\n  return {<a href="">width</a> width, <a href="">height</a> height}\n</pre>\n\n\t<p>Of course there&#8217;s a lot more error handling to do (and useful data to glean off the <a href="XML"><span class="caps">XML</span></a>), but being able to cut through all the usual parsing crap is immensely gratifying.</p>

...turning the code inside into a mess of footnote references, empty links and curly quotes, whereas the previous version left the contents of the <pre> tag alone. A <pre> block tested in isolation comes out fine; the problem only shows up when it follows a text paragraph.
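For anyone wanting to check their own pages for this, here is a minimal sketch that compares the text inside each <pre> element before and after rendering. It uses only the standard library; the helper names `pre_bodies` and `pre_blocks_preserved` are made up for illustration and are not part of python-textile, and the regex is a rough approximation that ignores nested <pre> tags.

```python
import html
import re

# Rough match for a <pre> element and its body (assumption: no nested <pre>).
PRE_RE = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.DOTALL | re.IGNORECASE)


def pre_bodies(markup):
    """Return the entity-decoded text inside each <pre> element."""
    return [html.unescape(body) for body in PRE_RE.findall(markup)]


def pre_blocks_preserved(source, rendered):
    """True if every <pre> body in the rendered output decodes back to the source text."""
    return pre_bodies(source) == pre_bodies(rendered)


source = "Some paragraph:\n\n<pre>\nw = soup('a')[0]\n</pre>"

# Plain entity escaping is acceptable: it decodes back to the same code.
escaped_only = "<p>Some paragraph:</p>\n\n<pre>\nw = soup(&#39;a&#39;)[0]\n</pre>"

# Curly quotes and footnote markup change the code itself.
mangled = (
    "<p>Some paragraph:</p>\n\n<pre>\n"
    'w = soup(&#8216;a&#8217;)<sup class="footnote">0</sup>\n</pre>'
)

print(pre_blocks_preserved(source, escaped_only))
print(pre_blocks_preserved(source, mangled))
```

Decoding entities before comparing means harmless escaping of quotes or angle brackets is not flagged, while typographic substitutions and injected tags are.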

ikirudennis commented 8 years ago

Hmm. I created a test for this and it seems to be returning the correct output. I'll push it and see what Travis makes of it. Please take a look and confirm that my test produces the output you expect.

edit: I almost forgot... The only slight issue with my test is that it's producing &#34;s instead of &quot;s around the width and height dictionary keys in the return.
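For context on why this is only a slight issue: the numeric and named references for the double quote decode to the same character, so browsers render them identically and only the serialized form differs. A small standard-library sketch (nothing here is python-textile API):

```python
import html

# Both spellings of the double-quote entity decode to the same text.
numeric = html.unescape("{&#34;width&#34;: width}")
named = html.unescape("{&quot;width&quot;: width}")
same = numeric == named == '{"width": width}'

# The standard library's own escaper emits the named form for double quotes.
escaped = html.escape('"width"')

print(same)
print(escaped)
```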

rcarmo commented 8 years ago

Well, the quotes aren't a big problem (they need to be proper quotes, I think), but that should work. Since I'm fiddling with the Textile object directly, I suspect there are some things I'm using that aren't documented...

ikirudennis commented 8 years ago

I've corrected the quot entity.

Should I close this issue or is there still some work to be done?

rcarmo commented 8 years ago

LGTM. I'll need to run it through my docs, though.