Closed barsch closed 7 years ago
This is an interesting situation. When I test it against php-textile, the output looks wrong:
<p><ol>
<li>xxx</li>
<li>yyy</li><br />
</ol><br />
<strong>blah</strong></p>
I'm assuming the output you're expecting would be the same except the second li is:
yyy<br />\n<strong>blah</strong>
Is that correct? For the record, you can achieve the same output by using the strong tags in the input: # xxx\n# yyy\n<strong>blah</strong>
I hope that helps for the time being.
For the time being, I'm going to shoot for matching textile as it currently is, but I'll forward this bug along to them as well.
Sounds similar to this php-textile issue which has been known for a while but has not been solved to our satisfaction yet.
@ikirudennis: I don't care to much about the output as long it doesn't raises an exception - one could argue for both cases but I'm no textile expert ...
@barsch, if you need a quick solution, you can use the develop branch, and it won't raise an error for you.
The change I made works for now, but it's a little quick-and-dirty... more concerned with not raising an error than with producing correct output. I'm going to think on it, and look into the underlying issue in php-textile to see if I can reason out something better.
thanks for the quick fix
FYI - we're seeing the same issue with the following string on 2.3.7:
u'*Highlights*\n\n* UNITEK Y-3705A Type-C Universal DockingStation Pro\n* USB3.0/RJ45/EARPHONE/MICROPHONE/HDMI 6 PORT HUB 1.2m Data Cable 5V 4A Power Adaptor\n*\n* Dimensions: 25cm x 13cm x 9cm\n* Weight: 0.7kg'
Removing the extraneous \n*
(After "Power Adapter") fixes this particular string.
@jarshwah I ran your test through php-textile and got some strange output, so it looks like your example may just be incorrect textile formatting. If you're looking to have that blank <li>
the following would be a good way to accomplish that:
*Highlights*
* UNITEK Y-3705A Type-C Universal DockingStation Pro
* USB3.0/RJ45/EARPHONE/MICROPHONE/HDMI 6 PORT HUB 1.2m Data Cable 5V 4A Power Adaptor
*
* Dimensions: 25cm x 13cm x 9cm
* Weight: 0.7kg
That produces the following in both php- and python-textile:
<p><strong>Highlights</strong></p>
<ul>
<li><span class="caps">UNITEK</span> Y-3705A Type-C Universal DockingStation Pro</li>
<li>USB3.0/RJ45/EARPHONE/MICROPHONE/HDMI 6 <span class="caps">PORT</span> <span class="caps">HUB</span> 1.2m Data Cable 5V 4A Power Adaptor</li>
<li> </li>
<li>Dimensions: 25cm x 13cm x 9cm</li>
<li>Weight: 0.7kg</li>
</ul>
Is that more like what you're expecting?
@ikirudennis the text was wrong. The blank li
always had a floating literal *
before 2.3.0, but now crashes instead. The text being rendered comes from an API so it's not so straight forward to just change the text. I've resorted to wrapping the call to textile in a try/catch and rendering the text verbatim on failure.
I don't know what the solution should be for python-textile. On one hand, crashing on bad input isn't a bad idea. On the other, bad input bringing a production site down (because that's where we get our interesting text from) isn't fun either.
If this was my own project, and I wasn't aiming for compatibility with another project, I'd probably add a strict=True
arg and default to crashing. But that's easy to suggest from the sidelines :)
Firstly, thanks again for your response. I always feel bad when this thing causes users to create workarounds
I'm not disagreeing with anything you've said. A third-party text-processing package shouldn't crash a production system due to bad input. It should fail a little more gracefully. And your idea of a strict option is a good one. I'll see about adding that in. And I think it would be helpful to add a note in the README about running old textile through the strict mode in a testing environment before it rudely takes down production. </notetoself>
That being said, figuring out the appropriate way to gracefully fail is the real trick here. I've run your example through textile v2.2.2 and it produces the following:
<p><strong>Highlights</strong></p>
<ul>
<li><span class="caps">UNITEK</span> Y-3705A Type-C Universal DockingStation Pro</li>
<li><span class="caps">USB</span>3.0/RJ45/EARPHONE/MICROPHONE/HDMI 6 <span class="caps"
>PORT</span> <span class="caps">HUB</span> 1.2m Data Cable 5V 4A Power Adaptor</li>
</ul>
*
<ul>
<li>Dimensions: 25cm x 13cm x 9cm</li>
<li>Weight: 0.7kg</li>
</ul>
Maybe not my cup of tea, but it is valid html nonetheless. txstyle.org adds a <br />
onto the asterisk. Do you have any preference? I'll see what I can cook up, given that I'm giving the list section of this a complete overhaul. And the plan is to push my overhaul up to php-textile.
I've also run it through a few different markdown parsers (since the list format is the same), and I get different results on each of them. Some treat the asterisk as text and part of the list item content of the item above it, others treat it as just an empty list item like my
hack. This illustrates one of the things I dislike about markdown: everyone's got their own flavor of it. That's why I try to aim for matching what the php version does, even if I disagree with its output sometimes.
I'm going to apologize about version 2.3.8. While it doesn't crash, it ignores the part of the list after the lone asterisk. One step forward, two steps back. So, for you, I wouldn't recommend using it. I'll see what I can do, but I'm just going to say that I wish I had more time I could devote to this project. Please bear with me.
@jarshwah, I stumbled upon an interesting development, which may or may not help you. When I test your example, as I mentioned before, it doesn't error out, but ignores everything after the lone asterisk. However, if I run it through the pytextile
cli tool, it produces something much closer to what you may be looking for:
<p><strong>Highlights</strong></p>
<ul>
<li><span class="caps">UNITEK</span> Y-3705A Type-C Universal DockingStation Pro</li>
<li>USB3.0/RJ45/EARPHONE/MICROPHONE/HDMI 6 <span class="caps">PORT</span> <span class="caps">HUB</span> 1.2m Data Cable 5V 4A Power Adaptor<br />
*</li>
<li>Dimensions: 25cm x 13cm x 9cm</li>
<li>Weight: 0.7kg</li>
</ul>
So, maybe you can change your catch to run it through the cli tool on error? Does that help? I was thinking that there's also a chance that the following may clear resolve the issue as well:
# assuming the test text is textile_input
if '\n*\n*' in textile_input:
textile_input.replace('\n*\n*', '\n* \n*')
Please let me know either way if that helps or hurts things.
Thanks for the follow up @ikirudennis I appreciate the thought you're putting into this.
Regarding your workarounds, I'm probably unlikely to call out to the cli tool. The text replacement is more my cup of tea - and something I could definitely incorporate as pre-validation within the try/catch. I'm still going to wrap in a try/catch to guard against future issues, but that's probably good practise anyway.
Maybe not my cup of tea, but it is valid html nonetheless. txstyle.org adds a
onto the asterisk. Do you have any preference?
As you've pointed out, the output isn't going to be nice either way, because the input is fundamentally wrong. I think I'd slightly prefer a
since that is likely the intended result, but as long as the text before and after the empty li is preserved as a list (or two lists) I think that's ok.
This illustrates one of the things I dislike about markdown: everyone's got their own flavor of it. That's why I try to aim for matching what the php version does, even if I disagree with its output sometimes.
Very reasonable. Especially if you're planning to (try to) upstream what you think are good changes.
I'll see what I can do, but I'm just going to say that I wish I had more time I could devote to this project. Please bear with me.
The unglamorous parts of OSS :) I'd just like to say that whatever pace you approach this with is fine by me. I'm in no hurry (we can stay on an older version or workaround with approaches given), so there's no pressure coming from my side. Happy to help out where I can.
works as expected:
removing space between
\n
and*blah*
, raises an error