stephenharris / WP-MarkDown

WP-MarkDown plug-in. Allows Markdown to be enabled in posts, comments and bbPress forums.
http://wordpress.org/extend/plugins/wp-markdown/
112 stars 19 forks source link

Unmatched tag irreversably destroys markdown #47

Closed drewcrawford closed 10 years ago

drewcrawford commented 10 years ago

I was working on a post earlier this evening and somehow (it is a mystery to me how exactly) I got an errant <p> tag in my div. What happened next was unexpected:

  1. WP-Markdown preprocessed all the text following the errant tag, converting MD to HTML markup
  2. This conversion occurred silently. I was still happily working in Markdown, completely unaware that WP-markdown had bricked a long part of the article behind the scenes in a way that didn't immediately show up while I was typing
  3. The error finally showed up when I hit "Save Draft" but that triggers a page refresh and so I no longer have a buffer with the original Markdown text
  4. Between steps 2 and 3, I piled up over an hour's worth of revision history that all contained HTML markup instead of MD markup
  5. Thus there is no way to restore to any backup that contains MD markup without losing an hour's work.

    Reproduction steps

Paste this into a post

<div id="for_ios">
  Whatever some text <p>
</div>

Test 

1

2

3

Press "Save Draft"

Expected results: same as input

Actual results:

<div id="for_ios">
  Whatever some text <p>
    </div> <p>
      Test
    </p>

    <p>
      1
    </p>

    <p>
      2
    </p>

    <p>
      3
    </p>

Expected result: some kind of clue that I am saving non-markdown markup to the DB

Actual result: no clue that I am destroying the sole copy of my Markdown source by pressing "Save Draft"

stephenharris commented 10 years ago

Thanks for the outlining the steps @drewcrawford . Just to give you an idea of what is happening behind the scenes:

  1. When post (or draft) is saved the Markdown is converted to HTML, and stored in the database
  2. When editing a post that HTML is converted to Markdown.

The problem is entirely independent of the 'save draft' and can be replicated with 'publish' too. I believe the problem is with step 2.

Specifically. If you follow these steps:

  1. Enter the example you gave
  2. Click publish
  3. View the post
  4. Edit the post

You'll notice that at (3) everything seems correct. At this point the HTML in the database is as expected:

<div id="for_ios">
  Whatever some text <p>
</div>

<p>Test</p>

<p>1</p>

<p>2</p>

<p>3</p>

(You may notice that on the front-end WordPress would have closed that errant <p> tag.) However, at (4) the markdown (obtained by converting the above HTML to markdown), is incorrect. I believe this is because by the invalid HTML mark-up is confusing the HTML --> markdown converter.

At this point, I'm wondering if balance_tags() should be used before converting HTML to markdown. I'll add a few unit tests to see if this addresses the issue.

drewcrawford commented 10 years ago

Thanks for your explanation. I see now that HTML is what is stored in the database, and the MD is just a representation generated at presentation time.

However there is something I don't really understand about the HTML --> markdown converter. If I strip the errant <p> tag:

<div id="for_ios">
  Whatever some text
</div>

<p>Test</p>

<p>1</p>

<p>2</p>

<p>3</p>

Based on your explanation I would expect save/publish on this input to come back nicely as Markdown. Incidentally this would have saved me a lot of time as I would be able to simply delete the offending tag and exploit "save draft" to generate (effectively "recover") a nice clean Markdown source as I guess it does every time I edit any post.

What actually happens however is cleaning up the errant tag and saving draft leaves all the new tags in place. For example <p>1</p> is still there, etc. So somehow WP-MarkDown knows to roundtrip them, which makes this situation less recoverable then it would otherwise be.

stephenharris commented 10 years ago

If you post the above snippet from your previous comment, then yes you should expect the paragraph tags to disappear. (Markdown respects any entered HTML, but they are cleaned up when converting HTML to Markdown)

However, the HTML -> markdown bug not only wrapped the numbers in paragraph tags, but also indented them, so that they became blocks of code. For me this is why the paragraph tags remain. If I copy and paste your above example the numbers are no longer indented and the <p> tags are cleaned up.

stephenharris commented 10 years ago

The above fix should resolve the issue (and passes new and existing unit tests). The result after entering:

<div id="for_ios">
  Whatever some text <p>
</div>

Test 

1

2

3

Should be:

<div id="for_ios">
  Whatever some text <p>
  </p>
</div>

Test

1

2

3

This will be in the 1.6.0 release. I'm ready to close this, but would appreciate it if you could verify that the changes prevent this error from occurring

drewcrawford commented 10 years ago

Looks good to me

stephenharris commented 10 years ago

Great, closing this as it'll be fixed in 1.6.0