michelf / php-markdown

Parser for Markdown and Markdown Extra derived from the original Markdown.pl by John Gruber.
http://michelf.ca/projects/php-markdown/

Output without surrounding `<p>` tags? #230

Open silverdr opened 8 years ago

silverdr commented 8 years ago

Is there a way to configure the parser in such a way as NOT to generate the surrounding paragraph tags?

michelf commented 8 years ago

Do you mean you don't want paragraph tags anywhere? There is no option for that.

Or do you mean that some <p> tags are put somewhere they shouldn't? Perhaps that's a bug.

Anyway, I'll need some example input and output to know what you're talking about.

silverdr commented 8 years ago

I mean only the surrounding / top-level <p> and </p> tags. The output seems to be (logically) always enclosed in <p> tags:

Markdown:

Foo

HTML:

<p>Foo</p>

What I am looking for is a possibility of NOT enclosing the output in <p> tags if there is no explicit need for it. Explicit need would be in the case of:

Markdown:

Foo

Bar

HTML:

<p>Foo</p><p>Bar</p>

But NOT in the case of:

Markdown:

Foo
Bar

HTML:

<p>Foo
Bar</p>

In the latter (or any "one-liner") case, I'd be very glad to be able to disable the generation of the enclosing <p></p> tags.

michelf commented 8 years ago

I don't think I get it. This is like requesting for > blockquotes not to be enclosed in <blockquote>. How is that useful? Why would you not want your paragraph be enclosed in <p> tags?

That said, you can use <div> to prevent paragraphs from being formed:

<div>
Foo
Bar
</div>

In the case above, no paragraph tag is added and the output is the same as the input.

robsonsobral commented 8 years ago

I think what @silverdr is asking for is a way to output Markdown when the result is going to be placed inside another element.

Let's say I have a template like this:


<p><strong>Summary: </strong> {{ my_var_to_be_outputed_as_markdown }}.</p>

Sometimes, the markdown content is an "inline" element, not a "block" one.

michelf commented 8 years ago

Oh! Only parse span-level elements. That would make sense.

There's no API for this unfortunately. With some cleverness you could fake it with Markdown Extra by wrapping the content in <p markdown="1">...Markdown content...</p> before giving it to the parser, and then removing the surrounding <p> in the result that would look like this <p>...HTML content...</p>.

But that's a bit messy. Perhaps there should be a function you could call for this.
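
A rough PHP sketch of the wrap-and-strip workaround described above. Assumptions: `transformSpanLevel` is a hypothetical helper name, not part of php-markdown's API, and the Markdown Extra transform is passed in as a callable so the sketch does not depend on the library being installed. It has not been checked against block-level input such as lists or blockquotes.

```php
<?php
// Hypothetical helper: NOT part of php-markdown's API.
// $transform is assumed to be a Markdown Extra transform callable.
function transformSpanLevel(string $markdown, callable $transform): string
{
    // Wrap the input in <p markdown="1"> so Markdown Extra treats
    // its content as span-level Markdown inside a single paragraph.
    $html = trim($transform('<p markdown="1">' . $markdown . '</p>'));

    // Remove the <p>...</p> wrapper that we added ourselves.
    if (substr($html, 0, 3) === '<p>' && substr($html, -4) === '</p>') {
        return trim(substr($html, 3, -4));
    }

    // Unexpected output shape: return the parser output unchanged.
    return $html;
}
```

With the library installed via Composer, a call might look like `transformSpanLevel($text, fn (string $t): string => \Michelf\MarkdownExtra::defaultTransform($t))`. Passing the transform in keeps the string handling separate from the parser, which also makes the helper easy to test with a stub.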

silverdr commented 8 years ago

> This is like requesting for > blockquotes not to be enclosed in <blockquote>. How is that useful?

Not exactly. With > blockquotes the user has explicit control over whether <blockquote> is generated or not, so an option to suppress it would be of no use: a user who writes that Markdown syntax wants the <blockquote>. With enclosing <p>s, there is no way the user can say "I don't want them" in Markdown, so the only place to say this would be in the parser's options.

> Why would you not want your paragraph be enclosed in <p> tags?

Because the output in this particular use case is being rendered into an existing layout where those extra <p>s cause unwanted side-effects. And since block elements are not required in HTML:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Title</title>
</head>
<body>
    Body
</body>
</html>

is valid HTML without a single <p> (or <div>), so there is no requirement to enforce:

[...]
<body>
    <p>Body</p>
</body>
[...]

in every case.

@robsonsobral: Yes, the case is quite similar to your example.

robsonsobral commented 8 years ago

@michelf, considering that sometimes the content doesn't have any double line breaks, it makes sense to allow it not to be enclosed in <p>s.

silverdr commented 8 years ago

> With some cleverness you could fake it with [...]

Currently I scan the output, check whether the surrounding <p></p>s are the only ones in the output and strip them if so - works but hurts my eyes, whenever I see it ;-) But as @robsonsobral just commented, and as one of my examples was about, in the case of Markdown without any double newlines:

Markdown:

Foo
Bar

I'd be happy to be able to get output like this:

HTML:

Foo Bar

or even:

HTML:

Foo
Bar

if that's easier.
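
The scan-and-strip approach described in this comment could be sketched in PHP roughly as follows. `stripSoleParagraph` is a hypothetical name, and paragraphs are detected with a simple regular expression rather than an HTML parser, so raw `<p>` tags written directly in the Markdown source could still confuse it.

```php
<?php
// Hypothetical post-processing helper (not part of php-markdown).
// Strips the outer <p>...</p> only when the whole output is a single
// paragraph; any other output is returned unchanged.
function stripSoleParagraph(string $html): string
{
    $trimmed = trim($html);

    // Count opening <p> tags (with or without attributes), but not <pre>.
    $openCount = preg_match_all('/<p[\s>]/i', $trimmed);

    $isSingleParagraph = $openCount === 1
        && substr($trimmed, 0, 3) === '<p>'
        && substr($trimmed, -4) === '</p>';

    return $isSingleParagraph ? substr($trimmed, 3, -4) : $html;
}
```

With this sketch, `<p>Foo\nBar</p>` becomes `Foo\nBar`, while a two-paragraph result such as `<p>Foo</p>\n\n<p>Bar</p>` is returned unchanged.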

michelf commented 8 years ago

> Currently I scan the output, check whether the surrounding <p></p>s are the only ones in the output and strip them if so - works but hurts my eyes, whenever I see it ;-)

Honestly, I don't think such a behavior would be less clunky if implemented in the parser. Short of redesigning the parser to maintain an intermediary AST representation of the Markdown document, I think you just described the best way to implement what you want.

silverdr commented 8 years ago

Hmm… that's not the best news. BTW, I was googling around the subject a bit, and it seems that I am not the only one with a similar problem.

https://github.com/erusev/parsedown/issues/43

They seem to have solved it somehow, but that's another parser, possibly designed differently.

michelf commented 8 years ago

Is it the same problem though? The solution in that thread seems to be a mode for parsing only span-level elements, disabling lists, blockquotes, code blocks, etc. That will produce a different result from removing the <p> tag if the output is a single paragraph.

silverdr commented 8 years ago

Reading the explanations of the OP, the case/problem is the very same. He even used similar examples (I could have copy-pasted them instead of inventing my own here ;-)

As for the solution, I am not sure; I would need to check how that other parser behaves. They seem to have agreed on making the "span-level method" public, but I can't say what its output is. If it disables everything you mentioned, then it's not really a solution, except maybe for some very simple input cases.

robsonsobral commented 8 years ago

I had similar problems several times. For example:

For me, it was never a problem with block elements in general. It was always just the p, resulting in things like:

This is especially a problem on CMSs that use a limited template language. Sometimes I used a plugin to find and replace the tags; sometimes I just gave up.

ghost commented 8 years ago

I have the very same problem using the figure tag and a picture.

## This works

![My picture][img_01]
[img_01]:  http://www.w3.org/html/logo/img/mark-word-icon.png "HTML5 for everyone"

## This does not work, as the Markdown for the image is not replaced
<figure>
![My picture][img_01]
</figure>

[img_01]:  http://www.w3.org/html/logo/img/mark-word-icon.png "HTML5 for everyone"

## This works, but an unwanted 'p' tag is added

<figure  markdown="block">
![My picture][img_01]
</figure>

[img_01]:  http://www.w3.org/html/logo/img/mark-word-icon.png "HTML5 for everyone"
ipeevski commented 8 years ago

A ->transformLine() method that doesn't output <p> tags (similar to Parsedown) would be sufficient IMHO

kenkit commented 7 years ago

Here is what I get: <p>Laser-firing underwater drones are being utilized to protect Norway's salmon industry by recognizing, and obliterating, parasitic sea lice</br><br><a href="http://www.foxnews.com/tech/2017/03/23/laser-firing-underwater-drones-protect-norways-salmon-supply-by-incinerating-lice.html">LINK</a></br></p>

How are you guys getting it to parse the [My picture][img_01] tags? Here I am just getting ....

EminezArtus commented 7 years ago

OMG, you guys are having such a hard time understanding. Not everyone needs a paragraph tag. I pull data from a database and loop over it with foreach to parse any Markdown, but when I do this I get a stupid <p></p> wrapper. Why?? That makes no sense to me; if I need a wrapper, I'll put it in myself. Please make the wrapper optional.

taufik-nurrohman commented 7 years ago

I think some people will need this feature to let users write a post title or a post snippet in Markdown syntax without the output being wrapped in paragraph tags, because the title field will be embedded in an <h[1-6]> tag and the description field will be embedded inline in a single <p> tag:

---
title: Foo `Bar` Baz
description: Lorem ipsum **dolor** sit amet.
type: Markdown
...

Lorem ipsum dolor sit amet.

Lorem ipsum dolor sit amet.

<h1 class="title">Foo <code>Bar</code> Baz</h1>
<p class="description">Lorem ipsum <strong>dolor</strong> sit amet.</p>
<section class="content">
  <p>Lorem ipsum dolor sit amet.</p>
  <p>Lorem ipsum dolor sit amet.</p>
</section>
taufik-nurrohman commented 7 years ago

My current solution is to use str_replace to remove the paragraph tags manually:

// Turn paragraph breaks into blank lines, then drop any remaining <p> and </p> tags.
echo str_replace(["</p>\n\n<p>", '<p>', '</p>'], ["\n\n", '', ''], $content);
silverdr commented 7 years ago

As I wrote over a year ago: "Currently I scan the output, check whether the surrounding <p></p>s are the only ones in the output and strip them if so - works but hurts my eyes, whenever I see it"

taufik-nurrohman commented 7 years ago

@silverdr BTW, scanning the output is simple: just count the number of </p> in the string:

if (substr_count($content, '</p>') === 1) { … }
silverdr commented 7 years ago

@tovic It's not difficult. It's IMHO misplaced and "inelegant" but in the absence of a better solution this one works.

michelf commented 7 years ago

I have some trouble figuring out if everyone is asking for the same thing in this thread. The common theme is "no paragraph tags", but what are the expectations beyond the trivial one-liner example?

So I'm going to make a survey. Here are seven inputs; please answer with the output you'd expect from a hypothetical "no paragraph tag" mode. Also, please tell me if you don't care about the result for some of these and, if so, why.

Input 1:

hello *world*

Input 2:

> hello *world*

Input 3:

hello

*world*

Input 4:

## Hello

*world*

Input 5:

Hello
1. *world*

Input 6:

Hello [world][]
[world]: https://hello.world/

Input 7:

Hello [world][]

[world]: https://hello.world/

Then, let's compare each other's answers.

robsonsobral commented 7 years ago

1, for sure.

6 and 7 are debatable.

taufik-nurrohman commented 7 years ago

My point is to accept inline HTML tags only. So <strong>, <em>, <code>, <abbr> and <a>. <img> is optional.

robsonsobral commented 7 years ago

> My point is to accept inline HTML tags only. So <strong>, <em>, <code>, <abbr> and <a>. <img> is optional.

I can live with that.

silverdr commented 7 years ago

I hope the thing I asked for was made clear: an option for the parser that would make it output the HTML without the surrounding <p></p> tags, regardless of whether it's one line of Markdown or a long document, provided it doesn't break the well-formedness of the output. If a paragraph break is not required by the Markdown content, it shouldn't be added. Of course, if the Markdown asks for it (a double line break), then it has to be honoured. Now the answers (off the top of my head; you already know what the current output looks like for each of the examples, so I only show what I'd expect when the option is available and enabled, possibly by default):

  1. hello <em>world</em>
  2. <blockquote>hello <em>world</em></blockquote>
  3. <p>Hello</p><p><em>world</em></p>
  4. <p><h2>Hello</h2></p><p><em>world</em></p>
  5. Hello 1. <em>world</em>
  6. Hello <a href="https://hello.world/">world</a>
  7. Hello <a href="https://hello.world/">world</a>
peteryland commented 4 years ago

I also have a similar case which might be related. I have the following markdown:

::: test
<input>
:::

And I would like an option for it not to output the unwanted p tags:

<div class="test">
<p><input></p>
</div>
winie commented 1 week ago
  1. hello <em>world</em>
  2. <blockquote>hello <em>world</em></blockquote>
  3. hello <em>world</em>
  4. <h2>Hello</h2><em>world</em>
  5. Hello 1. <em>world</em>
  6. Hello <a href="https://hello.world/">world</a>
  7. Hello <a href="https://hello.world/">world</a>
groovenectar commented 1 week ago
>  1. `hello <em>world</em>`
> 
>  2. `<blockquote>hello <em>world</em></blockquote>`
> 
>  3. `hello <em>world</em>`
> 
>  4. `<h2>Hello</h2><em>world</em>`
> 
>  5. `Hello 1. <em>world</em>`
> 
>  6. `Hello <a href="https://hello.world/">world</a>`
> 
>  7. `Hello <a href="https://hello.world/">world</a>`

Thanks, I just added this to my resume

May I never forget you as I'm making the big bucks at my next job

Don't say you never did anything for me

https://c.dup.bz/@wordpress/you%20do%20not%20say.png.html