jgm / pandoc

Universal markup converter
https://pandoc.org

Syntax for divs #168

Closed jgm closed 7 years ago

jgm commented 13 years ago
I've been missing a way to specify a div
without using HTML tags.  The (relatively) new
delimited code block syntax gave me an idea, however.
Consider using a line of four or more periods to
start a div:

    Para before div

    ....
    First para inside div

    Second para inside div

    Third para inside div
    ....

Since one will often want and need to apply an ID
and/or class to a div, one should be able to specify
those.  I suggest a CSS-like syntax inside braces
after the opening line of periods:

    .... {#id .class}
    div content
    ....

One problem would be nested divs.  I suggest marking
those by indenting.  A code/pre block containing div
syntax would need to be doubly indented, similar to 
such blocks inside lists:

    .... {#outer_div}

    Para of outer div

        .... {#nested_div}

        Para of nested div.

        A code block illustrating div syntax
        (a most unusual thing!):

                ....{#example_div}
                This is how you specify a div
                ....

        Another para in nested div

        ....

    Another para in outer div

    ....

Of course one could mark nested divs by using more periods in the
line (and then at least two more!):

    .... {#outer_div}

    Para of outer div

    ...... {#nested_div}

    Para of nested div.

    A code block illustrating div syntax
    (a most unusual thing!):

        ....{#example_div}
        This is how you specify a div
        ....

    Another para in nested div

    ......

    Another para in outer div

    ....

But I think indenting makes things cleaner, making it
easier to remember closing the divs you have opened, and not 
to close more divs than you have opened.  If you have very 
many nested divs you should probably pause and consider why 
anyway!

Google Code Info:
Issue #: 195
Author: bpjonsson
Created On: 2010-01-07T13:54:47.000Z
Closed On: 
jgm commented 8 years ago

+++ Pablo Rodríguez [Mar 28 16 13:29 ]:

I think using the existing square brackets syntax for spans makes
the most sense. So you can have

[1]link{#id .class}
[span]{#id .class}

@jgm, could we consider the discussion syntax for spans (and only for spans) as finished?

Yes, I think so.

mb21 commented 8 years ago

As I said, both syntaxes (side and surround marking) have their pros and cons, depending on the use case, so currently I’m thinking eventually we should have both. But maybe we can start with just one, let’s say side-marking (since I’ve a feeling that that’s still what jgm prefers personally), and roll that one out with the span syntax.

uvtc commented 8 years ago

@bpj wrote:

made this file with an example of each what seems to me to be the remaining contenders, showing {snip}

Thanks, bpj. Though, I don't understand the "lazy" style div syntax with nesting (also with no empty lines between lazy divs). For example, does

; ; {.foo}
Maiores adipisci aut at. Rerum voluptatibus ut beatae sit ullam sit
aspernatur. Temporibus similique aut quidem et alias officiis dolorum.

translate into <div><div class="foo">Maiores adipisci ...</div></div>?

Also it looks like you're closing those lazy-style divs (like delimited syntax), but I don't see why.

BTW I actually changed my mind after having seen them all side by side in several gvim windows.

Which do you prefer now?

One thing I think is noticeable when looking at a doc containing a lot of ! & !!! divs is that it seems like there's too much "yelling" in there. That would grate on me after a while.

ousia commented 8 years ago

[...] But maybe we can start with just one, let’s say side-marking (since I’ve a feeling that that’s still what jgm prefers personally), and roll that one out with the span syntax.

@jgm, would you agree in implementing side-marking syntax for divisions and the syntax for spans in the next release?

bpj commented 8 years ago

@uvtc said:

@bpj wrote:

made this file with an example of each what seems to me to be the remaining contenders, showing {snip}

Thanks, bpj. Though, I don't understand the "lazy" style div syntax with nesting

That's probably because I don't fully understand how lazy style with blockquotes works! :-)

(also with no empty lines between lazy divs). For example, does

; ; {.foo}
Maiores adipisci aut at. Rerum voluptatibus ut beatae sit ullam sit
aspernatur. Temporibus similique aut quidem et alias officiis dolorum.

translate into <div><div class="foo">Maiores adipisci ...</div></div>?

Yes, That's the idea, though normally the outer div would probably have some content before or after the inner div.

Also it looks like you're closing those lazy-style divs (like delimited syntax), but I don't see why.

It wasn't supposed to look that way. What I intended was empty lines at the end of those divs. Again my deficient understanding of laziness in blockquotes showing...

BTW I actually changed my mind after having seen them all side by side in several gvim windows.

Which do you prefer now?

I prefer side marking over fence marking, because nesting gets clearer and because the closing fences in particular look weird. I think the reason that backticks and tildes work as well as they do with fenced code blocks is that visually those characters don't extend more vertically than horizontally. Both bangs and semicolons extend more vertically than horizontally, which makes them ill-suited for fences. Moreover to the extent that I can stand laziness at all I actually think bangs look better with lazy paragraphs because they somehow stand out better.

One thing I think is noticeable when looking at a doc containing a lot of ! & !!! divs is that it seems like there's too much "yelling" in there. That would grate on me after a while.

The only (slight) problem I have with bangs is that I after some 20 years of daily hacking mostly in Perl and using ⚠ U+26A0 WARNING SIGN [1] to mark 'bad' forms in my dictionary work have grown a strong association of preposed bangs with negation. I even use <C-K>!! to enter the warning sign in Vim, having no use for the original meaning of that digraph (pipe). I actually feel that rather strongly when seeing several divs fenced with !!!.

[1]: BTW I dislike the Unicode practice of writing character names in all caps, since that looks like yelling to me.

bpj commented 8 years ago

@jgm, could we consider the discussion syntax for spans (and only for spans) as finished? Yes, I think so.

Having experimented a bit with a filter which converts links without an URL into spans I've discovered an edge case: pandoc interprets

\foo[Doloremque Sint](){#bar .foo}

as

[Para [RawInline (Format "tex") "\\foo[Doloremque Sint]",Str "(){",Str ".foo}"]]

I.e. as a TeX command with an argument in square brackets followed by literal parentheses, brace, etc. Presumably this would be the same with the proposed span syntax so that

\foo[Doloremque Sint]{#bar .foo}

would become

[Para [RawInline (Format "tex") "\\foo[Doloremque Sint]{.foo}"]]

which would be unfortunate since currently this is a very useful hack when targeting both HTML and LaTeX:

\foo<span class="foo">Doloremque Sint</span>

Would you consider to resolve such cases in favor of the tex followed by span interpretation when the first thing in the braces looks like an id or class? Please note that the TeX argument interpretation always can be forced with

\text{\foo[Doloremque Sint]{#bar .foo}}

There certainly are possible workarounds like filters which overload classes or attributes of a certain form adding the raw TeX before the span, but it would be nice if such were unnecessary.

uvtc commented 8 years ago

+++ @bpj wrote:

I prefer side marking over fence marking, because nesting gets clearer and because the closing fences in particular look weird. I think the reason that backticks and tildes work as well as they do with fenced code blocks is that visually those characters don't extend more vertically than horizontally. Both bangs and semicolons extend more vertically than horizontally, which makes them ill-suited for fences. Moreover to the extent that I can stand laziness at all I actually think bangs look better with lazy paragraphs because they somehow stand out better.

I've suggested that, for generic divs, pandoc-markdown use the same character for side-marking as for delimiting. Maybe I've been mistaken and it's not actually very markdownish (after all, code blocks use 4-spaces for side-marking, but use tildes or backticks for delimiting). Using the same character for both may be sacrificing readability for symmetry.

Consider:

Delimited div coming up.

+++ {.some-class}
The plus sign glyph nicely extends horizontally.

It also suggests the *addition* of
something to what's being delimited here.
+++

Now a side-marked div:

! {.some-class}
! Lorem ipsum dolor sit amet, consectetur
! adipiscing elit, sed do eiusmod tempor
! incididunt ut labore et dolore magna aliqua.
!
! Ut enim ad minim veniam, quis nostrud
! exercitation ullamco laboris nisi ut
! aliquip ex ea commodo consequat.

The above syntax eliminates the (IMO) unsightly !!!, which is my main complaint about the bang.

Also, I've seen that darn +++ 18 times already in this thread. Maybe put it to good use.

jgm commented 8 years ago

+++ Benct Philip Jonsson [Mar 29 16 11:33 ]:

which would be unfortunate since currently this is a very useful hack when targeting both HTML and LaTeX: \foo<span class="foo">Doloremque Sint</span>

This really is a hack, and I'm not too inclined to accommodate it. It would be much better to use a filter that intercepts a span with class "foo" and turns it into a corresponding LaTeX command. Indeed, it would be simple to write a general-purpose filter that did this, and even make it available as a binary, so people wouldn't have to reinvent the wheel.
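
A minimal sketch of the kind of filter described here, written against pandoc's JSON AST (nodes of the form `{"t": ..., "c": ...}`). The helper names `walk` and `span_to_latex_command` and the default class list are hypothetical, and keeping the span while wrapping its contents in raw LaTeX is an assumption made for illustration, not necessarily the design meant in this comment:

```python
def walk(node, fn):
    """Rebuild a JSON-like tree bottom-up, applying fn to every dict node."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        return fn({key: walk(value, fn) for key, value in node.items()})
    return node


def span_to_latex_command(node, commands=("foo",)):
    """Wrap the contents of matching Span nodes in a raw LaTeX command."""
    if node.get("t") != "Span":
        return node
    (ident, classes, keyvals), inlines = node["c"]
    for cls in classes:
        if cls in commands:
            # Hypothetical rule: class "foo" maps to the command \foo{...}.
            raw_open = {"t": "RawInline", "c": ["latex", "\\" + cls + "{"]}
            raw_close = {"t": "RawInline", "c": ["latex", "}"]}
            return {"t": "Span",
                    "c": [[ident, classes, keyvals],
                          [raw_open] + inlines + [raw_close]]}
    return node


# A Para holding a span with id "bar" and class "foo", in pandoc's JSON form.
para = {"t": "Para", "c": [{"t": "Span", "c": [
    ["bar", ["foo"], []],
    [{"t": "Str", "c": "Doloremque"}, {"t": "Space"}, {"t": "Str", "c": "Sint"}],
]}]}
converted = walk(para, span_to_latex_command)
```

A real filter would read the whole JSON document from standard input, apply `walk` to it, and write the result to standard output. Because the span itself is kept, output formats that ignore raw LaTeX should still see the id and class.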

bpj commented 8 years ago

@jgm said

+++ Benct Philip Jonsson [Mar 29 16 11:33 ]:

which would be unfortunate since currently this is a very useful hack when targeting both HTML and LaTeX: \foo<span class="foo">Doloremque Sint</span>

This really is a hack, and I'm not too inclined to accommodate it. It would be much better to use a filter that intercepts a span with class "foo" and turns it into a corresponding LaTeX command. Indeed, it would be simple to write a general-purpose filter that did this, and even make it available as a binary, so people wouldn't have to reinvent the wheel.

OK. I have a filter which intercepts divs with classes ending in a period -- which pandoc accepts and which looks rather nice as .foo. with the Pandoc attribute syntax --, and both removes that period and inserts the raw TeX. That feels like a hack to me, but I do want some general mechanism so that I don't have to write a bespoke filter for every set of concerned classes I use. Perhaps more robust would be to have the filter look for a data-tex=foo attribute and (1) insert the raw TeX, (2) add foo to the classes and (3) delete the data-tex attribute. That way you would still get valid HTML when the filter is unavailable, although it's more to type.
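
The three steps of the data-tex idea can be sketched on a single Span node in pandoc's JSON form. This is only an illustration: the function name `apply_data_tex` is hypothetical, and placing the raw TeX inside the span as `\foo{...}` (rather than before it) is an assumption:

```python
def apply_data_tex(node, key="data-tex"):
    """Rewrite a Span that carries a data-tex attribute; leave other nodes alone."""
    if node.get("t") != "Span":
        return node
    (ident, classes, keyvals), inlines = node["c"]
    command = dict(keyvals).get(key)
    if command is None:
        return node
    # (1) insert the raw TeX around the span's contents
    raw_open = {"t": "RawInline", "c": ["latex", "\\" + command + "{"]}
    raw_close = {"t": "RawInline", "c": ["latex", "}"]}
    # (2) add the command name to the classes, without duplicating it
    new_classes = classes if command in classes else classes + [command]
    # (3) delete the data-tex attribute
    new_keyvals = [pair for pair in keyvals if pair[0] != key]
    return {"t": "Span",
            "c": [[ident, new_classes, new_keyvals],
                  [raw_open] + inlines + [raw_close]]}


span = {"t": "Span", "c": [["", [], [["data-tex", "foo"]]],
                           [{"t": "Str", "c": "Sint"}]]}
rewritten = apply_data_tex(span)
```

When the filter is not run, the span simply keeps its data-tex attribute, which matches the point above about still getting valid HTML.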

ousia commented 8 years ago

@jgm, if you have set the syntax for spans, would it be possible to implement it in the development version?

This would be also great to have language markup (first in HTML and XML codes, then in TeX markup), such as in:

Law translates both [ius]{:la} and [lex]{:la}.

bpj commented 8 years ago

More Catonis instare rem prodesse non credo! :-)

ousia commented 8 years ago

@bpj, if you want to improve communication in an issue that is about to be five years old, I don’t think Latin is the best way to do it.

Sorry, but I’m afraid that my Latin knowledge cannot know what the sentence might mean. I can only guess and I don’t know how it should be applied to my previous comment.

But probably we need longer to even think of deploying what is already settled in this issue (I mean, the syntax for spans).

bpj commented 8 years ago

Which was more or less what I said! Sorry about wearing my jocular hat!

ousia commented 8 years ago

No problem, @bpj, but I’m not sure what you meant. If I have to quote my guessed translation to get the actual one, here you have it: “I don’t believe in the way Cato stands for it would benefit anyone”. (What Cato did and how he did anything are also unknown to me.) The actual translation is much appreciated :smile:.

I don’t really think that we need to wait longer to deploy the span syntax. And I’m not doing it myself, because coding is all Greek to me.

Do you really think we need more time to deliberate again about what is already decided? I would say so, but with Latin quotes I can only guess :wink:.

bpj commented 8 years ago

No, but we should not conflate issues. I believe there are separate issues both for the span syntax and the lang attribute shortcut. Never mind the Latin phrase, it was a bad paraphrase of a badly remembered quote inspired by your use of Latin words as examples.

ousia commented 8 years ago

I know (I opened issue #895 [and reported originally at Google Code more than six years ago]) and I agree: it was only a way of showing the real improvement with a dedicated syntax for spans.

saivan commented 8 years ago

I am considering another possible syntax, and I admit that I'm joining this party about five years late, but this exclusion has been driving me a little nuts :P I've even resorted to writing my own regex parser to parse for these particular patterns.

I'm thinking, text goes inside of a block; so why not have something like this:

(.the-class #the-ID)
    || This is some text and some more markdown that should be inside of
    || ## This markdown
    || Sometimes I want (.color)||color to surround my text||, because then
    || it has far more emphasis and is much easier to see

I feel like it's very explicit, and the double vertical lines clearly show where the styles should be applied, maybe a lazy syntax such as this one:

(.the-lazy-way) || 
    This is a block that I can write in a really lazy way because I don't
    want to type out those double lines everywhere, but I have to apply
    spans inside of my code (.example-item)|| like this ||.

May also be a good idea. Finally, if a specific element type is required, for example; if this should have all been housed inside of a div, then I can write:

(.the-lazy-way)[div] || 
    This is a block that I can write in a really lazy way because I don't
    want to type out those double lines everywhere, but I have to apply
    spans inside of my code (.example-item)|| like this ||.

What are the thoughts on that?

uvtc commented 8 years ago

Double pipes look good, but they could be confused for nested line blocks.

But I really do like the idea of using the pipe for side-marking. It's very natural to use | that way. It's so good for side-marking that I don't think Pandoc should limit its use to line blocks...

Pandoc-markdown could add another multi-character markup syntax. It already does so with #. for numbered lists and ### for ATX-style headers.

(I'd actually suggested a 2- and 3-character side-marking syntax a while back for left-, right-, and center-justified text, all of which involve the pipe. See #719.)

Consider:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat.

.| {.some-such}
.| Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
.| eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
.| ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
.| aliquip ex ea commodo consequat.
.|
.| .| {.even-moreso}
.| .| Duis aute irure dolor in reprehenderit in voluptate velit
.| .| esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
.| .| occaecat cupidatat non proident, sunt in culpa qui officia
.| .| deserunt mollit anim id est laborum.
.|
.| Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
.| eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
.| ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
.| aliquip ex ea commodo consequat.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat.

ousia commented 8 years ago

Double pipes look good, but they could be confused for nested line blocks.

Both samples look great, but they are hard to type without a programming editor. Which I think it’s a crazy idea.

(I'd actually suggested a 2- and 3-character side-marking syntax a while back for left-, right-, and center-justified text, all of which involve the pipe. See #719.)

Sorry, but I must be missing something. If Markdown is about logical elements and not about format, why is the proposal about formatting and not about granting the paragraph element attributes (so that it can handle classes)?

uvtc commented 8 years ago

Both samples look great, but they are hard to type without a programming editor. Which I think it’s a crazy idea.

I don't think they're substantially harder to type than any other Markdown syntax.

And, as noted in the above discussion and on the Daringfireball Markdown page, the overriding goal of Markdown is to be as readable as possible (so, readability beats writability). Quoting John Gruber: "The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions."

I think that's why Markdown wins. Apparently, a lot of people are willing to put in slightly more typing effort if it results in a better-looking document.

ousia commented 8 years ago

I don't think they're substantially harder to type than any other Markdown syntax.

They may not be harder than any other side-marking syntax. And I think these marks scare any new user with no technical background.

BTW, it is easier to write <div> tags than the lightweight syntax replacement in your sample. (The lightweight part, no matter which design principles Markdown might aim to follow, isn’t clear to me in this point.)

Side-marking has also another “feature”: what happens when you have to modify the text? Would you have to rewrite (almost) all lines in the paragraph to have a good-looking result?

Sorry, but I think that maybe the solution may be more complicated than the original issue. I mean, I guess any lightweight markup is supposed (at least, in my mind) to make the writing much easier. And this would be great when extended to any average user, not only to people with programming skills.

KurtPfeifle commented 8 years ago

Side-marking has also another “feature”: what happens when you have to modify the text? Would you have to rewrite (almost) all lines in the paragraph to have a good-looking result?

Yes, either that...

..._Or_ use Pandoc to re-format whatever appears as too much laboring:

 pandoc \
      -f markdown \
      -t markdown \
        input-with-sidemarking-not-looking-very-nice.md \
      --columns 100 \
      -o output-with-sidemarking-looking-very-nice.md

Sorry, but I think that maybe the solution may be more complicated than the original issue. I mean, I guess any lightweight markup is supposed (at least, in my mind) to make the writing much easier.

If you want more Markdown syntax features, you'll have to add more syntax elements to Markdown...

If an element is too complicated to use, then don't use it, and limit yourself to those which you find easy enough.

And this would be great when extended to any average user, not only to people with programming skills.

If you are an "average" user without programming skills and want to use it just to write simple documents, why would you want to use many more elements than:

? -- The (very basic) Markdown formatting rules for these seven style elements allow already for quite sophisticated documents. If you want to use more, you have to learn more. And once you learn more, you are starting to develop personally beyond the skills of an "average" user... How far you go, is up to you.

saivan commented 8 years ago

I think using something like the double pipe serves to act like a bracket. It unambiguously shows where the classes should be applied, and it avoids looking "marked up", which can't be said for inline span blocks and inline divs.

On top of that, they could provide an easy way for users who aren't just trying to convert to html to apply styles to their work. For example, pandoc could allow users to apply a blue color to sections marked up like this:

I want the next few words to be (blue)|| blue for a while ||.

There's no reason why that can't support other markup formats.

ickc commented 8 years ago

Hi, another guy late for the party. It's a very long thread and I've tried hard to read everything, but I still might have missed something that has already been discussed. So let me write down my assumptions/summary from the thread and see if I misunderstood anything:

Assumptions/summary

Please correct me if I'm wrong on any of the following assumptions:

  1. This new syntax(es) is/are functionally identical to the current native pandoc divs (not mentioned here but seems a fair assumption?)
  2. most of the discussion is around
    1. which kind of syntax to use: fenced vs side-marked
    2. which particular syntax (e.g. symbols) to use
  3. which particular syntax (symbol) to use is not very important, but it's more important to have a certain syntax (probably people found that opinions diverge too much and there's no clear winner)
  4. which kind of syntax (fenced vs side-marked) to use however is still controversial; some prefer both to be available, but @jgm said:

I think just having side-marked syntax is still on the table. I don't find the arguments for the fenced syntax all that compelling.

Some of my thoughts on the different kinds of syntax

Fenced syntax is already there

If assumption (1) is correct, then a fenced syntax for divs is already in pandoc's syntax, since native HTML divs are already native pandoc divs, and HTML divs are fenced.

If so, I think the discussion is then to have a fenced syntax that is markdown-ish rather than plain HTML. And I personally think it is very important for pandoc to have a native syntax for every native features, not HTML syntax only (it is my primary intention to open #2761: the native pandoc small caps syntax is in HTML).

The same point can be argued for or against having a new fenced syntax. To argue against it, one can say there's already one if you absolutely have to use it. To support it, one would say there's already one so we should improve it by making it native, markdown-ish. Clearly I'm the latter kind since I hate having native pandoc syntax being plain HTML.

Another minor thing to mention is inline vs block. Just like code, sometimes it can be one line, sometimes very long, so it is useful to have both inline and block kinds of syntax. While many divs tend to be long, sometimes they can be very short (an example is when pandoc is used with a CMS). So I think it is useful to have both. But this is really minor and we shouldn't waste too much time on it, according to assumption (3).

The problems with side-marked syntax

Some of the problems I think side-marked syntax has:

Compatibility With Other Markdown Editor

To me, I need to edit markdown in a markdown editor, not general text editors. For example, one of the important feature I need is ToC. And no matter how powerful some text/code editors are (e.g. Atom, TextMate), they can't give me a ToC panel to have an overview or jump between sections.

My habit is not important at all but just an example. The point is, I guess many pandoc markdown users also edit them in some form of markdown editor (e.g. MultiMarkdown Composer, Macdown, Byword, etc.).

So the problem with side-marked divs is now obvious: e.g. if a heading is in a side-marked div, most probably the markdown editor is not going to recognize it as a section and show it in the ToC panel. Other similar features, e.g. code blocks, block quotes, etc., will face the same problem.

Note that I'm not talking about the HTML rendered by those markdown editors, but about the features of those markdown editors within the editor pane, e.g. ToC panel, syntax highlighting, etc.

Difficult to Move the Divs Around

More often than other things, the "fence" of a div gets changed, and it's very difficult to do that in side-marked syntax. It's really different from, e.g., a code block, where when you paste you know in advance that the whole thing is a code block and once pasted it can be "forever" settled (and even in this case, the code block is not side-marked). Similarly for block quotes (unless people use block quotes creatively/wrongly as styling, which is another problem of not having universal divs/spans that target every output format. More on this later).

Conclusion: Fenced Syntax Is More Important, Against Side-Marked Syntax Only

To make it clear, the problem with side-marked syntax only arises if it is the only native pandoc syntax (excluding the HTML non-markdown-ish one). It will certainly be ok if both kinds of syntax are provided and end users can choose the one appropriate for the situation.

And in my opinion the need for the fenced syntax is bigger than for the side-marked one, since the fenced one is already there and should be improved by making it markdown-ish, not plain HTML.

Expanding the functionality of native pandoc divs (and spans?)

Regarding the LaTeX Hack, @jgm said:

This really is a hack, and I'm not too inclined to accommodate it. It would be much better to use a filter that intercepts a span with class "foo" and turns it into a corresponding LaTeX command. Indeed, it would be simple to write a general-purpose filter that did this, and even make it available as a binary, so people wouldn't have to reinvent the wheel.

I love the idea of "expanding the reach of pandoc native divs (& spans)". Currently it is not doing anything to the, say, LaTeX output. I personally target HTML and LaTeX generations at once for most of the time.

I'm just brainstorming. For example,

  1. it will be great if the native pandoc divs (& spans) have a standard transformation to LaTeX code and one can control it in the template.
  2. Maybe a command-line option is provided to activate/disable this feature.
  3. There may be a need to turn this feature on/off per use within the document (such that certain divs/spans are only in the HTML-related output). Maybe something like the markdown=1 kind of syntax?

In my opinion, this is even more important (to expand the functionality of native pandoc divs) than finding a better syntax for it (if assumption 1 is correct).

Another related issue is that I think the documentation on this needs to be improved. From reading the documentation, one will not know that native pandoc divs have no effect on LaTeX generation, for example.

Lastly, I notice none of the above very long discussions used a heading at all. So please pardon me for using so many. It is how my brain works and I don't know how else to organize my thoughts.

ickc commented 8 years ago

This really is a hack, and I'm not too inclined to accommodate it. It would be much better to use a filter that intercepts a span with class "foo" and turns it into a corresponding LaTeX command. Indeed, it would be simple to write a general-purpose filter that did this, and even make it available as a binary, so people wouldn't have to reinvent the wheel.

This might be an example of the kind of functionality/hack needed:

<!-- \begin{flushright} --><div align="right">
text on the right
<!-- \end{flushright} --></div>

The above code in MultiMarkdown would become

\begin{flushright}
text on the right
\end{flushright}

in TeX, and

<!-- \begin{flushright} --><div align="right">
text on the right
<!-- \end{flushright} --></div>

in HTML, which are both desirable results.

But there seems no way to do it in pandoc.

jgm commented 8 years ago

+++ ickc [Apr 17 16 21:51 ]:

But there seems no way to do it in pandoc.

You can do it quite easily with a filter.

ickc commented 8 years ago

You can do it quite easily with a filter.

I should have added a parenthesis saying (except with a filter). In other words, the pandoc language cannot do that, i.e. there's no toggle that lets me activate part of the code only in a certain output format.

In the case of HTML, some thing like

<div>
blah blah
</div>

will always at least have the blah blah,

and

\begin{abc}
blah again
\end{abc}

can never have the blah again showing up in other output (except when -R is used so that the whole thing is shown).

Even if I don't mind to repeat the whole thing, it cannot be done:

<div align="right">
text on the right
</div>
\begin{flushright}
text on the right
\end{flushright}

This is almost ok except the text inside the div will still show up in the tex output. It is because the way raw HTML and raw LaTeX are handled in pandoc is fundamentally different. The content of an HTML "begin & end" block will still appear in LaTeX output, but a LaTeX "begin & end" block will disappear entirely from other output.

I am saying all these because I support your idea of turning the native pandoc divs and spans, through an official filter, to do something in LaTeX. Because it is something that cannot be done currently (without filter). And if the hypothetical filter becomes official, it is not only a filter but a feature in the pandoc language (the pandoc native divs and spans).

e.g. a possible syntax for native pandoc divs and spans to do something in TeX:

<div tex-env="flushright" align="right">
testing
</div>
<span tex-com="textrm">
testing
</span>

becomes

\begin{flushright}
testing
\end{flushright}
\textrm{testing}

In a certain sense, it is kind of a generalization of <span style="font-variant:small-caps;">Small caps</span>, in that a certain native HTML span (and native pandoc span) can become a native LaTeX command.

mb21 commented 8 years ago

@ickc something like this is under discussion, see #2542 (although the title of that issue hasn't been changed to reflect the broadened discussion..)

uvtc commented 8 years ago

I'd written:

Double pipes look good, but they could be confused for nested line blocks.

Ah, I was mistaken. Line blocks don't nest (unlike blockquotes (>)). And line blocks require a space after the pipe (unlike blockquotes). So, I think you could use the double pipe as a 2-character side-marking syntax to indicate divs without conflicting with line blocks.

Looks nice.

Feed your goldfish regularly.

|| {.warning}
|| Do not over-feed your goldfish.
|| One child repeatedly over-fed
|| his goldfish until it had to be
|| moved to a pond where it reached
|| a length of 8 meters and weighed
|| over 1100 kg.

Appropriate foods include...
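
As a rough illustration of how the proposed `||` side-marking could be mapped onto the HTML divs that pandoc already accepts, here is a small preprocessor sketch. The syntax is only a proposal in this thread, the function names are hypothetical, and the rule that an optional `{...}` attribute line must come first in the block is an assumption taken from the example above:

```python
import re

ATTR_LINE = re.compile(r"^\{([^}]*)\}$")


def attrs_to_html(spec):
    """Turn '#id .cls1 .cls2' into ' id="id" class="cls1 cls2"'."""
    ident, classes = "", []
    for token in spec.split():
        if token.startswith("#"):
            ident = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
    html = ""
    if ident:
        html += ' id="%s"' % ident
    if classes:
        html += ' class="%s"' % " ".join(classes)
    return html


def sidemarked_to_html_divs(text, marker="||"):
    """Replace runs of side-marked lines with <div>...</div> blocks."""
    out, block = [], []

    def flush():
        if not block:
            return
        attrs, body = "", block
        match = ATTR_LINE.match(block[0].strip())
        if match:
            attrs, body = attrs_to_html(match.group(1)), block[1:]
        # Recurse on the unmarked body so nested "|| ||" divs are handled too.
        inner = sidemarked_to_html_divs("\n".join(body), marker)
        out.extend(["<div%s>" % attrs, inner, "</div>"])
        del block[:]

    for line in text.split("\n"):
        if line == marker or line.startswith(marker + " "):
            block.append(line[len(marker) + 1:])
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)


sample = "Feed your goldfish regularly.\n\n|| {.warning}\n|| Do not over-feed your goldfish.\n\nAppropriate foods include..."
print(sidemarked_to_html_divs(sample))
```

The sketch does not address lazy continuation lines or the interaction with line blocks, both of which are discussed above.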

ickc commented 8 years ago

Sorry if it doesn't seem to relate to this thread, but I think Spans and Divs' syntaxes and their functions (in terms of what they will output) should be related.

@jgm

You can do it quite easily with a filter.

@mb21

@ickc something like this is under discussion, see #2542 (although the title of that issue hasn't been changed to reflect the broadened discussion..)

I made a filter in pandocfilters/latexdivs.py at master · ickc/pandocfilters: <div latex="true" class="note abc">...</div> will become \begin{note}...\end{note}.

I wish a similar syntax could make it into the official pandoc language so that a filter is not needed.

In principle a similar thing can be done for spans (<span latex="true" class="text abc">...</span> will become \text{...}), but there's an extra complication that the Span is in a Para.
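
The div rule described above can be sketched on a single Div node in pandoc's JSON form. This is not the code of latexdivs.py; the function name is hypothetical, and using the first class as the environment name is an assumption read off the `class="note abc"` example:

```python
def latex_env_for_div(node):
    """Wrap the blocks of a Div marked latex="true" in a LaTeX environment."""
    if node.get("t") != "Div":
        return node
    (ident, classes, keyvals), blocks = node["c"]
    if dict(keyvals).get("latex") != "true" or not classes:
        return node
    env = classes[0]  # assumption: the first class names the environment
    begin = {"t": "RawBlock", "c": ["latex", "\\begin{%s}" % env]}
    end = {"t": "RawBlock", "c": ["latex", "\\end{%s}" % env]}
    return {"t": "Div",
            "c": [[ident, classes, keyvals], [begin] + blocks + [end]]}


div = {"t": "Div", "c": [
    ["", ["note", "abc"], [["latex", "true"]]],
    [{"t": "Para", "c": [{"t": "Str", "c": "content"}]}],
]}
wrapped = latex_env_for_div(div)
```

Raw blocks sit between other blocks, so no splicing inside a paragraph is needed here. A span version would have to insert raw inlines into the surrounding Para instead, which is the complication mentioned above.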

ickc commented 8 years ago

@chdemko made a filter doing a similar thing: chdemko/pandoc-latex-environment: Pandoc filter for adding LaTeX environement on specific div.

Instead of having latex="true" in the divs like I did, he defines the classes that will be a LaTeX environment through YAML front matter, like this:

---                           
latex-environment:
  test: [class1, class2]
---
<div class="class2 class1">content</div>

will be rendered in LaTeX formatting as

\begin{test}
content
\end{test}
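The metadata-driven lookup can be sketched as follows. This is not the code of pandoc-latex-environment; the function names are hypothetical, and the matching rule (an environment applies when all of its listed classes are present on the div) is an assumption inferred from the example above.

```python
def environment_for(classes, mapping):
    """Return the first environment whose listed classes all appear in `classes`."""
    for env, required in mapping.items():
        if set(required) <= set(classes):
            return env
    return None


def wrap_div(classes, body, mapping):
    """Render a div's body as LaTeX, wrapped in an environment if one matches."""
    env = environment_for(classes, mapping)
    if env is None:
        return body
    return "\\begin{%s}\n%s\n\\end{%s}" % (env, body, env)
```

With the mapping `{"test": ["class1", "class2"]}` taken from the front matter, a div with classes `class2 class1` is wrapped in the `test` environment, while a div with only `class1` is left alone under this assumed rule.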

Could @jgm comment on whether any of these syntaxes could make it into the official pandoc language?

bpj commented 8 years ago

@ickc if you are prepared to kludge it with HTML comments you might want to try this filter which kludges it with code/code block elements. https://gist.github.com/bpj/e6e53cbe679d3ec77e25

The main advantage is that the LaTeX/other format stuff won't show up in generated HTML (you will have to run the filter when generating HTML though), and you can specify explicitly which/any format each raw code is.

ickc commented 8 years ago

@ickc if you are prepared to kludge it with HTML comments you might want to try this filter which kludges it with code/code block elements. https://gist.github.com/bpj/e6e53cbe679d3ec77e25

I hope you don't mind that I added it to the Pandoc Filters · jgm/pandoc Wiki.

It would be very useful to me for including TikZ/PSTricks graphics. I am thinking about using yours together with the Input File scripts in Scripting with pandoc: the raw LaTeX goes in a separate file, a script compiles that file to a PDF, and your method includes the generated PDF in HTML but the raw LaTeX in LaTeX.

bpj commented 8 years ago

@ickc wrote:

It would be very useful to me for including TikZ/PSTricks graphics. I am thinking about using yours together with the Input File scripts in Scripting with pandoc: the raw LaTeX goes in a separate file, a script compiles that file to a PDF, and your method includes the generated PDF in HTML but the raw LaTeX in LaTeX.

A dedicated filter which detects the output format and does the right thing, including generating the PDF file as needed would probably be easier to work with. As you can see from the usage notes I mainly use the code2raw filter to sneak raw LaTeX code into the preamble through metadata variables. In most cases there are more elegant ways to make content conditional on the output format. The code2raw filter is good for quick one-off hacks though.
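The format-detecting part of such a dedicated filter could look roughly like this. It is only a sketch on pandoc's JSON AST (assuming the Image shape used since pandoc 1.16); the function name `figure_block` and its parameters are hypothetical, and actually compiling the graphic is left out.

```python
def figure_block(tex_source, rendered_path, output_format):
    """Choose how to embed a graphic depending on the output format.

    tex_source:    raw LaTeX for the graphic (e.g. a tikzpicture)
    rendered_path: path of an already compiled image of the same graphic
    output_format: the format name pandoc passes to the filter
    """
    if output_format in ("latex", "beamer"):
        # LaTeX-based output gets the raw source.
        return {"t": "RawBlock", "c": ["latex", tex_source]}
    # Everything else gets an image in a paragraph of its own.
    # Image content is [attr, alt-text inlines, [url, title]].
    image = {"t": "Image", "c": [["", [], []], [], [rendered_path, ""]]}
    return {"t": "Para", "c": [image]}
```

Keeping the decision in one function makes it easy to add the compile step later, for example only when the rendered file is missing or older than the source.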

bpj commented 8 years ago

@uvtc wrote:

I'd written:

Double pipes look good, but they could be confused for nested line blocks.

Ah, I was mistaken. Line blocks don't nest (unlike blockquotes (>)). And line blocks require a space after the pipe (unlike blockquotes). So, I think you could use the double pipe as a 2-character side-marking syntax to indicate divs without conflicting with line blocks.

Imagine nested divs with that syntax! It would become eyeball-unparsable pretty quickly.

uvtc commented 8 years ago

Imagine nested divs with that syntax! It would become eyeball-unparsable pretty quickly.

Mm. That's a good point. Syntax that nests should look good when nested.

Ok.

Some givens:

But another note about line blocks that's been a bee in my bonnet: I type addresses in emails often, but almost never use line blocks (or lines ending with backslashes) because recipients can't easily copy/paste them. If I were to use line block syntax, folks would wonder why the heck I'm adding those funny characters in when I type addresses. It's not markdownish. Line-blocks could really use a delimited syntax.

Consider:

Hi.

| A side-marked div with no
| attributes. Uses a side-marking
| that looks like a "single line".

Lorem ipsum.

| {.some-class}
| A side-marked div with an attribute.
| This seems like a natural way to
| write this.

Lorem ipsum.

--- {.some-class}
A delimited syntax for divs. Also uses a
"single-line" marking, like `|`, but
horizontal. I think this may be the most
obvious markdownish syntax for a generic
delimited container.
---

Lorem ipsum.

|| A side-marked syntax for line
|| blocks. Uses a "*double* vertical
|| line" side-marking to emphasize,
|| "these are lines in a line block".

Lorem ipsum.

===
A delimited syntax for line blocks.
Note, it also uses a double-line,
like `||`, but horizontal.
===

Done.

Of course, this breaks backcompat. And would also require pandoc to be more strict about rules for setext headers (require a blank after) and horizontal rules (require blank lines before and after).

@jgm, what do you think of the possibility of a "pandoc_markdown_strict" mode that:

(which, I'd like to think, could enable the div and newer lineblock syntax as described above)?

*

Added bonus regarding using double-pipe for line blocks: it would go nicely with some proposed suggestions (by yours truly) for future right-align and center-align syntax (#719) --- all three of which would (1) start with a pipe, (2) consist of multiple characters, and (3) keep each line on its own line:

|| line block
|| here

|>           right align
|>                  here

|<>   center align
|<>       here

uvtc commented 8 years ago

Another place I just saw where ------ is commonly used to mark what is practically a div:

> > Will you make it?
>
> Ok, I'll be there!

Great, see you this weekend!

-- John

---------------------------
This email and the attachments herein
are copyright the sender, and in no way
suggest or guarantee that the sender will
be corporeally or temporally available
this weekend, or any other weekend,
for any planned or unplanned activities
with this or any other recipient.
---------------------------

uvtc commented 8 years ago

Another place where I often find I need a delimited div syntax:

I went to the url you told me about, but didn't understand
the instructions. It said:

----------------------------
Dear Applicant,

Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore
magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit
esse cillum dolore eu fugiat nulla pariatur.

Completed forms due by Oct 31 or Dec 25!
----------------------------

Should I send the forms in on Hexember 19?

I could imagine maybe wanting to use some kind of delimited blockquote syntax for that, but it's not really quoting someone per se, but rather copying/pasting content from elsewhere.

I've seen Python programmers sometimes use triple quotes for this, but I think triple quotes are unpleasant to look at.

I think side-marking with pipes would look best here:

I went to the url you told me about, but didn't understand
the instructions. It said:

| Dear Applicant,
|
| Lorem ipsum dolor sit amet, consectetur adipiscing elit,
| sed do eiusmod tempor incididunt ut labore et dolore
| magna aliqua.
|
| Ut enim ad minim veniam, quis nostrud exercitation
| ullamco laboris nisi ut aliquip ex ea commodo consequat.
|
| Duis aute irure dolor in reprehenderit in voluptate velit
| esse cillum dolore eu fugiat nulla pariatur.
|
| Completed forms due by Oct 31 or Dec 25!

Should I send the forms in on Hexember 19?

but again, for practicality, there are times when you want to copy/paste a large block quickly, and it may even be one that you expect others to want to subsequently copy/paste (as plain text) and not have to deal with side-markings (analogous to the situation with delimited code block syntax).

alerque commented 8 years ago

@uvtc Please note there is already a | delimited block format in Pandoc that changes how white space is handled. That particular example of yours is also an exact match for using the existing block quote syntax as that's semantically what that content is.

The issue that needs to be addressed here is content that has some attribute other than being a simple block quote. For that, an opening and closing block format (with attributes, of course) seems like a much more versatile solution than a left-delimited option.

uvtc commented 8 years ago

@alerque , I understand that there is already a side-marked | syntax, and that it's used in Pandoc for line-blocks. I'm proposing that it's worthwhile in this case to consider breaking backward compatibility in the following ways:

My rationale for those changes is given in my most recent 3 posts here, and is also partly distilled from the years of comments in this thread. My hunch is that

From what I've seen, @jgm has historically been thoughtful in making changes/additions to pandoc-markdown syntax, and disinclined to break backcompat. Also, this thread has been going for quite some time, and folks are probably very tired of discussing it. On the bright side, I haven't yet run out of jokes to include in my sample pandoc-markdown blocks. ;)

ousia commented 8 years ago

I would like to know whether there is any progress related to this issue.

Sorry, but after so much time, the implementation has reached this conclusion (click on the image to watch a >9s clip):

The sound of inevitability

A question and a comment:

sergiocorreia commented 8 years ago

+1 on the [[[ syntax:

Finally, I also agree with @ousia in that brackets are better but colons are also fine and definitely better than nothing.

[[ center
This text is centered

[[[ alert
And this is an *alert*
]]]

This is also centered
]]

bpj commented 8 years ago

@sergiocorreia +1 for having inner divs use more markers than outer divs.

Seeing your example it is at least clear where each div starts and ends. I still think there should be the option of putting a comment after the closing marker though.

However I also see that many square brackets in a row are still jarring to my eyes. Two is about as much as I can take, three is beginning to jar.

So what about a compromise (knowing well that compromises tend to make no one happy...)

[[::: center
This text is centered

[[::::: alert
And this is an *alert*
:::::]]

This is also centered
:::]]

sergiocorreia commented 8 years ago

Looks nice but the main problem I see is that it requires 10-12 characters for each div. Maybe we can make the colons optional?

bpj commented 8 years ago

Well, I guess we only need one bracket and a minimum of two colons. I tend to use a lot of backticks on my codeblocks because I have a key sequence which inserts two backticks with the cursor between them, so I hit it at least twice, or three to six times if I expect there to be any backticks inside the codeblock. It isn't the actual number of chars so much as how they look and how easy they are to type.

Come to think of it: when brackets indicate nesting there is never any need to type more colons than the minimum!

[:: center

This is centered.

[:: alert
   This is an *alert*
::]

This is still centered.
::]
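The point that brackets alone can carry the nesting can be checked with a small stack-based parser. This is only a sketch of the proposed notation, not anything pandoc implements; the function name `parse_divs`, the dictionary layout, and the treatment of the word after the opening marker as a class name are assumptions.

```python
import re

OPEN = re.compile(r"^\s*\[::+\s*(\S*)\s*$")   # e.g. "[:: center"
CLOSE = re.compile(r"^\s*::+\]\s*$")          # e.g. "::]"


def parse_divs(text):
    """Parse bracket-delimited divs into a nested list of dicts and text lines."""
    root = {"class": None, "children": []}
    stack = [root]
    for line in text.splitlines():
        opening = OPEN.match(line)
        if opening:
            div = {"class": opening.group(1) or None, "children": []}
            stack[-1]["children"].append(div)
            stack.append(div)
        elif CLOSE.match(line):
            if len(stack) == 1:
                raise ValueError("closing marker without an open div")
            stack.pop()
        elif line.strip():
            # Non-blank content belongs to the innermost open div.
            stack[-1]["children"].append(line.strip())
    if len(stack) != 1:
        raise ValueError("unclosed div")
    return root["children"]
```

Since every opening marker pushes and every closing marker pops, the number of colons never has to change with depth; only the bracket direction matters.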

ousia commented 8 years ago

@sergiocorreia and @bpj, just a single detail.

As far as I knew, the syntax for #identifiers and .classes was set in stone.

It is already in use in the few elements that allow attributes (such as titles).

Do you also want to discuss the syntax for attributes, or have I misread something in your samples?

sergiocorreia commented 8 years ago

@ousia you are right, but code blocks also have an alternative to {.classname} in:

    ~~~ classname
    lorem ipsum
    ~~~

For simplicity, I think we should just keep the same convention as code blocks and allow class names in both the shortcut form and the braces form.
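Accepting both forms could be as simple as the following sketch. It is not pandoc's attribute parser; the function name `parse_attributes` is hypothetical, and quoted values containing spaces are deliberately not handled.

```python
def parse_attributes(spec):
    """Parse '{#id .class key=val}' or a bare 'classname'.

    Returns (identifier, classes, key-value pairs).
    """
    spec = spec.strip()
    ident, classes, keyvals = "", [], []
    if not (spec.startswith("{") and spec.endswith("}")):
        # Shortcut form: the whole string is a single class name.
        return (ident, [spec] if spec else [], keyvals)
    for token in spec[1:-1].split():
        if token.startswith("#"):
            ident = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            keyvals.append((key, value.strip('"')))
    return (ident, classes, keyvals)
```

Both `classname` and `{.classname}` yield the same class list, which is the equivalence code blocks already offer.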

In any case, I like the [[ , :: and [:: notations for divs. We can also look at what others are doing (e.g. asciidoc has several types of delimiter blocks, for literal text, source code, etc.)

ousia commented 8 years ago

@sergiocorreia, I didn’t know that.

It’s perfect for me to have alternative versions for the attribute syntax.

With the markup characters themselves, maybe the whole issue would be to have a timeline for implementing the span and division syntax.

Otherwise, we are discussing this ad infinitum (or “To infinity... and beyond!” :wink:).

sergiocorreia commented 8 years ago

After re-reading everything and going through the commonmark discussion, I would just go with :::

It would also be easy to implement, as it would be the same as ~~~ except that we parse the contents as markdown and not as source code.
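To make the comparison with ~~~ concrete, here is a rough sketch of a colon-fenced block parser whose contents are parsed recursively rather than kept verbatim. It is not pandoc's implementation; the function name `parse_fenced_divs` is hypothetical, and the closing rule (a bare run of colons at least as long as the opening run, borrowed from fenced code blocks) is an assumption.

```python
import re

FENCE = re.compile(r"^(:{3,})\s*(.*)$")


def parse_fenced_divs(lines):
    """Return a list of plain lines and ("div", attr, children) tuples."""
    out, i = [], 0
    while i < len(lines):
        opening = FENCE.match(lines[i])
        if not opening:
            out.append(lines[i])
            i += 1
            continue
        width, attr = len(opening.group(1)), opening.group(2).strip()
        # Look for a bare fence that is at least as long as the opening one.
        j = i + 1
        while j < len(lines):
            closing = FENCE.match(lines[j])
            if closing and not closing.group(2).strip() and len(closing.group(1)) >= width:
                break
            j += 1
        if j == len(lines):
            # No closing fence: keep the line as ordinary text.
            out.append(lines[i])
            i += 1
            continue
        # Unlike a code block, the body is parsed again, so divs can nest.
        out.append(("div", attr, parse_fenced_divs(lines[i + 1:j])))
        i = j + 1
    return out
```

Under this rule an outer div needs a longer fence than the divs it contains, exactly as a tilde-fenced code block needs a longer fence to contain a row of tildes.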

ousia commented 8 years ago

@sergiocorreia, I’m not enthusiastic about :::, but to have the feature implemented is essential to be able to remove XML markup in Markdown documents.

But the final decision comes from @jgm and we have to wait for his final move on this issue.