jgm / pandoc

Universal markup converter
https://pandoc.org

Permit adding attributes to all Markdown elements #684

Open phyllisstein opened 11 years ago

phyllisstein commented 11 years ago

I'd like to suggest that Pandoc allow attributes to be attached to any Markdown element, not just code blocks. So for example, if it encountered this:

#{.main} Header

it would generate this:

<h1 class="main" id="header">Header</h1>

rather than this:

<h1 id="main-header">{.main} Header</h1>

Ditto for, say:

Header
----{.main}

...which is probably even legible enough to fit with Markdown's philosophical aversion to looking, well, marked-up.

jamiefolson commented 11 years ago

FYI conversations on this topic date back 5 years:

I for one think it's surprising that nothing seems to have been done to facilitate this. I understand the desire not to pollute the syntax, but it seems like people are taking extraordinary measures to work around this limitation.

jgm commented 11 years ago

Would you count 1.11's allowing attributes to be added to headers as progress?

phyllisstein commented 11 years ago

John, I thought that was a terrific step, along with the inline_code_attributes extension. I'm looking forward to when classes and key/value pairs are implemented for headers, since the particular use case I had in mind involved setting an identifier and an onClick event.

Thanks so much for all your work on Pandoc; it's really a thing of beauty.

jgm commented 11 years ago

+++ Daniel Shannon [Mar 15 13 18:04 ]:

> John, I thought that was a terrific step, along with the inline_code_attributes extension. I'm looking forward to when classes and key/value pairs are implemented for headers, since the particular use case I had in mind involved setting an identifier and an onClick event.

They are already implemented in 1.11!

% pandoc
# Hi {#foo .bar .baz key=val}
^D
<h1 id="foo" class="bar baz" key="val">Hi</h1>
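
For readers unfamiliar with pandoc's internals, the attribute block on the header above boils down to an identifier, a list of classes, and a list of key/value pairs. The following is a minimal sketch of that idea, not pandoc's actual HTML writer: `quote`, `renderAttr` and `renderHeader` are hypothetical names, the `Attr` synonym is a simplified stand-in, and no HTML escaping is attempted.

```haskell
-- Simplified stand-in for pandoc's attribute triple:
-- (identifier, classes, key/value pairs).
type Attr = (String, [String], [(String, String)])

-- Wrap a value in double quotes. No HTML escaping is attempted here.
quote :: String -> String
quote s = "\"" ++ s ++ "\""

-- Render the triple as a string of HTML attributes, skipping empty parts.
renderAttr :: Attr -> String
renderAttr (ident, classes, kvs) = unwords (idPart ++ classPart ++ kvPart)
  where
    idPart    = [ "id=" ++ quote ident | not (null ident) ]
    classPart = [ "class=" ++ quote (unwords classes) | not (null classes) ]
    kvPart    = [ k ++ "=" ++ quote v | (k, v) <- kvs ]

-- Render a header of the given level together with its attributes.
renderHeader :: Int -> Attr -> String -> String
renderHeader level attr text =
  "<" ++ tag ++ attrs ++ ">" ++ text ++ "</" ++ tag ++ ">"
  where
    tag   = "h" ++ show level
    attrs = case renderAttr attr of
              "" -> ""
              s  -> ' ' : s

-- The attributes from the example: {#foo .bar .baz key=val}
sampleAttr :: Attr
sampleAttr = ("foo", ["bar", "baz"], [("key", "val")])

main :: IO ()
main = putStrLn (renderHeader 1 sampleAttr "Hi")
```

Run on the sample, this prints the same `<h1>` line as the pandoc session above.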

jamiefolson commented 11 years ago

I think attributes for headers are a great and wonderful thing. I also think that there are totally valid reasons for wanting to add attributes to arbitrary elements. For example, in the reports package for R, the author, possibly intimidated by Haskell, resorted to a series of fragile regular expressions to try to add the necessary classes and attributes to the HTML output.

Certainly, the "fragment" class for reveal.js could and probably should be set by a new output format for pandoc. However, it's a lot harder to come up with a solution for the transition options. Both Beamer and reveal.js allow the user to configure how and when transitions occur (I don't know about all the other js templates).

Custom attributes would make this simple for reveal.js, and even for Beamer the necessary script would not be complicated. Conceptually, how and when to reveal an element feels like an option rather than some new syntactic element. Alternative solutions would seem to require awkward detection and parsing of (to pandoc) literal string elements.

luc-j-bourhis commented 11 years ago

Imho tables are in dire need of that new syntax for attributes. It is not rare to need different tables to be displayed with different styles.

thriveth commented 10 years ago

And there are different LaTeX figure environments. Allowing custom attributes on an image would make it easier to distinguish different kinds of figure. Other writers could simply ignore them.

jgm commented 10 years ago

@jamiefolson: pandoc now includes a uniform syntax for transitions, which gets output as \pause in beamer and using fragment divs in revealjs. Of course, you can also just use a <div class="fragment"> in the markdown source, but this won't be portable if you decide to switch to beamer.

@luc-j-bourhis, @thriveth: You can wrap a table or figure with a div that has attributes. Are there reasons the attributes have to be on the table or image itself?
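
To make the div-wrapping suggestion concrete from a filter author's point of view, here is a minimal sketch that uses a heavily simplified, hypothetical stand-in for pandoc's `Block` type (the real type in pandoc-types has many more constructors, and the names `Rows`, `hasClass` and `tablesWithClass` are invented for illustration). It assumes the table itself carries no attributes, so the class is looked up on the enclosing div.

```haskell
-- Simplified stand-in for pandoc's attribute triple.
type Attr = (String, [String], [(String, String)])

-- Table content reduced to rows of cell text.
type Rows = [[String]]

-- Heavily simplified, hypothetical stand-in for pandoc's Block type.
data Block
  = Para String        -- paragraph (text simplified to a String)
  | Table Rows         -- table with no attribute slot of its own
  | Div Attr [Block]   -- generic container that does carry attributes
  deriving (Show, Eq)

-- Does the attribute triple include the given class?
hasClass :: String -> Attr -> Bool
hasClass c (_, classes, _) = c `elem` classes

-- Collect every table that sits directly inside a Div with the given class,
-- descending into nested Divs along the way.
tablesWithClass :: String -> [Block] -> [Rows]
tablesWithClass c = concatMap go
  where
    go (Div attr bs)
      | hasClass c attr = [ rows | Table rows <- bs ] ++ concatMap go bs
      | otherwise       = concatMap go bs
    go _ = []

-- One table wrapped in a classed div, one bare table.
sampleDoc :: [Block]
sampleDoc =
  [ Div ("t1", ["wide"], []) [Table [["a", "b"]]]
  , Table [["c", "d"]]
  ]

main :: IO ()
main = print (tablesWithClass "wide" sampleDoc)
```

The caveat is visible in the types: the table is only found through its wrapper, so a filter has to look one level up instead of at the element itself.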

thriveth commented 10 years ago

@jgm I wasn't aware of that, but looking at #1242 I can see it is possible, though with the caveats that @blaenk mentions there. I understand that attributes on every element would be a lot of work and difficult to implement without ambiguities and messy syntax, so I second @blaenk's suggestion to add an attribute field to the image element (because I find I almost always need to add some kind of metadata/attribute to it) and leave the rest to div/span solutions.

mb21 commented 9 years ago

With image attributes underway (https://github.com/jgm/pandoc/pull/1806), I could take a look at adding attributes to a few more elements if this is desired:

While the HTML writers would pass through all attributes, the others would just support the id and classes where appropriate. So beyond referencing tables and blockquotes, it would primarily enable simpler filters.

The exact markdown syntax for tables and blockquotes isn't exactly obvious, but I tend to favour the variant where the attributes have to be on their own line, trailing the actual table or blockquote (see the commonmark discussion at http://talk.commonmark.org/t/consistent-attribute-syntax/272/ and the proposal at https://mb21.github.io/stmd/spec.html#extensions).

thriveth commented 9 years ago

This is really great news! Probably means I will soon be able to write a journal article fully in Pandoc Markdown, with little or no manual editing in LaTeX.

On 12/30/2014 01:42 PM, mb21 wrote:

> With image attributes underway https://github.com/jgm/pandoc/pull/1806, I could take a look at adding attributes to a few more elements if this is desired:
>
> While the HTML writers would pass through all attributes, the others would just support the |id| and |classes| where appropriate. So beyond referencing tables and blockquotes, it would primarily enable simpler filters.
>
> The exact markdown syntax for tables and blockquotes isn't exactly obvious, but I tend to favour the variant where the attributes have to be on their own line, trailing the actual table or blockquote (see the commonmark discussion http://talk.commonmark.org/t/consistent-attribute-syntax/272/ and proposal https://mb21.github.io/stmd/spec.html#extensions).

jamiefolson commented 9 years ago

This would be fantastic. Even if attributes are mostly only available in the data model for filtering, that enables a large number of currently difficult use cases.

stroobandt commented 9 years ago

This issue is also listed as item 2 on this list. (Disclaimer: I have no relation to that web site.)

fmatheus commented 9 years ago

Would be great if any header_attributes were supported in beamer output. Judging by a test made in 1.15.0.6, it looks like {.allowframebreaks} and {.fragile} are. But why not support arbitrary attributes? In particular c, noframenumbering and plain.

jgm commented 9 years ago

+++ fmatheus [Oct 14 15 15:05 ]:

> Would be great if any header_attributes was supported in beamer output. Looks like {.allowframebreaks} and .fragile by test made in 1.15.0.6. But why not support arbitrary attributes? In particular c, noframenumbering and plain.

Sounds plausible to me. This just takes a small change in the LaTeX writer.

jgm commented 9 years ago

+++ fmatheus [Oct 14 15 15:05 ]:

> Would be great if any header_attributes was supported in beamer output. Looks like {.allowframebreaks} and .fragile by test made in 1.15.0.6. But why not support arbitrary attributes? In particular c, noframenumbering and plain.

I've added support for all frame attributes in commit 504bf3f8e79bd502f406264e2cc2794b129a26c0

stroobandt commented 9 years ago

> Sounds plausible to me. This just takes a small change in the LaTeX writer.

Attributes on all Markdown elements are not only useful for LaTeX and Beamer. I personally can think of many use cases for them as XHTML classes.

jgm commented 9 years ago

+++ Serge Y. Stroobandt [Oct 16 15 06:24 ]:

> > Sounds plausible to me. This just takes a small change in the LaTeX writer.
>
> Attributes to all Markdown elements is not only useful to LaTeX and Beamer. I personally can think of many use cases for it as XHTML classes.

Sure. I was responding to the (misplaced) previous comment which just concerned Beamer attributes on headers, not to this general issue.

jmuheim commented 8 years ago

@jgm:

% pandoc
# Hi {#foo .bar .baz key=val}
^D
<h1 id="foo" class="bar baz" key="val">Hi</h1>

How can an ID be applied to a link? The following doesn't work:

[Als DOCX downloaden](somewhere.html) {#download_as_docx}

jgm commented 8 years ago

You need the very latest dev version of pandoc to apply an ID to a link (compile from source). And you can't have a space before the {.

ousia commented 8 years ago

@jgm, will all elements have attributes in Markdown?

jmuheim commented 8 years ago

@jgm, which version is this? I have 1.15.2.1, which came with homebrew.

ousia commented 8 years ago

It comes in version 1.16.x. Latest released version is 1.16.0.2.

elotroalex commented 8 years ago

Any updates on this? We're trying to create a workflow that will produce a web version and a print version from the same Markdown file. We're using Jekyll for web deployment. Jekyll usually plays nice with kramdown, but @mfenner has been working on a gem called jekyll-pandoc.

We've been knocking our heads silly trying to decide how to handle poetry (we're lit folks). At issue right now is the inability to handle classes at the line level, so that we can write the right CSS to wrap our lines nicely, etc. Kramdown handles this exceptionally well, but then we'd be giving up our pandoc | *TeX workflow, which depends on pandoc-markdown. If we can get pandoc-markdown to work well with Jekyll + poetry, we will have devised an excellent solution for a large community of editors and scholars, who can then produce nice PDFs and nice websites out of the same files.

We're trying at all costs to avoid having to write filters or divide the editing workflow into two parts. Can we get attributes at the unnumbered list line and block level, and its corollary, the blockquoted unnumbered list? Or is this out with the new implementation and I missed something?

ousia commented 8 years ago

I think the basic issue is the following:

If Markdown is based on HTML, Markdown should have three basic attributes in all elements.

And sorry, if LaTeX cannot handle this, we should find another way of dealing with XML when using LaTeX.

mb21 commented 8 years ago

We now have Attr (i.e. attribute support) on Image and Link elements as well, as can be seen in the pandoc-types Definition.hs. To add it to further elements, we'll have to change the types and then all the writers, which takes a lot of work and is a breaking change for filters etc. Still, I hope we'll be able to add it to more elements in the future... contributions welcome :)
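
As a rough illustration of what an attribute slot on links and images gives filter authors, the sketch below pattern matches on a cut-down copy of the `Inline` type. Only the shape of the `Link` and `Image` constructors follows the description above. The rest of the type is trimmed for brevity, and `linkIds` is a hypothetical helper, not part of pandoc-types.

```haskell
-- Simplified stand-ins for the pandoc-types definitions.
type Attr   = (String, [String], [(String, String)])
type Target = (String, String)  -- (URL, title)

-- Cut-down Inline type; the real one has many more constructors.
data Inline
  = Str String
  | Emph [Inline]
  | Link Attr [Inline] Target
  | Image Attr [Inline] Target
  deriving (Show, Eq)

-- Collect the identifiers attached to links, descending into nested inlines.
linkIds :: [Inline] -> [String]
linkIds = concatMap go
  where
    go (Link (ident, _, _) label _) =
      [ ident | not (null ident) ] ++ linkIds label
    go (Image _ alt _) = linkIds alt
    go (Emph xs)       = linkIds xs
    go (Str _)         = []

-- Roughly what [Als DOCX downloaden](somewhere.html){#download_as_docx}
-- could look like once parsed (spaces are folded into one Str for brevity).
sampleInlines :: [Inline]
sampleInlines =
  [ Link ("download_as_docx", [], []) [Str "Als DOCX downloaden"]
         ("somewhere.html", "")
  , Emph [Link ("", [], []) [Str "no id here"] ("other.html", "")]
  ]

main :: IO ()
main = print (linkIds sampleInlines)
```

Because the constructor gains a field, every existing match on `Link` or `Image` has to be updated, which is the breaking change mentioned above.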

elotroalex commented 8 years ago

Thanks, @mb21. Good to hear you're still moving on this. Contributions might be forthcoming once I find a friend who can do Haskell (or time to learn myself). I see you have this generic

| Span Attr [Inline] -- ^ Generic inline container with attributes

I can't parse Haskell very well, so I apologize if this sounds dumb. What is this generic inline container?

mb21 commented 8 years ago

It's literally the native pandoc span element (there's also a div element). Unfortunately there's no markdown syntax yet (but most probably it'll be [my text]{.myClass})—meanwhile you can write inline HTML:

echo 'some _italic_ markdown with <span class="myClass">my text</span>' | pandoc -t native

The Attr means it holds attributes, and the [Inline] means a list of Inline elements. See http://learnyouahaskell.com/making-our-own-types-and-typeclasses#algebraic-data-types for an introduction to Haskell data types (the whole pandoc internal document AST is such a data type).
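
For anyone new to Haskell, a small self-contained sketch may make the `Span Attr [Inline]` line easier to read. The `Inline` type below is a cut-down stand-in for the real one, and `plain` and `spansWithClass` are hypothetical helpers written only for this illustration.

```haskell
-- Simplified stand-in for pandoc's attribute triple.
type Attr = (String, [String], [(String, String)])

-- Cut-down Inline type; the real one has many more constructors.
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Span Attr [Inline]   -- generic inline container with attributes
  deriving (Show, Eq)

-- Flatten a list of inlines to plain text.
plain :: [Inline] -> String
plain = concatMap go
  where
    go (Str s)     = s
    go Space       = " "
    go (Emph xs)   = plain xs
    go (Span _ xs) = plain xs

-- Text of every span carrying the given class, including nested spans.
spansWithClass :: String -> [Inline] -> [String]
spansWithClass c = concatMap go
  where
    go (Span (_, classes, _) xs)
      | c `elem` classes = plain xs : spansWithClass c xs
      | otherwise        = spansWithClass c xs
    go (Emph xs) = spansWithClass c xs
    go _         = []

-- Roughly: some _italic_ markdown with <span class="myClass">my text</span>
sampleInlines :: [Inline]
sampleInlines =
  [ Str "some", Space, Emph [Str "italic"], Space, Str "markdown", Space
  , Str "with", Space
  , Span ("", ["myClass"], []) [Str "my", Space, Str "text"]
  ]

main :: IO ()
main = print (spansWithClass "myClass" sampleInlines)
```

The same traversal could be used to gather index entries or language-tagged phrases, provided they are wrapped in spans.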

elotroalex commented 8 years ago

Thanks, @mb21. This is very useful. We've decided to use some HTML tags until further notice to deal with poetry. Here's looking forward to having both this new [my text]{.myClass} syntax and the coveted attribute class for any element.

Thanks for the link to this tutorial also. It seems very complete and approachable. I got the gist of the data types right away. Perhaps it will be me or someone in my team who implements, after all.

jgm commented 8 years ago

You know about line blocks, right?

http://pandoc.org/README.html#line-blocks

This will give you control over line breaking and initial indentation, without sacrificing source readability.

If you want more control, you can wrap a line block in a div with a class, and you can wrap individual lines in a span.

elotroalex commented 8 years ago

Hi, @jgm. Sorry I missed you the last time you came to Columbia. @denten tells me it went really well.

So, yes, we know about the line blocks. We're juggling between those and unnumbered lists. Thanks for incorporating that syntax, btw. It is definitely useful. We were butting heads against the line wrap issue, that we could easily wrangle with the {.class} syntax, and also right now we're having some issues with ConTeXt playing nice with the line blocks. I feel we're very close to solving the ConTeXt problem, so that would only leave the line wrap problem for us. And of course, we're very aware that we can solve almost everything right now using the <div> and <span> that you provide. Our ultimate goal though is to reduce the syntax as much as possible for our editors and scholars, and achieve Gruber's dream of keeping all text relatively readable in the markdown itself.

Thanks again, for jumping in.

elotroalex commented 8 years ago

Hm. Maybe it helps if I show how I was solving this problem using kramdown/jekyll for my Ed project, so the thread can get a clearer sense of the need:

- O Captain! my Captain! our fearful trip is done;
- The ship has weather’d every rack, the prize we sought is won,
- The port is near, the bells I hear, the people all exulting,
- While follow eyes the steady keel, the vessel grim and daring; 
- {:.indent-3}But O heart! heart! heart!
- {:.indent-4}O the bleeding drops of red,
- {:.indent-5}Where on the deck my Captain lies,
- {:.indent-6}Fallen cold and dead.

After it's processed by the kramdown engine this renders beautifully, and the lines wrap on smaller screens.

jgm commented 8 years ago

What an ugly way to write a poem! Here's how you'd do it in pandoc's Markdown:

| O Captain! my Captain! our fearful trip is done;
| The ship has weather’d every rack, the prize we sought is won,
| The port is near, the bells I hear, the people all exulting,
| While follow eyes the steady keel, the vessel grim and daring; 
|       But O heart! heart! heart!
|          O the bleeding drops of red,
|            Where on the deck my Captain lies,
|              Fallen cold and dead.

jgm commented 8 years ago

I see what you mean about wrapping; on small screens you want the lines to wrap with some indentation, and the above doesn't do that. Well, since you're comfortable abusing unordered lists, you could always use nested lists to get the indentation you need:

- O Captain! my Captain! our fearful trip is done;
- The ship has weather’d every rack, the prize we sought is won,
- The port is near, the bells I hear, the people all exulting,
- While follow eyes the steady keel, the vessel grim and daring; 
    - But O heart! heart! heart!
        - O the bleeding drops of red,
            - Where on the deck my Captain lies,
                - Fallen cold and dead.

With appropriate CSS, this could behave just like what you have.

Markdown is meant to be readable as it stands. Explicit attributes should be used only when necessary. I think that both of the methods I've suggested give you something that's much more readable in the source. The first method has the advantage of working well in all output formats, without special CSS (though you don't get as nice behavior on small screens).

If you really want to use ugly stuff, though, you could do something like:

- <span class="indent-3">But O heart! heart! heart!</span>

and so on. A little more to write, but it will give you Markdown that works everywhere.

elotroalex commented 8 years ago

Ah, yes I know. It is a fantastic way of writing it in pandoc's Markdown. The only problem is having the lines wrap properly on small screens or with big fonts, given that this produces a <p> tag for the stanza, with line ends being <br/>. We're working out the possibilities. Our ideal situation would be to wrap these lines in line blocks with hanging indentation using the syntax as-is. Plan B, not as elegant but elegant enough, would be the {.foo} solution that you offer, which is at least better than kramdown's:

- O Captain! my Captain! our fearful trip is done;
- The ship has weather’d every rack, the prize we sought is won,
- The port is near, the bells I hear, the people all exulting,
- While follow eyes the steady keel, the vessel grim and daring; 
- But O heart! heart! heart! {.indent-3}
- O the bleeding drops of red, {.indent-4}
- Where on the deck my Captain lies, {.indent-5}
- Fallen cold and dead. {.indent-6}

jgm commented 8 years ago

By the way, I've long thought that we should have a dedicated block element for line blocks. (Currently there is no distinctive representation of line blocks in the AST; instead, line blocks are parsed as paragraphs with line breaks.)

This would be good for your purposes, as you could use a filter or custom renderer to customize the output. (And maybe the default renderer could have a different output than a <p>.) Perhaps an issue should be created for this. It would be a big change, as it would require a change to pandoc-types, and changes in all writers and readers. So, it couldn't happen soon.
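
To illustrate what a dedicated line-block element could make possible, here is a minimal sketch with a hypothetical `LineBlock` constructor and a custom renderer. Nothing here reflects pandoc's actual types or writers: the constructor, the `line-block` and `indent-N` class names, and the helper functions are all assumptions made for the example, and no HTML escaping is done.

```haskell
-- Hypothetical block type with a dedicated constructor for line blocks.
data Block
  = Para String
  | LineBlock [String]   -- one entry per line, leading spaces preserved
  deriving (Show, Eq)

-- Split a line into (number of leading spaces, remaining text).
splitIndent :: String -> (Int, String)
splitIndent line = (length spaces, rest)
  where
    (spaces, rest) = span (== ' ') line

-- Custom renderer: each line becomes its own div, so CSS can give it a
-- hanging indent instead of relying on <br/> inside a single <p>.
renderBlock :: Block -> String
renderBlock (Para s) = "<p>" ++ s ++ "</p>"
renderBlock (LineBlock ls) =
  "<div class=\"line-block\">" ++ concatMap renderLine ls ++ "</div>"
  where
    renderLine l =
      let (n, txt) = splitIndent l
      in "<div class=\"line indent-" ++ show n ++ "\">" ++ txt ++ "</div>"

-- Two lines from the poem above, one of them indented.
samplePoem :: Block
samplePoem = LineBlock ["O Captain!", "   But O heart!"]

main :: IO ()
main = putStrLn (renderBlock samplePoem)
```

With a distinct constructor, a filter can find stanzas directly instead of guessing which paragraphs with line breaks were meant as verse.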

elotroalex commented 8 years ago

Yes! Block elements would be fantastic. At least to avoid the ugly <div class="stanza">. You have my vote.

The nested-list solution is promising. Our only snag would be designing for a collection of poetry where the indents are irregular across poems. Let me play with it a bit and see if that does it.

elotroalex commented 8 years ago

Oh, did you want me to create the issue? Like I mentioned above, I'm hoping to dedicate some of my resources to this in the future.

ousia commented 8 years ago

> By the way, I've long thought that we should have a dedicated block element for line blocks. (Currently there is no distinctive representation of line blocks in the AST; instead, line blocks are parsed as paragraphs with line breaks.)

That would be great.

ghost commented 8 years ago

Another important use case for attributes is in emphasis and strong, for example: *homo sapiens*{.zool}. Useful for making indexes, etc.

ousia commented 8 years ago

> Another important use case for attributes is in emphasis and strong, for example: *homo sapiens*{.zool}. Useful for making indexes, etc.

This is essential to have right hyphenation in foreign languages, such as in:

_homo sapiens_{:la}

ghost commented 8 years ago

@ousia Also to selectively transliterate: for example, if I mention Greek words in the Greek alphabet but want to add an option to show them transliterated, or to generate the transliteration immediately after the original while building.

jmuheim commented 8 years ago

> It comes in version 1.16.x.

I soon will get my server updated, and pandoc 1.16 will be on it, but I don't know which sub-version (1.16.0, 1.16.1, etc.).

Is it possible in version 1.16.0 already? I ask because I'm developing some features for a project which heavily relies on this feature, and I must know whether the new server will support it.

jgm commented 8 years ago

No version of pandoc allows adding attributes to all Markdown elements. They can be added to headers, code blocks and spans, images, and links only.

jgm commented 8 years ago

And divs and spans.

jmuheim commented 8 years ago

It's absolutely ok for me if I can add them to heading elements (e.g. h1).

And is this possible in 1.16.0?

jgm commented 8 years ago

Yes.

jmuheim commented 8 years ago

Hoorraaayyy!!! :heart:

And how can I create divs and spans from within Pandoc-Markdown?

mb21 commented 7 years ago

Reposting here some of my thoughts from an old pandoc-discuss thread. I still think adding attributes to all elements is a viable option for pandoc 2.0...

> It feels wrong somehow to keep adding Attr to more and more elements.

I know what you mean, but maybe it's just because Attr is kind of a hacky type in itself (a three-tuple and not even a newtype). But is there a better alternative to adding Attr to more things? i.e. what would be ideal?

Maybe Attr should ultimately be a GADT with record syntax (GADTs are a GHC-extension that effectively provide subtyping, see https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/data-type-extensions.html#gadt). Something like:

{-# LANGUAGE GADTs #-}

-- Placeholder so the sketch compiles; the real Dimension type is left open here.
data Dimension = Pixel Integer | Percent Double
  deriving (Show)

data Attr where
  GeneralAttr :: { uid    :: String              -- unique identifier
                 , cls    :: [String]            -- classes
                 , others :: [(String, String)]  -- key-value pairs for filters etc
                 } -> Attr
  ImageAttr   :: { uid    :: String
                 , cls    :: [String]
                 , width  :: Dimension
                 , height :: Dimension
                 , figure :: Bool
                 , others :: [(String, String)]
                 } -> Attr
  CodeAttr    :: { uid    :: String
                 , cls    :: [String]
                 , lang   :: String
                 , others :: [(String, String)]
                 } -> Attr
  HeaderAttr  :: { uid    :: String
                 , cls    :: [String]
                 , numbered :: Bool
                 , others :: [(String, String)]
                 } -> Attr
  deriving (Show)

nullAttr :: Attr
nullAttr = GeneralAttr "" [] []

-- sample functions

getUid :: Attr -> String
getUid attr = uid attr

getClass :: Attr -> [String]
getClass = cls

[...]

It's probably a tradeoff away from the flexibility of the list of string tuples (which permits arbitrary key value pairs without breaking the API for anyone) towards using Haskell's type system even more and embedding the semantics of the attributes directly in the types. I'd certainly prefer GADTs from a theoretical point of view, but seeing how hard it is to change pandoc-types I'm not so sure anymore (though future changes to, say, the image attribute would only affect users that make use of the ImageAttr constructor, thus being much more limited in scope). Finally, if we were to stick with dumb key value pairs, should we at least make it a HashMap?
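
One thing worth weighing in the key/value question is that a map and an association list do not behave identically. The sketch below uses `Data.Map.Strict` from the containers package as a stand-in for a HashMap (an assumption, since HashMap proper lives in unordered-containers). `lookupList`, `toMap` and `sampleKvs` are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map

-- The current shape of the key/value part of an attribute.
type KeyVals = [(String, String)]

-- With an association list, source order is kept and the FIRST
-- matching key wins on lookup.
lookupList :: String -> KeyVals -> Maybe String
lookupList = lookup

-- Converting to a map: Map.fromList keeps the LAST value for a duplicate
-- key, and the original ordering of the pairs is lost.
toMap :: KeyVals -> Map.Map String String
toMap = Map.fromList

-- A key/value list with a duplicate key.
sampleKvs :: KeyVals
sampleKvs = [("width", "50%"), ("data-x", "1"), ("width", "80%")]

main :: IO ()
main = do
  print (lookupList "width" sampleKvs)        -- first occurrence
  print (Map.lookup "width" (toMap sampleKvs)) -- last occurrence
  print (Map.toList (toMap sampleKvs))         -- sorted by key, deduplicated
```

So a switch would change what happens with duplicate keys and with attribute order in the output, in addition to changing the API.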


@jgm mentioned recently somewhere that we should at least convert Attr from a type to a newtype. That means changing (ident, cls, kvs) to Attr ident cls kvs in lots and lots of places, or is there an unorthodox way around this?
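
To show the size of that change at a single match site, here is a minimal sketch. It assumes the newtype would wrap the existing tuple, because a newtype takes exactly one field (three separate fields, as in `Attr ident cls kvs`, would need `data` instead). `AttrTuple`, `classesOld`, `classesNew`, `toAttr` and `fromAttr` are hypothetical names.

```haskell
-- Today: a plain type synonym for a three-tuple.
type AttrTuple = (String, [String], [(String, String)])

-- One possible newtype: a single constructor wrapping the same tuple.
newtype Attr = Attr (String, [String], [(String, String)])
  deriving (Show, Eq)

-- Before: match on the bare tuple.
classesOld :: AttrTuple -> [String]
classesOld (_, cls, _) = cls

-- After: every match site gains the constructor.
classesNew :: Attr -> [String]
classesNew (Attr (_, cls, _)) = cls

-- Conversions for code that is migrated gradually.
toAttr :: AttrTuple -> Attr
toAttr = Attr

fromAttr :: Attr -> AttrTuple
fromAttr (Attr t) = t

main :: IO ()
main = do
  let old = ("foo", ["bar", "baz"], [("key", "val")])
  print (classesOld old)
  print (classesNew (toAttr old))
```

Each individual edit is mechanical, but it has to be repeated wherever a tuple pattern or tuple expression for attributes appears.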


Basically, the question still is: what would be the optimal approach of handling attributes? So we can get it as right as possible this time.

jgm commented 7 years ago

+++ Mauro Bieg [Jan 23 17 00:58 ]:

> Reposting here some of my thoughts from an old pandoc-discuss thread. I still think adding attributes to all elements is a viable option for pandoc 2.0...

The question is how to do this without super-extensive and painful code changes in the entire pandoc code base, and without making things that were simple before complicated.

Matthew Pickering has suggested we could use pattern synonyms to keep things simple, and I haven't looked into that much.

A newtype wrapping a HashMap would make sense, I suppose, but one advantage of the present representation is that it's very easy to pattern match.
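
One possible reading of the pattern-synonym idea is that the old constructor names could be kept as synonyms once a constructor gains an attribute field. The thread does not spell out what was actually proposed, so the sketch below is an assumption: `ParaWith` is a hypothetical constructor, the `Block` type is cut down to two cases, and paragraph content is simplified to a `String`.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Simplified stand-in for pandoc's attribute triple.
type Attr = (String, [String], [(String, String)])

nullAttr :: Attr
nullAttr = ("", [], [])

-- Hypothetical new representation: paragraphs now carry attributes.
data Block
  = ParaWith Attr String
  | HorizontalRule
  deriving (Show, Eq)

-- The old constructor name survives as a bidirectional pattern synonym:
-- matching ignores the attributes, constructing fills in nullAttr.
pattern Para :: String -> Block
pattern Para s <- ParaWith _ s where
  Para s = ParaWith nullAttr s

-- Tell the exhaustiveness checker that these two patterns cover Block.
{-# COMPLETE Para, HorizontalRule #-}

-- Code written against the old API compiles unchanged.
render :: Block -> String
render (Para s)       = "<p>" ++ s ++ "</p>"
render HorizontalRule = "<hr />"

main :: IO ()
main = do
  putStrLn (render (Para "old-style construction"))
  putStrLn (render (ParaWith ("intro", ["lead"], []) "new-style construction"))
```

Writers that do not care about attributes could keep matching on `Para`, while those that do would switch to the new constructor. The tuple inside stays easy to pattern match either way.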