msprev / panzer

pandoc + styles
BSD 3-Clause "New" or "Revised" License

panzer treats format+/-extension as a separate format #43

Open bpj opened 6 years ago

bpj commented 6 years ago

While I know you don't think panzer should concern itself with the arguments to pandoc's -r and -w options (or their synonyms), there is one aspect of them that panzer probably needs to be aware of, if only to ignore it: pandoc extensions. Currently panzer 1.3 treats e.g. html and html-smart as two different output formats, and totally ignores the styles for html when run with -w html-smart. As the example at the end of this post shows, this can be exploited, but most of the time it is just plain annoying. There is really no workaround except duplicating styles in YAML files, albeit eased a bit by YAML anchors and references. With multiple extensions it gets hard to handle, because you must use the exact same extensions, specified in the exact same order, each time!

Arguably this is due to a design flaw in pandoc: rather than appending these cute +this-that suffixes to the reader/writer name, there should have been proper options -X, --reader-extension +ENABLED and -x, --writer-extension -DISABLED. I can see three and a half solutions to this:

  1. Just ignoring all extensions in the -w option argument when looking up styles. This is entirely acceptable but IMO a bit too limiting.

  2. Parsing out the extensions part of -w option arguments and format keys in styles alike: split each into format and individual extensions, sort the extensions ASCIIbetically, join them together again and do the lookup based on that. This would however be messy and confusing, and what if a style contains several keys with the same format and extensions but in a different order?

  3. Let panzer users specify extensions in styles as lists:

    i. Pretend that those --reader-extension and --writer-extension options actually exist, letting panzer users specify them in styles just like other options and then append them to the reader/writer format. This would work very well and might motivate jgm to actually provide those options in pandoc. Alas there is the possibility, however slight, that like-named options with other functions be introduced in pandoc.

    ii. Better then to introduce an extensions field in styles:

      Fancy:
        markdown:
          extensions:
            reader:
              on:
                - '`four_space_rule`'     # Keep my sanity!
                - '`emoji`'               # One of the few other off-by-default extensions I might consider turning on!
              off:
                - '`tex_math`'            # Strictly empirical linguistics and humanities here!
      
      Simple:
        parent: Fancy
        markdown:
          extensions:
            writer:
              on:
                - '`smart`'               # Which actually means "NOT smart"! :-(
                - '`ascii_identifiers`'
              off:
                - '`bracketed_spans`'
                - '`fenced_divs`'

      The on/off level, rather than +/-foo entries in the lists, is of course not strictly needed, but it would greatly help me, for one, to see at a glance what is going on!
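To make the idea behind the extensions field concrete, here is a minimal sketch of how on/off lists could be folded back into the format string that pandoc expects. The function name `apply_extensions` is hypothetical and is not part of panzer; the extension names are taken from the Fancy style above, written without the backticks.

```python
def apply_extensions(base, on=(), off=()):
    """Append +name for each enabled and -name for each disabled extension.

    Hypothetical helper for illustration only; panzer has no such function.
    """
    enabled = "".join("+" + name for name in on)
    disabled = "".join("-" + name for name in off)
    return base + enabled + disabled


# Mirrors the reader settings of the Fancy style above.
reader = apply_extensions(
    "markdown",
    on=["four_space_rule", "emoji"],
    off=["tex_math"],
)
print(reader)  # markdown+four_space_rule+emoji-tex_math
```

The style lookup would then use only the base format (markdown here), and the extension suffix would be attached just before pandoc is called, so the order in which extensions are listed would no longer affect which style applies.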

I'll fully understand, and be quite happy, if you opt for 1. for the time being (format = re.match('\w+', options['write']) or 'html' and call it a day!) but please keep 3.ii. in mind for the future!
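A slightly fuller sketch of options 1 and 2 follows, assuming the writer argument is available as a plain string. The helper names (`split_format`, `style_key`, `canonical_key`) are made up for illustration and do not come from panzer. Note that `re.match` returns a match object rather than a string, so the sketch calls `.group(0)` and falls back to `html` when nothing matches.

```python
import re


def split_format(spec, default="html"):
    """Split a spec such as 'html-smart+emoji' into (base, extensions).

    Each extension keeps its leading '+' or '-' sign.
    """
    match = re.match(r"\w+", spec or "")
    if not match:
        return default, []
    base = match.group(0)
    extensions = re.findall(r"[+-]\w+", spec[match.end():])
    return base, extensions


def style_key(spec):
    """Option 1: ignore extensions and look up styles by base format only."""
    return split_format(spec)[0]


def canonical_key(spec):
    """Option 2: base format plus extensions sorted by extension name."""
    base, extensions = split_format(spec)
    return base + "".join(sorted(extensions, key=lambda ext: ext[1:]))


print(style_key("html-smart"))                   # html
print(canonical_key("markdown-tex_math+emoji"))  # markdown+emoji-tex_math
```

With `style_key`, both -w html and -w html-smart would select the html entry of a style. With `canonical_key`, the same set of extensions given in a different order maps to a single key, although style authors would still have to write their keys in that canonical order.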


Example:

~/Dokument/bugs/panzer/extensions$ cat example.md 
---
style: Foo
styledef:
  Foo:
    html:
      commandline:
        number-sections: true
    html-smart:
      commandline:
        number-sections: false
---

# Primum

## Primum "Alpha"

## Primum "Beta"

# Secundum

## Secundum "Alpha"

# Tertium

## Tertium "Alpha"

~/Dokument/bugs/panzer/extensions$ panzer example.md -w html
         ----- pandoc read -----
         running
         ----- style definitions -----
         global:
           <classified>
         document:
           Foo        
         ----- document style -----
         style:
           Foo
         full hierarchy:
           Foo
         writer:
           html
         ----- run list -----
           empty
         ----- pandoc write -----
         pandoc writing with options:
           --number-sections
<h1 id="primum"><span class="header-section-number">1</span> Primum</h1>
<h2 id="primum-alpha"><span class="header-section-number">1.1</span> Primum “Alpha”</h2>
<h2 id="primum-beta"><span class="header-section-number">1.2</span> Primum “Beta”</h2>
<h1 id="secundum"><span class="header-section-number">2</span> Secundum</h1>
<h2 id="secundum-alpha"><span class="header-section-number">2.1</span> Secundum “Alpha”</h2>
<h1 id="tertium"><span class="header-section-number">3</span> Tertium</h1>
<h2 id="tertium-alpha"><span class="header-section-number">3.1</span> Tertium “Alpha”</h2>

~/Dokument/bugs/panzer/extensions$ panzer example.md -w html-smart
         ----- pandoc read -----
         running
         ----- style definitions -----
         global:
           <classified>
         document:
           Foo        
         ----- document style -----
         style:
           Foo
         full hierarchy:
           Foo
         writer:
           html-smart
         ----- run list -----
           empty
         ----- pandoc write -----
         running
<h1 id="primum">Primum</h1>
<h2 id="primum-alpha">Primum "Alpha"</h2>
<h2 id="primum-beta">Primum "Beta"</h2>
<h1 id="secundum">Secundum</h1>
<h2 id="secundum-alpha">Secundum "Alpha"</h2>
<h1 id="tertium">Tertium</h1>
<h2 id="tertium-alpha">Tertium "Alpha"</h2>

msprev commented 6 years ago

This is a good point, well made. I knew that the current behaviour wasn't ideal, but this shows that it needs some proper attention. I will introduce a proper fix along the lines of 3.ii (but perhaps not with exactly this syntax).

Let me think about the best way to implement this.

I'm closing pull request (https://github.com/msprev/panzer/pull/16), as it references the same issue.

bpj commented 5 years ago

How is this going?

msprev commented 5 years ago

This is a fairly big change and I haven't had time to do it. It's on the list though, so it will get done.