Open marc-medley opened 7 years ago
Can confirm on pandoc 1.19.2.1
$ echo -e '```html\nfoo\n```' | pandoc
<div class="sourceCode">
<pre class="sourceCode html">
<code class="sourceCode html">foo</code>
</pre>
</div>
Yet:
$ echo -e '```html\nfoo\n```' | pandoc --no-highlight
<pre class="html"><code>foo</code></pre>
When you do
``` foo
bar
```
in pandoc, it's exactly equivalent to
``` {.foo}
bar
```
so `foo` is just a class. We don't know whether it's meant to be the name of a language syntax or something else entirely.
So adding the `language-` prefix to all the classes of a code block certainly wouldn't be the right thing to do. We could, I suppose, add the prefix to class names that correspond to known language names, i.e. to language names that pandoc's own highlighter is aware of.
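A minimal sketch of that idea in Python, not pandoc's actual implementation. The `KNOWN_LANGUAGES` set and the `prefix_known_languages` function are hypothetical names used only for illustration; a real list would come from whichever highlighter is being targeted.

```python
# Hypothetical, deliberately tiny list of language names (an assumption).
KNOWN_LANGUAGES = {"c", "css", "haskell", "html", "java", "python", "swift"}


def prefix_known_languages(classes):
    """Return the classes with `language-` added to known language names only."""
    result = []
    for cls in classes:
        if cls.lower() in KNOWN_LANGUAGES:
            result.append("language-" + cls)
        else:
            result.append(cls)  # not a known language: leave it untouched
    return result


print(prefix_known_languages(["html", "numberLines"]))
# prints ['language-html', 'numberLines']
```

Classes such as `foo` that are not in the list pass through unchanged, which is the point of checking against known names rather than prefixing everything.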
> We could, I suppose, add the prefix to class names that correspond to known language names, i.e. to language names that pandoc's own highlighter is aware of.
For my use case, this would be an OK approach.
Note that, semantically, the HTML `<code>` tag seems to be an appropriate place for a `language-` class attribute.
We generally want the class on the pre, because highlighting styles often include a background color. It probably wouldn't hurt to put it on the code as well in spans, but it hasn't been necessary.
There are two distinct code highlighting use cases:
Use Case default
: Pandoc provides the complete code highlighting in html output.
Use Case --no-highlight
: Pandoc code highlighter is disabled. Pandoc produces an "intermediate" html. An external highlighter such as prism.js or highlight.js is later applied to the "intermediate" html when loaded into a viewing browser.
This particular issue is only intended to apply to the `--no-highlight` use case.
So, yes, the `default` use case should continue with what works for the Pandoc highlighter, e.g. use `<pre>` for background color.
Yet, when the `--no-highlight` option is used, possible downstream highlighters should be considered.
For example, both `highlight.js` and `prism.js` can consume the following clean, simple, maintainable HTML and produce various colored backgrounds along with full syntax highlighting.
<pre><code class="language-css">p { color: red }</code></pre>
Please see highlight.js usage and demo (supports `language-abc`, `lang-abc` and `abc`)
Please see prism.js basic usage and examples
In both the `highlight.js` and `prism.js` examples, the `<pre>` tag does not have any additional attributes.
So, in the `--no-highlight` use case, the `language-` class placed in the `<code>` tag is sufficient and complete for downstream highlighters such as `prism.js` and `highlight.js` to also provide background coloring in the final HTML delivered to the viewing browser.
Couldn't this be handled with a filter which adds the `language-` prefix to the first class, if any, of all CodeBlock elements, and overrides the builtin HTML rendering of code blocks? It would be very easy with Pandoc::Filter:
#!/usr/bin/env perl
use strict;
use warnings;
use Pandoc::Filter;
use Pandoc::Elements;
use HTML::Entities qw[ encode_entities ];

pandoc_filter 'CodeBlock' => sub {
    my $attrs = stringify_attrs($_);    # here $_ is a reference to element object
    return unless length $attrs;        # default rendering OK
    my $content = encode($_->content);  # the code
    return RawBlock html => qq(<pre><code $attrs>$content</code></pre>);
};

sub stringify_attrs {
    my($elem) = @_;
    my $kv = $elem->keyvals;    # get Hash::MultiValue object
    my @attrs;
    if ( my @classes = $kv->get_all('class') ) {
        $kv->remove('class');
        @classes = map {; encode($_) } @classes;    # shouldn't be needed!
        push @attrs, qq(class="language-@classes");
    }
    ATTR:
    for my $attr ( sort keys %$kv ) {
        my @values = $kv->get_all($attr);
        next ATTR unless @values;
        push @attrs, map{ $_ = encode($_); qq($attr="$_"); } sort @values;
    }
    return "@attrs";    # array items as space-separated string
}

sub encode { encode_entities $_[0], '<>&"' }
Note that this requires that the author has the discipline to make sure that it always is appropriate, or at least doesn't break anything, to prefix `language-` to any first class of a codeblock.
Note also that I'm writing this on my tablet, so the code is untested but it should do the right thing.
This seems to still be an issue with the current version of pandoc. Even using the `--no-highlight` option I'm still seeing the class added to the `<pre>` tag and no class added to the `<code>` tag.
Here is a lua filter to do this. It passes through any classes that don't match a programming language name and all ids. Attributes are stripped, but I'm not sure too many people use them anyways.
Just use --lua-filter standard-code.lua
How is this issue going?
https://github.com/jgm/pandoc/issues/3858#issuecomment-324128577 <- this option looks good for me because I want to use highlight.js.
There are a couple of possibilities here:
1. Change the HTML writer so that, when `--no-highlight` is used (i.e., `writerHighlightStyle opts == Nothing`), pandoc produces a `language-LANG` class on the code elements in both inline and block code.
   A question is how the language is identified. (A code span or block may have a number of classes, only one of which is the language -- or it may be that none of the classes are languages.) One possibility would be to check a list of known languages. We could, perhaps, include the list that highlighting-js currently supports.
   Anyway, on this approach you could write
   ``` C
   int i = 0;
   ```
   and it would be rendered
   ```html
   <pre><code class="language-C">int i = 0;
   </code></pre>
   ```
2. Another approach would involve a much more minimal modification. This would simply move any class beginning with `language-` to the code tag instead of the pre tag, in rendering HTML. It would be insensitive to the setting of `--no-highlight`. With this approach you'd write
   ``` language-C
   int i = 0;
   ```
3. Another idea would be to always add `language-` to a single word after the opening code backticks, so that
   ``` C
   int i = 0;
   ```
   would be parsed as a code block with class `language-C` rather than `C`. The logic for highlighting could be modified so that we first check the classes for `language-X`, then for known languages (so a class `C` would also work). The main drawback of this approach is that it could break some current setups that are assuming that the class name will be `C`.
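The lookup order described for this last idea (explicit `language-X` classes first, then known language names) could be sketched as follows. This is only an illustration in Python, not pandoc's code; `KNOWN_LANGUAGES` and `detect_language` are hypothetical names, and the language list is an assumption.

```python
# Hypothetical list of language names the highlighter knows (an assumption).
KNOWN_LANGUAGES = {"c", "css", "html", "java", "python"}

PREFIX = "language-"


def detect_language(classes):
    """Return the language named by a code block's classes, or None."""
    # First pass: an explicit `language-X` class always wins.
    for cls in classes:
        if cls.startswith(PREFIX) and len(cls) > len(PREFIX):
            return cls[len(PREFIX):]
    # Second pass: fall back to bare class names that are known languages,
    # so that a plain class `C` keeps working.
    for cls in classes:
        if cls.lower() in KNOWN_LANGUAGES:
            return cls
    return None  # none of the classes names a language
```

With this order, `["language-C"]` and `["C"]` both resolve to `C`, while a block whose classes are all non-language names resolves to nothing.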
@jgm I would go with possibility `3.`, with `1.` as a second choice, based on the following notes…
Possibility 1. any class after opening code backticks
Performance & Maintenance Issue: Looks up each class against some ever evolving language name list, like PrismJs⇗ or highlight.js⇗ supported languages.
Possibility 2. use language-LANG in markdown
Breaks Markdown Editing Highlights Issue: Breaks source and preview highlighting in many markdown editing environments. Widely used markdown fenced code syntax uses just the language name (`c`, `java`, `swift`, etc.) as the first word after the opening code fence.
Here is an example from editing markdown in Atom:
Note: Requiring `language-` in markdown code fences breaks thousands of markdown files in my use case.
Possibility 3. use first word after opening code backticks
In my use case, the first word (if present) is the code language name.
> Always add `language-` to a single word after the opening code backticks
Fenced C code in markdown input:
Renders HTML5 recommendation compliant output:
<pre><code class="language-c">int i = 0;
</code></pre>
Note: may need to recognize `no-highlight` (in markdown) as a case for not adding any language highlight class when multiple classes are used after an opening code fence. (Just mentioned for completeness … for use cases which also have non-language classes, although this is not my current use case.)
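The first-word rule together with the `no-highlight` exception might look like the sketch below. It is written in Python for illustration only; `code_classes` is a hypothetical helper, and treating `no-highlight` anywhere in the class list as "add no prefix" is my reading of the note above, not confirmed behavior.

```python
def code_classes(fence_classes):
    """Classes for the <code> tag, given the classes after the opening fence.

    Assumption: the first class is the language name unless `no-highlight`
    appears among the classes, in which case nothing is prefixed.
    """
    classes = list(fence_classes)
    if not classes or "no-highlight" in classes:
        return classes
    first = classes[0]
    if not first.startswith("language-"):
        first = "language-" + first  # avoid producing `language-language-c`
    return [first] + classes[1:]


print(code_classes(["c"]))
# prints ['language-c']
```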
I guess it's too late to worry about adding yet another command line option. So the best approach is `--no-highlight` `<code>` instead of `<pre>` when `--no-highlight` ~~, `--language-prefix` that adds `language-*` to the first class.~~ ~~At the moment it's not fixable via pandoc filters: I need to iterate via beautiful soup to move class...~~ Nope.
Both Highlight.js and Prism.js works with attributes set to <pre>
PS By the way: if there is something to worry about with CLI options, it is that they are not in alphabetical order in the `--help` output.
UPD: a simple pandoc filter like this solves the issue.
I "fixed" this on my website using some hacky regex operators on the HTML produced by pandoc. However, it would be nice if pandoc added a flag to fix this.
var re = /<pre class="[^"]*"><code>/; // [^"]* keeps the match inside one class attribute
while (result.search(re) != -1) // result is the html from pandoc
{
    var preTag = result.match(re)[0];
    var finishIndex = preTag.split('"', 2).join('"').length; // index of the closing quote
    var lang = preTag.substring(12, finishIndex); // 12 is the length of '<pre class="'
    var newHTML = `<pre><code class="language-${lang}">`;
    var original = `<pre class="${lang}"><code>`;
    result = result.split(original).join(newHTML); // replace every occurrence
}
> Both Highlight.js and Prism.js works with attributes set to `<pre>`
I am trying to have line numbers in the latest version of reveal.js. The included version of highlight.js supports line numbers, but only in the `<code>` tag.
So, I have two questions:
1. Is there a way to write a filter to move attributes from `<pre>` to `<code>`?
2. I saw the demo lua writer https://github.com/jgm/pandoc/blob/master/data/sample.lua, it puts the CodeBlock attributes on the `<code>` tag instead of the `<pre>`, see line 241: `return "<pre><code" .. attributes(attr) .. ">" .. escape(s) ..`. Is a new writer the only way?
You could do it with a filter, by replacing each `CodeBlock` element with a `RawBlock (Format "html")` and building the HTML yourself. A bit tedious, and you'd need to be careful about escaping, but not too hard.
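To show where the escaping matters, here is a rough Python sketch of just the string-building step. It does not use any pandoc filter API: `render_code_block` and its parameters are hypothetical, attribute keys are assumed to already be valid HTML attribute names, and in a real filter the returned string would be wrapped in a raw HTML block.

```python
from html import escape


def render_code_block(text, identifier="", classes=(), attributes=()):
    """Build <pre><code ...>text</code></pre> with all attributes on <code>.

    `attributes` is a sequence of (key, value) pairs; keys are assumed to be
    valid attribute names and are not escaped.
    """
    attrs = []
    if identifier:
        attrs.append('id="%s"' % escape(identifier, quote=True))
    if classes:
        attrs.append('class="%s"' % escape(" ".join(classes), quote=True))
    for key, value in attributes:
        attrs.append('%s="%s"' % (key, escape(value, quote=True)))
    attr_text = " " + " ".join(attrs) if attrs else ""
    # The code itself only needs <, > and & escaped, not quotes.
    return "<pre><code%s>%s</code></pre>" % (attr_text, escape(text, quote=False))


print(render_code_block("a < b && c", classes=["language-c"]))
# prints <pre><code class="language-c">a &lt; b &amp;&amp; c</code></pre>
```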
Thank you for the information and your quick response. I'll give it a try.
I believe this should do the trick: https://github.com/pandoc/lua-filters/tree/master/revealjs-codeblock
Thank you very much for the information. It works for me.
When converting Markdown to HTML using the `--no-highlight` option with the `fenced_code_attributes` flag enabled, `<pre class="name"><code>` tags are generated.
This request is to update `<pre class="name"><code>` to generate the W3C HTML5 recommendation example output syntax `<pre><code class="language-name">`.
For example, `<pre class="markdown"><code>` would become HTML5 `<pre><code class="language-markdown">`.
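As an illustration of the requested output (not of how pandoc would implement it), the following Python sketch rewrites already-generated HTML. `move_class_to_code` is a hypothetical helper; it assumes the `<pre>` tag carries only a `class` attribute and that the first class is the language name.

```python
import re

# Matches only <pre> tags whose sole attribute is `class` (an assumption).
PRE_CLASS = re.compile(r'<pre class="([^"]*)"><code>')


def move_class_to_code(html):
    """Move the class from <pre> to <code>, prefixing the first class."""
    def repl(match):
        classes = match.group(1).split()
        if not classes:
            return "<pre><code>"
        classes[0] = "language-" + classes[0]
        return '<pre><code class="%s">' % " ".join(classes)

    return PRE_CLASS.sub(repl, html)


print(move_class_to_code('<pre class="markdown"><code>x</code></pre>'))
# prints <pre><code class="language-markdown">x</code></pre>
```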
W3C HTML5 Recommendation: `code` element
Prism.js Basic Usage also illustrates use of the same HTML5 recommendation example syntax