Open sdbbs opened 3 years ago
You can do -t gfm-yaml_metadata_block and the metadata block will be omitted.
Or, if you want the metadata in an HTML comment, here's another trick you can already do: create a template ipynb.markdown as follows:
<!--
$meta-json$
-->
$body$
Then
pandoc my.ipynb --template ipynb.markdown -t gfm-yaml_metadata_block
<!--
{"jupyter":{"jupytext":{"text_representation":{"format_version":"1.2","jupytext_version":"1.4.2","extension":".md","format_name":"markdown"}},"kernelspec":{"display_name":"Python 3","name":"python3","language":"python"}}}
-->
my doc
By the way, I kind of like the idea of putting metadata inside an HTML comment. I suggested exactly this in 2011 on the markdown-discuss mailing list.
In principle, we could create a new extension, yaml_metadata_in_html_comment, that enables this (for both input and output). But I'm reluctant to add to the gratuitous proliferation of syntax extensions.
Hi @jgm,
Many thanks for the feedback - and sorry I could not respond earlier!
You can do -t gfm-yaml_metadata_block and the metadata block will be omitted.
I was not aware of that option. However, I think it only helps if pandoc is the one creating the HTML; what I want instead is to use a Markdown file, otherwise intended for pandoc, in an automatic online Markdown->HTML parser, such as GitHub's.
Here is an example: I have an .md file that is intended as a source for pandoc, with the intended pandoc output being PDF via LaTeX. However, I also keep this file in git, and in my online repository I use https://github.com/gitbucket/gitbucket as a web interface to my git repositories.
When I access GitBucket and try to open this .md file, I get something like this:
In other words - the Markdown->HTML parser of GitBucket did not recognize the YAML header block, and started interpreting everything inside it as Markdown. Specifically, I have a line in the header:
# lines starting with # are YAML-level comments!
... and indeed, pandoc interprets this fine as a comment inside the YAML header. However, GitBucket's Markdown parser interpreted it as plain Markdown, that is, as a heading.
So, if we could alternatively use, say, <!--- and ---> (note, three dashes!) as opening and closing of a YAML header block in a Markdown file in pandoc, then:
- pandoc would interpret that section as a YAML header block, as intended;
- other Markdown parsers would see the <!-- as the opening of an HTML comment, ignore everything inside, and see the --> as the closing of an HTML comment, and thus would not print any text from the YAML header block in that Markdown file (which would thus be easier to look at online, since there would be no metadata text from the YAML header, and corresponding weird formatting, to interfere).
Or, if you want the metadata in an HTML comment, here's another trick you can already do: create a template ipynb.markdown as follows:
Thanks - that seems to be specific to Jupyter notebooks; I haven't really tried it, but it does not look to me like it would help with my use case (I want to keep a YAML header block in the .md file, while hiding it from other Markdown parsers).
In principle, we could create a new extension, yaml_metadata_in_html_comment, that enables this (for both input and output). But I'm reluctant to add to the gratuitous proliferation of syntax extensions.
I guess that a new extension would help my use case personally - however, I see your point with "gratuitous proliferation", and I agree with it... So, maybe my suggestion above is worth considering:
- <!--- (three dashes) could be an alternative syntax for opening a YAML block when pandoc interprets Markdown;
- ---> (three dashes) could be an alternative syntax for closing a YAML block when pandoc interprets Markdown
... and all this "built-in" in pandoc (i.e. without enabling an extension), while all other Markdown parsers would see an HTML comment here instead, and thus not process the text content of the YAML block.
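To make the proposed syntax concrete, here is a small Python sketch of a preprocessing step that rewrites a leading <!--- ... ---> block into the standard --- ... --- form before the text reaches pandoc. Pandoc itself does not support this syntax; the function name unwrap_comment_header and the restriction to a block that starts on the first line of the file are assumptions of this sketch.

```python
def unwrap_comment_header(text: str) -> str:
    """Rewrite a leading '<!---' ... '--->' block as a '---' ... '---' YAML block."""
    lines = text.splitlines(keepends=True)
    # Only a block that opens on the very first line is treated as metadata.
    if not lines or lines[0].rstrip() != "<!---":
        return text
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() == "--->":
            # Swap both comment delimiters for the standard YAML ones.
            lines[0] = "---\n"
            lines[i] = "---\n"
            return "".join(lines)
    # No closing '--->' line found: leave the text as an ordinary HTML comment.
    return text
```

The rewritten text could then be piped to pandoc on standard input, while the file kept in git stays in the comment-wrapped form.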
With all due respect I think the onus should be on your other parser to support YAML meta data, not on Pandoc to hide it. If it doesn't need to do anything with it all they need to do is spot the standard YAML separators and discard the block. This is a very standard extension to Markdown and used by many many parsers. If you need to support something less featured then some kind of build step that exports the variant you need should be considered par for the course.
Hidden behind a non-default option flag I couldn't actually object to this being a "feature", but both the proliferation of options and the proliferation of format variants seems like a bad thing to me.
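As a rough illustration of the "spot the standard YAML separators and discard the block" idea, here is a minimal Python sketch of such a build step. It assumes the metadata block sits at the very top of the file and is closed by a --- or ... line (the two closers pandoc accepts); the function name strip_yaml_front_matter is made up for this example, and this is not a claim about how any particular parser does it.

```python
def strip_yaml_front_matter(text: str) -> str:
    """Return `text` without a leading YAML metadata block, if one is present."""
    lines = text.splitlines(keepends=True)
    # The block must open with a first line that is exactly '---'.
    if not lines or lines[0].rstrip() != "---":
        return text
    for i, line in enumerate(lines[1:], start=1):
        # Pandoc accepts either '---' or '...' as the closing line.
        if line.rstrip() in ("---", "..."):
            return "".join(lines[i + 1:])
    # No closing delimiter found: leave the document untouched.
    return text
```

A parser that has no use for the metadata could run something like this before rendering, so a # comment line inside the header never reaches the Markdown stage.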
Playing my broken "Lua filter" record again: if all else fails, here's a filter to make pandoc work with the syntax proposed by @sdbbs:
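-- file: yaml-in-html-comments.lua
-- Holds the metadata parsed from the comment block, if one is found.
local meta
function RawBlock (raw)
  -- Look for raw HTML blocks that contain the '<!---' opener.
  if raw.format == 'html' and raw.text:match '%<%!%-%-%-' then
    -- Turn the comment delimiters back into standard YAML delimiters.
    local yaml = raw.text:gsub('^<!%-%-%-', '---'):gsub('%-%-%->$', '---')
    meta = pandoc.read(yaml, 'markdown+yaml_metadata_block').meta
  end
end
-- set as document's metadata; could also do a merge instead (if necessary).
function Meta (_) return meta end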
Use with: pandoc --lua-filter=yaml-in-html-comments.lua ...
I think the Lua filter solution should work well enough, so I'm closing this. Please reopen if the proposed solution proves to be insufficient.
I'd like to keep this open for further consideration.
Thanks all for the comments:
With all due respect I think the onus should be on your other parser to support YAML meta data, not on Pandoc to hide it.
Yes, I should have mentioned that I didn't decide to post this lightly, because it obviously would increase the work/support load on the pandoc project, which, as an otherwise happy user, I'd like to avoid.
This is a very standard extension to Markdown and used by many many parsers.
OK, I was not aware of this, thanks for mentioning it.
However, gitbucket's parser at least does not support it (yet); and my thinking was: if other platforms advertise simply "Markdown", and I asked them for this enhancement (i.e. add code in their parsers that would ignore YAML headers), they could always point to the original Markdown spec https://daringfireball.net/projects/markdown/ and say that there is no mention of --- or YAML headers there.
Hidden behind a non-default option flag I couldn't actually object to this being a "feature", but both the proliferation of options and the proliferation of format variants seems like a bad thing to me.
Fully agree there.
But now that I have seen the Lua filter in https://github.com/jgm/pandoc/issues/7183#issuecomment-821777277 - I actually think I could live with it, since I use Lua filters in my workflow anyway; so I guess that particular Lua filter solves my problem.
These days the Common Mark project is a much better place to point projects toward if you want them to have interoperable Markdown than the original Daring Fireball post, but you do have a point: as widespread as YAML meta data is (used by many publishing platforms, static site generators, even Markdown note-taking applications!), it is still an extension to Markdown, not part of Markdown itself. Even CommonMark thinks of it that way. The Pandoc flavor includes it by default, but having a way to wrap the extra data in a way that any CommonMark compatible parser would not break would be an interesting extension.
Thanks, @alerque :
These days the Common Mark project is a much better place to point projects toward if you want them to have interoperable Markdown
Thanks, good to know this!
Btw, I just found something going against my suggestion of <!--- (triple dash) as an alternative opening tag for YAML:
https://stackoverflow.com/questions/4823468/comments-in-markdown
I use standard HTML tags, like
<!--- your comment goes here and here -->
Note the triple dash. The advantage is that it works with pandoc when generating TeX or HTML output. More information is available on the pandoc-discuss group.
Not sure if this is still applicable though: I tried <!--- vs <!-- on multiline (as in, containing \n line breaks) text in my doc in pandoc 2.13, and they both seemed to work fine. But in any case, there is a historical precedent of using <!--- for something else.
Triple dashes being treated differently was probably a bug. HTML comments are a nightmare to parse. Did you know -- is a field separator in comments? Yes, comments have fields. And they get parsed for other things too. Some browsers overload them, some servers use them as preprocessing hints, and so on. They are minefields. In any case I don't think triple dashes are a good way to overload comments.
I would like to propose, as an enhancement, the same approach taken here: "Hide metadata header in markdown" (Issue #527, mwouts/jupytext):
In other words - allow that, instead of the default opening and closing "three-dashes" (---) strings that define the start and end of a YAML header block in Pandoc Markdown, the opening (<!--) and closing (-->) tags for HTML comments are used. In that way, the header would still be interpreted by Pandoc, while being fully hidden from typical automatic online parsers of Markdown to HTML (such as GitHub's).
Alternatively, allow that the very first line in a Pandoc Markdown document can start with an HTML comment, and that the starting --- of a Pandoc Markdown YAML header can be on the second line of the Markdown text file; in that way, probably most of the code that parses the YAML header can be kept (including starting and stopping ---), while still allowing for hiding the YAML header from online Markdown parsers.
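The second variant (comment opener on the first line, the usual --- on the second) could be recognized along these lines. This Python sketch only illustrates the proposed layout and is not existing pandoc behavior; the function name split_hidden_header and the exact layout, with --> on the line right after the closing ---, are assumptions.

```python
from typing import Optional, Tuple


def split_hidden_header(text: str) -> Tuple[Optional[str], str]:
    """Split off a YAML block wrapped in an HTML comment at the top of `text`.

    Assumed layout: '<!--' on line 1, '---' on line 2, the YAML lines,
    a closing '---' (or '...'), then '-->' on the following line.
    Returns (yaml_source, body); yaml_source is None if the layout is absent.
    """
    lines = text.splitlines(keepends=True)
    stripped = [line.rstrip() for line in lines]
    if stripped[:2] != ["<!--", "---"]:
        return None, text
    for i in range(2, len(stripped) - 1):
        # The YAML block must close immediately before the comment closes.
        if stripped[i] in ("---", "...") and stripped[i + 1] == "-->":
            yaml_source = "".join(lines[2:i])
            body = "".join(lines[i + 2:])
            return yaml_source, body
    return None, text
```

With this layout the existing --- delimiters stay in place, which matches the point above that most of the code parsing the YAML header could be kept.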