Open nichtich opened 6 years ago
Additional ideas, some taken from CSS Selectors:
Image{width<192}
#myid
#"my-id"
{id=~regex}
.class
."class"
{.class}
{class*=foo}
Not to be confused with attributes!
Header[level=1]
Attributes may be selected with the same syntax, but properties override attributes if selected with [...]. For instance, the Header
# xxx {level=99}
has the attribute level with value 99 but the property level with value 1.
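
The property/attribute distinction can be seen in pandoc's JSON output (-t json). The following is only a sketch: the helper names header_property and header_attribute are hypothetical, and the literal below is hand-written to mirror how pandoc would typically serialize this header (with an automatically generated identifier).

```python
# Pandoc JSON for the Markdown header "# xxx {level=99}".
# Header content is [level, [id, classes, key-value pairs], inlines].
header = {
    "t": "Header",
    "c": [1, ["xxx", [], [["level", "99"]]], [{"t": "Str", "c": "xxx"}]],
}


def header_property(el, name):
    """Return a structural property of a Header (hypothetical helper)."""
    level, (ident, classes, _kv), _inlines = el["c"]
    return {"level": level, "id": ident, "classes": classes}[name]


def header_attribute(el, key):
    """Return a user-supplied key-value attribute, or None if absent."""
    _level, (_ident, _classes, kv), _inlines = el["c"]
    return dict(kv).get(key)


print(header_property(header, "level"))   # structural level of the header
print(header_attribute(header, "level"))  # value written inside {...}
```

A selector such as Header[level=1] would consult the first function, while an attribute selector would consult the second.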
:document
:block
:inline
:meta
!Header
From where I'm coming from, this probably doesn't belong in pandoc itself. But you could easily create a library (in your programming language of choice) that could then be used to write filters with the syntax you describe – or indeed a filter that does the extraction you mention.
Also, there are already various ways to do something like you describe:
-t json to jq
-t html to any DOM processor (e.g. nokogiri in Ruby)
-t docbook (or HTML again) to an XPath implementation (like saxon)
Thanks for feedback and suggestions! XPath could help if there were an official XML serialization of the abstract syntax tree. Without native support in pandoc (more specifically: pandoc-types), there is a risk of differing implementations. A selector language for the pandoc document model should not depend on a specific programming language or technology. At least a simple selector syntax is needed for #81 anyway to specify parts of a document. Another use case is converting annotations between formats (e.g. comments in Word documents and annotations from services like hypothes.is): see fragment selectors in Web annotation.
I wonder if it would be worth exploring adding some kind of select or filter function to the pandoc API (perhaps in Text.Pandoc.Walk)? It could take a function from elements to boolean as an argument, and all it would do is traverse the tree (preserving order) and remove every element where the function returns false.
This function could then be exposed in lua filters and perhaps ultimately in some kind of command line utility or option.
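
To make the idea concrete, here is a rough sketch of such a function, written against pandoc's JSON AST rather than the Haskell API. The name select and the rule "any dict with a t key is an element" are assumptions for illustration, not anything pandoc provides.

```python
def is_element(node):
    # Assumption: every tagged JSON object counts as an element. This also
    # matches tagged enum values (e.g. quote or math types), so a predicate
    # should return True for anything it does not explicitly want to drop.
    return isinstance(node, dict) and "t" in node


def select(node, keep):
    """Return a copy of node in which elements failing keep() are removed.

    Only elements that sit inside a list can be removed; order is preserved.
    """
    if isinstance(node, list):
        return [select(child, keep) for child in node
                if not (is_element(child) and not keep(child))]
    if isinstance(node, dict):
        return {key: select(value, keep) for key, value in node.items()}
    return node


doc = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "a"},
            {"t": "Space"},
            {"t": "Image", "c": [["", [], []], [], ["pic.png", ""]]},
        ]},
        {"t": "HorizontalRule"},
    ],
}

# Drop every image, keep everything else.
without_images = select(doc, lambda el: el["t"] != "Image")
```

Note that removing an element also removes everything nested inside it, so a predicate like "is a Link" would discard the paragraphs that contain the links. Collecting matches instead of pruning needs a query-style traversal.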
@tarleb I was thinking about how this could be done with lua filters. Do we have anything corresponding to query (from Text.Pandoc.Walk) in the lua API?
Not yet, no. Adding a generic query function shouldn't be too difficult though.
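
For illustration, a generic query can be approximated outside pandoc on the JSON AST. This is a sketch under assumptions: query here is a hypothetical Python function that uses lists as the result type, it is not the Text.Pandoc.Walk function itself, and it treats any JSON object with a t key as an element.

```python
def query(node, fn):
    """Concatenate fn(element) over all elements, visiting parents first."""
    results = []
    if isinstance(node, dict):
        if "t" in node:
            results.extend(fn(node))
        for value in node.values():
            results.extend(query(value, fn))
    elif isinstance(node, list):
        for child in node:
            results.extend(query(child, fn))
    return results


def link_targets(el):
    # Link content is [attr, inlines, [url, title]].
    return [el["c"][2][0]] if el["t"] == "Link" else []


def link(url, text):
    """Build a minimal Link element (helper for this example only)."""
    return {"t": "Link", "c": [["", [], []], [{"t": "Str", "c": text}], [url, ""]]}


doc = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            link("https://a.example", "a"),
            {"t": "Space"},
            {"t": "Emph", "c": [link("https://b.example", "b")]},
        ]},
    ],
}

targets = query(doc, link_targets)
```

Here targets holds the URLs of both links, including the one nested inside the emphasis, in the order they appear in the document.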
Selection of document subsets such as single chapters and paragraphs should be supported by introducing Pandoc document selectors, similar to XPath/XPointer, Fragment Identifiers, and CSS Selectors. This could simplify the creation of filters and support transclusion (#81) and annotation of documents.
Examples
A simple example is extraction of links - in this case the selector is Link.
Some ideas for selector syntax to select elements by their attributes, element type, and values:
Selectors might be combined by alternatives:
To select larger parts of a document, a range operator or function is needed, for instance
would select the second chapter of a document: everything from the second level-1 header (inclusive) up to, but not including, the next level-1 header. Including the end of the range could be done with a different syntax, e.g. A ...+ B.
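
The range semantics can be sketched on the top-level block list of pandoc's JSON AST. Everything named here (select_range, its nth and include_end parameters, the small helper constructors) is hypothetical and only illustrates one possible reading of the range operator; nested blocks are not considered.

```python
def is_h1(block):
    """True for a Header block whose level property is 1."""
    return block.get("t") == "Header" and block["c"][0] == 1


def select_range(blocks, start, end, nth=1, include_end=False):
    """Slice from the nth block matching start up to the next block matching end.

    The end block is excluded unless include_end is true (the "A ...+ B" idea).
    Returns [] if there is no nth match of start.
    """
    seen = 0
    for i, block in enumerate(blocks):
        if start(block):
            seen += 1
            if seen == nth:
                for j in range(i + 1, len(blocks)):
                    if end(blocks[j]):
                        return blocks[i:j + 1] if include_end else blocks[i:j]
                return blocks[i:]  # no end match: run to the end of the list
    return []


def h1(text):
    return {"t": "Header", "c": [1, ["", [], []], [{"t": "Str", "c": text}]]}


def para(text):
    return {"t": "Para", "c": [{"t": "Str", "c": text}]}


blocks = [h1("One"), para("a"), h1("Two"), para("b"), para("c"), h1("Three"), para("d")]

# Second chapter: from the second level-1 header up to before the next one.
second_chapter = select_range(blocks, is_h1, is_h1, nth=2)
```

With this input, second_chapter contains the header "Two" and the two paragraphs that follow it; passing include_end=True would also include the header "Three".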