Open tajmone opened 2 years ago
I agree 100% with everything mentioned in your post. These limitations will be removed in the next PML version (the version I'm currently working on, written in Java). Absolute and relative URLs and file paths will be supported for all media assets, and the paths and URLs will no longer be validated by PMLC (unless explicitly asked for).
I agree 100% with everything mentioned in your post. These limitations will be removed in the next PML version (the version I'm currently working on, written in Java).
Wonderful! I could then finally start using PML in executable ebooks (which all use custom protocols like ebook:///, etc.) and in-software documentation (via WebControl).
Absolute and relative URLs and file paths will be supported for all media assets, and the paths and URLs will no longer be validated by PMLC (unless explicitly asked for).
The idea of allowing validation on demand is good.
How are you going to implement it, via CLI option or per element via node attributes?
That's actually a feature I've been pondering on in my free time, when I try to think of missing features in the various markup languages which could be added to PML. A draft idea I had in mind was the ability to enable gathering info on an image at conversion time, e.g. to calculate its width and height and inject them in the final HTML in order to ensure that its placeholder (while loading) is of the correct size.
E.g. it could be something like:
[image ( source=pear.png width=extract ) ]
where the special value extract implies that PML should locate the image and extract its dimensions from its header. The generated HTML should be something like:
<img src="pear.png" width="300">
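To make the idea concrete, here is a minimal Python sketch of what "extracting the dimensions from the header" could look like for PNG files. It is only an illustration, not how PMLC works or would work: the function names `png_dimensions` and `img_tag` are made up for this example, and only the PNG layout (an 8-byte signature followed by the IHDR chunk) is taken as given.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # After the signature: 4-byte chunk length, 4-byte chunk type,
    # then the IHDR payload, which starts with width and height.
    _length, chunk_type, width, height = struct.unpack(">I4sII", data[8:24])
    if chunk_type != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    return width, height

def img_tag(source: str, data: bytes) -> str:
    """Hypothetical stand-in for width=extract: inject the real width."""
    width, _height = png_dimensions(data)
    return f'<img src="{source}" width="{width}">'

# Only the first 24 bytes are needed, so a synthetic header is enough here.
header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 300, 200)
print(img_tag("pear.png", header))
```

Only the first 24 bytes of the file are read, so the cost at conversion time would be negligible even for large images.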
The idea is that the special extract attribute value could be used in various places where metadata extraction from external assets might be useful. In some media types metadata might play a greater role than in others, e.g. containing info about the title, author(s), creation date, license, etc., and sometimes even preview images or cover art; and the extract feature would allow reusing the same PML block as a template for each external asset, since only the file name would be required to obtain the value of all other fields, e.g.:
[youtube_video
yid = NUDhA4hXdS8
caption = extract(title)
]
i.e. assuming that title is a valid metadata entry name that can somehow be obtained via the YouTube API once you know the video's YID.
Surely, the ability to extract metadata and info for all the various possible assets would require implementing a dedicated function for each asset type, and knowing where and how the data is stored. But if PMLC could expose a public API for the extract(x) function for each node involving an external resource, then end users could provide the data extraction code themselves, e.g. via script nodes or some external custom tool invoked via the command line or as a process.
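A rough Python sketch of the "users provide the extraction code" idea follows. Everything in it is hypothetical (the registry, `register_extractor`, and the `extract` signature are invented for illustration and are not a PMLC API), and the YouTube extractor is a stand-in that returns a placeholder instead of calling any real web service.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (node name, metadata field) -> function of the asset id.
Extractor = Callable[[str], str]
_extractors: Dict[Tuple[str, str], Extractor] = {}

def register_extractor(node: str, field: str, func: Extractor) -> None:
    """Let an end user plug in the extraction code for one node/field pair."""
    _extractors[(node, field)] = func

def extract(node: str, field: str, asset_id: str) -> str:
    """Resolve extract(field) for a node by calling the registered function."""
    try:
        func = _extractors[(node, field)]
    except KeyError:
        raise LookupError(f"no extractor registered for {node}.{field}") from None
    return func(asset_id)

# A user-supplied extractor. A real one might query a web API or read a
# file header; this placeholder keeps the example offline.
register_extractor("youtube_video", "title", lambda yid: f"Title of video {yid}")

print(extract("youtube_video", "title", "NUDhA4hXdS8"))
```

The converter would only need to know the lookup key, while the knowledge of where and how the data is stored stays in user code.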
This feature would be very useful when creating catalogues of assets, or handling dynamically generated HTML pages with lots of images (which might take time to load) which are computed at conversion time (e.g. based on the files present in a given folder), etc.
How are you going to implement it, via CLI option or per element via node attributes?
I was already thinking of a general way to define default values for all attributes of all nodes. Default values could be defined:
- in a not-yet-existing config node, a direct child of the doc node (overrides 1)
Each default value can of course be explicitly overridden for each individual node in a document.
Note that script nodes can also be used to define default values, as demonstrated here.
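The layering of defaults can be sketched in a few lines of Python. This is only an illustration under assumptions: `base_defaults` stands for whatever lower-precedence source exists, `config_node_defaults` for the config node mentioned above, and the attribute names are invented.

```python
from collections import ChainMap

def resolve_attributes(explicit, *default_layers):
    """Merge attribute maps; earlier arguments win over later ones."""
    return dict(ChainMap(explicit, *default_layers))

# Hypothetical layers for an image node, from lowest to highest precedence.
base_defaults = {"width": "auto", "border": "no"}
config_node_defaults = {"border": "yes"}   # overrides the base layer
explicit = {"width": "300"}                # set on the individual node

attrs = resolve_attributes(explicit, config_node_defaults, base_defaults)
print(attrs)
```

A lookup chain like this keeps each layer intact, so the same shared defaults can be reused for every node without copying them.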
the ability to enable gathering info on an image at conversion time, e.g. to calculate its width and height and inject them in the final HTML in order to ensure that its placeholder (while loading) is of the correct size. ...
caption = extract(title)
Very nice idea to optimize and automate!
The options precedence seems good, but you should also consider introducing a precedence modifier, like the @ symbol in Asciidoctor (see Altering the assignment precedence in the Asciidoctor Manual).
This allows changing the default precedence, e.g. by marking an option with the @ modifier in the CLI, the node definition or the settings file. This becomes quite important when dealing with complex projects that share a common settings file or invocation commands, where some documents might need the option (or attribute, etc.) defined in the document to have higher precedence than the CLI, and vice versa.
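As a sketch of how such a modifier could behave, the Python function below treats a CLI value ending in @ as a "soft" assignment that the document may override. The rule is an assumption made for illustration, loosely inspired by the Asciidoctor feature; the exact Asciidoctor semantics may differ, and nothing here reflects an existing PMLC option.

```python
def resolve(name, cli, document):
    """Pick the value of `name` from the CLI and document option maps.

    By default the CLI wins. A CLI value ending in '@' is soft: the
    trailing '@' is stripped and the document value, if any, wins.
    """
    cli_value = cli.get(name)
    doc_value = document.get(name)
    if cli_value is not None and cli_value.endswith("@"):
        soft_value = cli_value[:-1]
        return doc_value if doc_value is not None else soft_value
    return cli_value if cli_value is not None else doc_value

print(resolve("toc", {"toc": "left"}, {"toc": "right"}))    # hard CLI value
print(resolve("toc", {"toc": "left@"}, {"toc": "right"}))   # soft CLI value
```

With this rule a shared build command can supply fallbacks without taking control away from the documents that need their own setting.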
- in a not-yet-existing config node, a direct child of the doc node (overrides 1)
So, if I've understood correctly, all the document options will be part of the [doc attributes and/or sub-nodes, along with metadata, etc. Pandoc uses a YAML section for metadata, placed at the beginning of a document, but then in the actual AST the metadata and options are effectively sub-nodes of the metadata node, so I guess that PML's approach is more similar to the latter, since it's always closer to an AST in its syntax.
It makes sense. Then I imagine that ultimately the nodes tree will be something like:
-+- [doc +
+- [metadata
+- [options/settings
+ ...
If end users have access to all nodes, dynamically from within the document itself (e.g. to read metadata or settings, or even change them) it could lead to very interesting use cases. E.g. conditional text could be shown depending on a metadata or setting value, and so on (e.g. omit contents if it's a sample/demo version of an eBook).
consider introducing a precedence modifier
Good point.
I imagine that ultimately the nodes tree will be something like:
-+- [doc +
+- [metadata
+- [options/settings
+ ...
Yes, exactly.
If end users have access to all nodes ...
Yes, meta- and config-data should be available as maps (dictionaries) in script nodes.
Moreover, for each meta-data there should be an attribute (maybe named show, with sensible default values) to define if the data will be displayed automatically in a meta-data-table at the beginning of the document. E.g.:
[meta
[author Albert Newton] [- shown by default -]
[license (show=yes) MIT]
[license_URL (show=no) https://opensource.org/licenses/MIT]
]
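In Python terms, the selection logic for the meta-data table could look like the sketch below. The default visibility map and the tuple layout are assumptions made up for this example; only the three entries and their show values come from the PML snippet above.

```python
# Hypothetical per-field defaults, used when no show attribute is given.
DEFAULT_SHOW = {"author": True, "license": True, "license_URL": False}

def visible_rows(entries):
    """Return the (name, value) pairs that belong in the meta-data table.

    entries: list of (name, value, show), where show is True, False,
    or None meaning "use the default for this field".
    """
    rows = []
    for name, value, show in entries:
        if show is None:
            show = DEFAULT_SHOW.get(name, True)
        if show:
            rows.append((name, value))
    return rows

meta = [
    ("author", "Albert Newton", None),   # shown by default
    ("license", "MIT", True),            # show=yes
    ("license_URL", "https://opensource.org/licenses/MIT", False),  # show=no
]
print(visible_rows(meta))
```

Hidden entries stay in the data, so they would still be reachable from script nodes even though they are not rendered.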
Moreover, for each meta-data there should be an attribute (maybe named show, with sensible default values) to define if the data will be displayed automatically in a meta-data-table at the beginning of the document.
That's an excellent idea. It allows storing extra info even if it's not actually displayed in the document. I don't like the name show though, it's a bit vague. Maybe more context-appropriate candidates could be: display, hidden, reveal (although none of them really conveys the document-specific goal).
inability to use images stored on the web, or provide links that use relative paths, or protocols other than those accepted by PMLC
Fixed in version 3.0.0.
Moreover, a shared options file, valid for all documents, as well as an option node (a direct child of the doc node) have been added in version 3.0.0.
@pml-lang, while working on the pandoc PML Writer I've started including test documents from pandoc's test suite. These are documents in various formats covering many formatting elements and their possible combinations.
Although the Lua filter is correct, many of these documents fail conversion via PMLC because they either contain images pointing to the Internet, or links with relative paths, etc., which PMLC doesn't support.
For example, a document containing a link pointing to /url results in the following PMLC error:
I really can't understand why PMLC enforces these restrictions on link and image paths, acting like a sort of validator.
The inability to use images stored on the web, or provide links that use relative paths, or protocols other than those accepted by PMLC (from gopher:// to MSres://, up to custom URL Monikers) is a huge impediment to the adoption of PML for many domain-specific applications:
- A /url link like the one from the above error message would be a normal link on a custom server.
- The res:// protocol is essential for storing HTML documents (or images and other assets) into DLLs, and then accessing them from within an application (e.g. via the WebControl).
When I think of the HTML format and its innumerable applications (from HTML-based eBook formats to software using WebComponent GUIs) I really struggle with these limitations, especially not being able to include images from the web.
Also, why should PMLC check if an image exists at all? In many documentation toolchains the images are generated on the fly, from ASCII blocks within the source document (or elsewhere), so they are deleted before each build and not available until after the conversion. Procedurally generated images are such a strong component of software documentation that I can hardly imagine working without them (think of railroad diagrams, etc.).
You should really consider either revising the way PMLC handles URL attributes, or at least providing some alternative attributes that allow handling relative paths within any URI scheme and/or custom protocols or monikers.
I fail to see the reasons for the current errors, since all of the above-mentioned cases adopt the same convention for how a protocol/moniker is expressed, i.e. <name>://path/segmented/by/slashes/.
The fact that so many basic test docs from pandoc's test suite are failing to build with PMLC (not due to a failed pandoc-to-PML conversion, but because of unsupported paths/URLs in PMLC) is a strong indicator that there's something wrong with how PMLC approaches resource paths and links. None of the markup syntaxes I've worked with attempt to validate paths and URLs (unless you specify options like embedding images as Data URIs), for they assume the author knows what he/she is doing: it could be just a document template, or the image is temporarily unreachable due to a server being down, or the HTML page is intended to be used inside a running executable app, from a DLL ... you name it, there could be hundreds of reasons why an external asset is not reachable at the specified path/URL.
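To illustrate the point about the shared <name>:// convention, here is a small Python sketch of a permissive handler that accepts any scheme and any path as written, and only touches the filesystem when validation is explicitly requested. The function name, its return values and the `validate` flag are all hypothetical; this is not PMLC's behavior, just one possible shape for "validation on demand".

```python
import os
from urllib.parse import urlsplit

def check_target(target: str, base_dir: str = ".", validate: bool = False) -> str:
    """Classify a link or image target without rejecting unknown schemes."""
    scheme = urlsplit(target).scheme
    if scheme:
        return "url"       # any <name>:// target is passed through untouched
    if not validate:
        return "path"      # relative or server-absolute path, accepted as written
    # Validation on demand: only now is the filesystem consulted.
    full_path = os.path.join(base_dir, target)
    return "path" if os.path.exists(full_path) else "missing"

for target in ("res://app.dll/help.html", "gopher://example.org/1", "/url", "pear.png"):
    print(target, "->", check_target(target))
```

One caveat of this naive approach is that a Windows path such as C:\docs\pear.png would be read as having the scheme "c", so a real implementation would need an extra rule for drive letters.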
I think these problems have to be solved natively, i.e. in the vanilla PML syntax, as opposed to via custom scripts or extensions.
What are the reasons behind the current way PMLC handles image paths and links? Why are links expected to be expressed via protocols, when an HTML page might simply want to link to another page in the same folder (possibly without having to resort to the file:// protocol)?