Closed matklad closed 1 year ago
I don't remember whether I had a good reason for doing it this way. Perhaps I thought we could allow these to split over multiple lines, as we currently allow for URLs in regular links:
[My link text](http://example.com?product_number=234234234234
234234234234)
This would call for start and end, plus str and softbreak inside. Any thoughts on that?
Anyway, we could still make `url` a leaf in the AST, independent of the structure of the matches, and I think that's a good idea in either case.
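To make the idea concrete, here is a minimal sketch of how nested match events for a multi-line autolink could still be collapsed into a single leaf node. It is written in Python for illustration only; the event names (`+url`, `str`, `softbreak`, `-url`), the `collapse_to_leaf` helper, and the choice to drop line breaks inside the destination are all assumptions, not the actual djot implementation.

```python
# Hypothetical match events for an autolink split over two lines.
# The (kind, text) shape and the event names are assumptions for illustration.
matches = [
    ("+url", ""),
    ("str", "http://example.com?product_number=234234234234"),
    ("softbreak", "\n"),
    ("str", "234234234234"),
    ("-url", ""),
]


def collapse_to_leaf(events):
    """Build one leaf node from open / str / softbreak / close events."""
    tag = None
    parts = []
    for kind, text in events:
        if kind.startswith("+"):
            tag = kind[1:]          # "+url" opens a "url" node
        elif kind == "str":
            parts.append(text)
        elif kind == "softbreak":
            continue                # assumption: line breaks in a destination are dropped
        elif kind.startswith("-"):
            break                   # closing event ends the node
    return {"tag": tag, "destination": "".join(parts)}


leaf = collapse_to_leaf(matches)
print(leaf)
```

The matches keep their nested shape, while the AST consumer only ever sees a flat node.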
On second thought, there is a drawback I hadn't considered to collapsing things in the AST using `get_string_content` (which we use, e.g., to extract a string `destination` for a link from what may be several `str` and `softbreak` elements).
The drawback is that we lose fine-grained source position information. The same is true of collapsing the contents of code blocks into a single string (which we also do now). Consider
> ```
> my code
> is here
> ```
If we just say that the code block starts at line 1, column 3 and continues through line 4, column 5, we aren't recording the fact that not all of the characters between those two positions are part of the code block.
On balance it's probably better for the AST not to worry about this; one could extract it from the match objects if one wanted this kind of fine-grained positional information.
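A small Python sketch of the point about positions, using the block-quoted code block above. The `span_text` helper and the 1-based, inclusive (line, column) convention are assumptions made for this illustration; they are not part of djot.

```python
fence = "`" * 3
source = f"> {fence}\n> my code\n> is here\n> {fence}\n"
lines = source.splitlines()


def span_text(lines, start, end):
    """Return the text between two 1-based, inclusive (line, column) positions."""
    (start_line, start_col), (end_line, end_col) = start, end
    out = []
    for n in range(start_line, end_line + 1):
        line = lines[n - 1]
        lo = start_col if n == start_line else 1
        hi = end_col if n == end_line else len(line)
        out.append(line[lo - 1:hi])
    return "\n".join(out)


# One coarse span for the whole block also covers the "> " block quote markers.
coarse = span_text(lines, (1, 3), (4, 5))

# Per-line spans (the kind of detail individual matches can carry) cover
# only the characters that belong to the code itself.
fine = [span_text(lines, (2, 3), (2, 9)), span_text(lines, (3, 3), (3, 9))]

print(repr(coarse))
print(fine)
```

The coarse span contains `> my code`, block quote marker included, while the per-line spans recover just `my code` and `is here`.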
One advantage of the current setup is that in the renderer we can just do
Renderer.url = Renderer.link
Renderer.email = Renderer.link
because the structures are identical.
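The same aliasing trick can be sketched in Python. The node shape used here (a `destination` plus `children` carrying `text`) is an assumption for illustration; the point is only that one handler can serve all three tags as long as the node structures match.

```python
from html import escape


class Renderer:
    def render_children(self, node):
        # Assumed node shape: children are leaf nodes with a "text" field.
        return "".join(escape(child["text"]) for child in node.get("children", []))

    def link(self, node):
        href = escape(node["destination"], quote=True)
        return f'<a href="{href}">{self.render_children(node)}</a>'

    # Identical structure, so the same handler is reused for autolinks.
    url = link
    email = link


node = {
    "tag": "url",
    "destination": "http://example.com",
    "children": [{"tag": "str", "text": "http://example.com"}],
}
html_out = Renderer().url(node)
print(html_out)
```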
Hm, but the structures would be identical either way? They'll both be nested or flat.
Thinking more about this, do we actually need to distinguish between url & email in the AST?
<http://example.com>
<aleksey.kladov@example.com>
I think these can be represented as
{
  "tag": "url",
  "destination": "http://example.com"
},
{
  "tag": "softbreak"
},
{
  "tag": "url",
  "destination": "mailto:aleksey.kladov@example.com"
}
The `mailto:` prefix in the destination seems sufficient to distinguish the two cases?
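As a rough sketch of that idea in Python: with a single `url` tag, a consumer that still cares about the difference could recover it from the destination alone. The `autolink_kind` helper is hypothetical, and the node shape follows the proposal above rather than the current AST.

```python
def autolink_kind(node):
    """Classify a flat autolink node by its destination prefix."""
    return "email" if node["destination"].startswith("mailto:") else "url"


nodes = [
    {"tag": "url", "destination": "http://example.com"},
    {"tag": "url", "destination": "mailto:aleksey.kladov@example.com"},
]
kinds = [autolink_kind(n) for n in nodes]
print(kinds)
```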
Regular `link` nodes need a nested structure, because the link descriptions can contain formatting.
We could indeed just use `link` for all three cases, but some people have asked to retain the distinction between e.g. `<me@example.com>` and `[me@example.com](mailto:me@example.com)`. I'm not sure.
> Regular link nodes need a nested structure, because the link descriptions can contain formatting.
Ah, sorry, I misunderstood you. Yeah, I think `link` should be different from autolink, but for autolinks, distinguishing between email and http url doesn't seem that useful.
The current AST has `email` and `url` as leaf nodes, so closing.
{
  "tag": "doc",
  "references": {},
  "footnotes": {},
  "children": [
    {
      "tag": "para",
      "children": [
        {
          "tag": "email",
          "text": "me@example.com"
        }
      ]
    },
    {
      "tag": "para",
      "children": [
        {
          "tag": "url",
          "text": "http://example.com"
        }
      ]
    }
  ]
}
For
<http://example.com>
we produce an AST like this:
and we emit the original matches as:
https://github.com/jgm/djot/blob/2c0646f42e47c43c4ddaa28b0ad63a9d7da51107/djot/inline.lua#L232-L234
The `url` > `str` nesting seems superfluous; just a flat node seems like it should be sufficient?