tajmone / ST4-Asciidoctor

AsciiDoc Package for SublimeText 4
https://tajmone.github.io/ST4-Asciidoctor
MIT License

Custom Attributes and IDs Names #51

Open tajmone opened 6 months ago


We need to improve how the syntax handles attributes and IDs, so that it can tell valid attribute and ID names from invalid ones and scope them accordingly.

The best approach is to define custom variables (in the variables section of the syntax definition) that can then be reused in the RegExs that handle matching in the various contexts.

In other words, the RegExs that handle them should be able to account for, and intercept, invalid attribute and ID names and scope them as invalid, in order to warn the user about the problem.

Custom Attributes

Custom attributes have a fairly simple rule when it comes to naming:

An important point to note here is that attribute names are stored in lowercase, and it's best practice to use lowercase lettering only:

Although uppercase characters are permitted in an attribute name, the name is converted to lowercase before being stored. For example, URL-REPO and URL-Repo are treated as url-repo when a document is loaded or converted. A best practice is to only use lowercase letters in the name and avoid starting the name with a number.
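The following is a minimal Python sketch of how a name could be validated and then normalized the way AsciiDoc stores it. It assumes the naming rule as stated in the Asciidoctor docs: a name starts with a word character (A-Z, a-z, 0-9 or _) and contains only word characters and hyphens, with no dots or spaces. The rule is restricted to ASCII here for simplicity. ATTR_NAME and normalize_attr_name() are hypothetical names, not part of the package. Python's re is used only for illustration; patterns this simple would read the same in a .sublime-syntax file.

```python
import re
from typing import Optional

# Assumed naming rule (ASCII-only approximation of the Asciidoctor docs):
# first char is a word character, the rest are word characters or hyphens.
ATTR_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def normalize_attr_name(name: str) -> Optional[str]:
    """Return the stored (lowercased) form of a valid name, or None if invalid."""
    if ATTR_NAME.fullmatch(name) is None:
        return None
    # AsciiDoc lowercases attribute names before storing them.
    return name.lower()


print(normalize_attr_name("URL-REPO"))  # url-repo
print(normalize_attr_name("URL-Repo"))  # url-repo
print(normalize_attr_name("url.repo"))  # None (dots are not allowed)
print(normalize_attr_name("-repo"))     # None (must start with a word character)
```

Both uppercase spellings collapse to the same stored attribute. This is exactly the naming conflict the syntax might want to warn about.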

I'm not sure whether we ought to apply the deprecated scope to attribute names that contain uppercase letters. That would either warn the user of a potential naming conflict (in case he/she was relying on letter case to keep attributes independent), or simply flag the violation of best practices. Need to think about it.
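If we do decide to flag such names, the scoping could be ordered from the strictest pattern to the broadest. The Python sketch below shows that ordering.

- The scope names are hypothetical; only invalid.deprecated and invalid.illegal follow the usual TextMate/Sublime conventions.
- The "valid" pattern is the assumed ASCII naming rule.
- A leading digit is also treated as deprecated, following the best practice quoted above.

```python
import re

# Hypothetical scope names; the package's actual scopes may differ.
ATTR_SCOPE = "variable.other.attribute"
DEPRECATED_SCOPE = "invalid.deprecated"
ILLEGAL_SCOPE = "invalid.illegal"

# Tried in order, like alternative match rules in a syntax context:
# 1. best practice: lowercase only, not starting with a digit;
# 2. valid per the assumed naming rule, but uppercase or leading digit;
# 3. anything else.
BEST_PRACTICE = re.compile(r"[a-z_][a-z0-9_-]*")
VALID = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def attr_name_scope(name: str) -> str:
    if BEST_PRACTICE.fullmatch(name):
        return ATTR_SCOPE
    if VALID.fullmatch(name):
        return DEPRECATED_SCOPE
    return ILLEGAL_SCOPE


for name in ("url-repo", "URL-Repo", "2nd-edition", "url repo"):
    print(name, "->", attr_name_scope(name))
```

With this ordering, the deprecated scope only ever catches names that are still valid. The syntax therefore never breaks on a legal attribute; it merely draws attention to it.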

Custom IDs

Custom IDs, on the other hand, are subject to different rules depending on how they are defined.

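There are three ways to define a custom ID: the shorthand hash syntax ([#name]), the anchor syntax ([[name]]) and the longhand id attribute ([id=name]).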
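Since the three forms accept different character sets, each needs its own name pattern, which is where reusable variables would help. The rough Python sketch below recognizes a standalone block-level line in each form: the shorthand [#name], the anchor [[name]] and the longhand [id=name].

- The character sets are assumptions. The docs only say that the shorthand and anchor forms are more limited (e.g. no spaces).
- Roles, options, reftext, quoted values and inline anchors are deliberately ignored.
- ID_FORMS and find_id() are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical per-form patterns for a whole block-level line. The name
# character sets are assumptions, not taken from the AsciiDoc spec.
ID_FORMS = {
    # [#name]  shorthand: assume no spaces, and that '.', '%', '#' or ','
    # would start a role, an option or the next attribute instead.
    "shorthand": re.compile(r"\[#([^\s.%#,\]]+)\]"),
    # [[name]]  anchor: assume an XML-like, space-free name.
    "anchor": re.compile(r"\[\[([A-Za-z_:][\w:.-]*)\]\]"),
    # [id=name]  longhand: any non-empty value up to ',' or ']'
    # (quoted values are not handled here).
    "longhand": re.compile(r"\[id=([^,\]]+)\]"),
}


def find_id(line: str) -> Optional[Tuple[str, str]]:
    """Return (form, id) for the first form matching the whole line, else None."""
    text = line.strip()
    for form, pattern in ID_FORMS.items():
        match = pattern.fullmatch(text)
        if match:
            return form, match.group(1)
    return None


print(find_id("[#install-notes]"))    # ('shorthand', 'install-notes')
print(find_id("[[install-notes]]"))   # ('anchor', 'install-notes')
print(find_id("[id=install notes]"))  # ('longhand', 'install notes')
print(find_id("[[install notes]]"))   # None: spaces not allowed in an anchor
```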

In the first two cases the rules are somewhat simpler, although it's still unclear to me whether they are the same as those for attribute names:

When the ID is defined using the shorthand hash syntax or the anchor syntax, the acceptable characters is more limited (for example, spaces are not permitted). Regardless, it’s not advisable to exploit the ability to use any characters the AsciiDoc syntax allows. The reason to be cautious is because the ID is passed through to the output, and not all output formats afford the same latitude. For example, XML is far more restrictive about which characters are permitted in an ID value.

As for the longhand id= syntax, things are more complicated, since AsciiDoc doesn't impose any restrictions but strongly advises sticking to the XML naming convention:

AsciiDoc does not restrict the set of characters that can be used for an ID when the ID is defined using the named id attribute. All the language requires in this case is that the value be non-empty.

As for the best practices:

To ensure portability of your IDs, it’s best to conform to a universal standard. The standard we recommend following is a Name value as defined by the XML specification. At a high level, the first character of a Name must be a letter, colon, or underscore and the optional following characters must be a letter, colon, underscore, hyphen, period, or digit. You should not use any space characters in an ID. Starting the ID with a digit is less likely to be problematic, but still best to avoid. It’s best to use lowercase letters whenever possible as this solves portability problem when using case-insensitive platforms.
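The high-level summary above translates almost directly into a RegEx. Below is a minimal Python sketch of an ASCII-only approximation. The real XML Name production also admits many non-ASCII ranges, which are left out here. XML_NAME and is_xml_name() are hypothetical names.

```python
import re

# ASCII-only approximation of the XML "Name" production: the first char is
# a letter, ':' or '_'; the following ones are letters, digits, ':', '_',
# '-' or '.'. Non-ASCII ranges allowed by XML are omitted.
XML_NAME = re.compile(r"[A-Za-z:_][A-Za-z0-9:_.-]*")


def is_xml_name(value: str) -> bool:
    """True if value is an (ASCII) XML Name, i.e. a portable ID."""
    return XML_NAME.fullmatch(value) is not None


print(is_xml_name("_install-notes"))  # True
print(is_xml_name("ns:sec.2"))        # True
print(is_xml_name("1st-step"))        # False: starts with a digit
print(is_xml_name("install notes"))   # False: contains a space
```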

When the AsciiDoc processor auto-generates IDs for section titles and discrete headings, it adheres to this standard.

Basically it's a huge mess to deal with, as far as our RegEx-driven syntax goes.

We need to take into account that the syntax must be able to handle any valid construct, regardless of whether it follows best practices or not: we can't have the syntax break down on a well-formed construct.

Probably the best approach here is to go for the broad case, i.e. accept anything that is valid. Ideally, in the future, we could reach a point where the syntax first matches XML-abiding names, and then treats any name that is valid (but failed the XML RegEx) as deprecated, just to call attention to the fact that it violates best practices (we can't really scope it as invalid if it's acceptable).
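The two-tier idea can be sketched for longhand id= values as follows. Try the XML-abiding pattern first, and fall back to "any non-empty value" as deprecated. Only an empty value, which is the one thing AsciiDoc forbids in this case, is left as illegal. The scope names and longhand_id_scope() are hypothetical, and XML_NAME is an ASCII-only approximation of an XML Name.

```python
import re

# ASCII-only approximation of an XML Name (see the best-practice quote above).
XML_NAME = re.compile(r"[A-Za-z:_][A-Za-z0-9:_.-]*")

# Hypothetical scope names; the package may use different ones.
ID_SCOPE = "entity.name.label.id"
DEPRECATED_SCOPE = "invalid.deprecated"
ILLEGAL_SCOPE = "invalid.illegal"


def longhand_id_scope(value: str) -> str:
    """Scope for the value of an id= attribute, strictest rule first."""
    if XML_NAME.fullmatch(value):
        return ID_SCOPE          # portable, best-practice ID
    if value:
        return DEPRECATED_SCOPE  # valid AsciiDoc, but not an XML Name
    return ILLEGAL_SCOPE         # empty value: not allowed by AsciiDoc


for value in ("install-notes", "Install Notes!", ""):
    print(repr(value), "->", longhand_id_scope(value))
```

In a .sublime-syntax file, the same ordering would become two match rules in one context, with the stricter one first. Both patterns would be defined once under variables and reused wherever IDs can appear.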

References