pml-lang / pml-companion

Java source code of the 'PML Companion (PMLC)'
https://www.pml-lang.dev
GNU General Public License v2.0
22 stars 1 forks source link

PDML vs PML Docs Bug: Different ID Patterns #70

Closed tajmone closed 2 years ago

tajmone commented 2 years ago

BUG: ID pattern mentioned in PDML and PML differ!

PDML:  [a-zA-Z_][a-zA-Z1-9_\.-]*
PML:   [a-zA-Z][a-zA-Z0-9_-]*

@pml-lang, in #56 you mentioned:

Yes. To cover all cases, the plugin should use the regex shown in the PDML Specification: [a-zA-Z_][a-zA-Z1-9_\.-]*

which prompted me to check the PML docs on how ID patterns are defined, and I immediately noticed that in the PML Manual and PML Reference whenever the custom ID pattern is mentioned it's: [a-zA-Z][a-zA-Z0-9_-]* and not [a-zA-Z_][a-zA-Z1-9_\.-]* — i.e.:

  1. the dot \. mentioned in the PDML Specification is missing).
  2. The first character can be an underscore _ according to PDML (but not PML).

The same when invoking PMLC for tags info:

> pmlc ti -t admon

    Admonition
----------
Tag            : admon
[...]
Attributes:
1.
Id             : id
Title          : Node Identifier
Type           : string or null
Description    : A unique identifier for the node.
                [...]

                 Note for programmers: The regex of an identifier is: [a-zA-Z]
                 [a-zA-Z0-9_]*. Identifiers are case-sensitive.

So which one is correct? the one with the . and _ or the one without them?

pml-lang commented 2 years ago

ID pattern mentioned in PDML and PML differ!

Well spotted! Thanks.

So which one is correct?

OMG! They are both wrong. Sorry for that. The correct regex is [a-zA-Z_][a-zA-Z0-9_\.-]* It's now fixed everywhere:

It's also fixed in PMLC, and the fix be included in the next public version 2.3.0.

tajmone commented 2 years ago

Well spotted! Thanks.

As of lately, being buried in Sublime PML and all it's matching patterns, I immediately had a hunch that the PDML pattern you mentioned didn't quite ring correct, but of course I wanted to check it before opening an Issue.

OMG! They are both wrong. Sorry for that. The correct regex is [a-zA-Z_][a-zA-Z0-9_\.-]* It's now fixed everywhere:

How could this happen? Aren't such core definitions hard-coded somewhere like a constant which is then reused all over the place (in the source code and docs source alike)?

Adding the ID Pattern to JSON File

I was also wondering whether this pattern might belong to the info exposed via the generated pml_tags.json. Currently the JSON file is structured:

{
    "pml_tags":{
        "pml_version":"2.2.0",
        "pml_release_date":"2021-12-14",
        "tags":[
            {

which focuses mainly on the PML tags.

Wouldn't it be worth adding the ID pattern too, e.g.:

{
    "pml_tags":{
        "pml_version":"2.2.0",
        "pml_release_date":"2021-12-14",
        "custom_id_re":"[a-zA-Z_][a-zA-Z0-9_\.-]*", // '\' Needs escaping?
        "tags":[
            {

(or something along those lines) so that end users can extract the pattern either to use it in their syntaxes or documentation, or just to check the current pattern status? (after all, in the future the pattern might change).

Re-Organizing JSON File?

Currently the generated JSON file focuses on the PML tags and their attributes, although it does contain some basic PML version info.

Wouldn't it make more sense to rename the JSON file from pml_tags.json to pml_info.json and expand its contents to also include other info unrelated to tags? i.e. as opposed to having multiple JSON files that could be generated via PMLC options?

E.g.

{
    "pml_info":{
        "pml_meta":{
            "pml_version":"2.2.0",
            "pml_release_date":"2021-12-14",
            "pml_website":"https://www.pml-lang.dev",
            "pmlc_repository":"https://github.com/pml-lang/converter"
        },
        "nodes_meta":{
            "custom_id_re":"[a-zA-Z_][a-zA-Z0-9_\.-]*",
            "nodes_total":45, // <-- i.e. whatever the number of PML nodes is!
            // etc.
        },
        "tags":[
            {

The above entries are just imaginary examples to make the point (although the PML links would be good candidates for a meta section).

IMO, it's better to pack all the info in a single JSON file, rather than having PMLC produce multiple info files. Since the JSON file is usually intended for machine processing, having too much info shouldn't be a problem — as long as the info is well organized in self-contained sections which can be filtered out.

pml-lang commented 2 years ago

Aren't such core definitions hard-coded somewhere like a constant which is then reused all over the place (in the source code and docs source alike)?

The regex should of course be defined at one place only. The reason for having it (currently) defined multiple times is that there are different types of source code files that cannot easily share data: PPL files, Java files, PML files, and HTML files.

Wouldn't it be worth adding the ID pattern too

Yes, good idea.

Wouldn't it make more sense to rename the JSON file from pml_tags.json to pml_info.json and expand its contents to also include other info unrelated to tags?

Yes, excellent idea too.

IMO, it's better to pack all the info in a single JSON file, rather than having PMLC produce multiple info files.

I agree, because it's for machine consumption. And it eliminates the risk of accidentally mixing JSON files related to different PML versions.

I suggest to create a dedicated discussion to clearly define and agree on the improved structure. After everything is well defined and thought out, it will be easy to implement it.

tajmone commented 2 years ago

The regex should of course be defined at one place only. The reason for having it (currently) defined multiple times is that there are different types of source code files that cannot easily share data: PPL files, Java files, PML files, and HTML files.

That's why we have tools like mustache! You can generate those source files via mustache, so regardless of the source type you can use the same JSON/YAML data to fill in some critical parts. Wasn't the scope of the JSON file to help maintain syntax highlighters definitions and the like?

Here is the same, except that it's PL files, Java files, PML files, and HTML files — none of which matters to mustache. Of course, you'll add an extra step to your toolchain, but it's worth the price — I always use this approach for projects that require automating the documentation, because trying to handle these details manually is too error prone.

I suggest to create a dedicated discussion to clearly define and agree on the improved structure. After everything is well defined and thought out, it will be easy to implement it.

Good idea!

pml-lang commented 2 years ago

you'll add an extra step to your toolchain

That's why I didn't do it yet. My plan for later is to create a customized, one-click build tool that takes care of everything.

pml-lang commented 2 years ago

As said already, the regexes have been fixed everywhere.

My plan for later is to create a customized, one-click build tool that takes care of everything.

Done since version 3.0.0.