pml-lang / pml-companion

Java source code of the 'PML Companion (PMLC)'
https://www.pml-lang.dev
GNU General Public License v2.0

Stylesheets Copy-&-Subfolder Prevents CSS Debugging #94

Open tajmone opened 1 year ago

tajmone commented 1 year ago
PMLC 3.1.0 | Win 10

NOTE: Although this was already mentioned in a side comment on Issue #93, I'm creating a dedicated Issue for this matter so we can link to it specifically and track its status.

Problem Description

As of PMLC 3.1.0, the new --CSS_files option always copies the specified stylesheets into a css/ subfolder created by PMLC, which hinders development of custom stylesheets when using CSS compilers like Sass, LESS, etc.

When compiling to CSS, Sass also generates the required .css.map files, and at the end of each output CSS adds a comment containing a link to its corresponding map file, e.g.:

/*# sourceMappingURL=pml-default.css.map */

These CSS map files allow developers to inspect and debug their stylesheets directly in the browser's Developer Tools, which in most modern browsers support CSS maps and Sass/Less sources. Developers can thus see the real-use effect of custom CSS definitions traced back to the Sass sources and modules that produced them, not just to the compiled CSS output.

But a relative map link is resolved against the location of the CSS file that contains it. Because the PMLC-generated HTML documents use a copy of the .css files (located in the css/ subfolder) instead of the original CSS files generated by Sass, the CSS map links in these comments no longer point to the correct .css.map files, thus breaking the advanced debugging features of the browser's Dev Tools (e.g. Chrome).
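The breakage can be reproduced with plain URL resolution. The following is a minimal sketch, assuming the test docs are served from a hypothetical http://localhost/docs/ folder that also holds the Sass output; the helper name is made up for illustration.

```python
from urllib.parse import urljoin


def resolve_map_url(css_url: str, source_mapping_url: str) -> str:
    """Resolve a relative sourceMappingURL against the URL of the CSS file that contains it."""
    return urljoin(css_url, source_mapping_url)


# Hypothetical layout: the original CSS sits next to its .css.map file.
original = resolve_map_url("http://localhost/docs/pml-default.css", "pml-default.css.map")

# The copy made by PMLC lives in css/, so the same comment resolves inside css/.
copied = resolve_map_url("http://localhost/docs/css/pml-default.css", "pml-default.css.map")

print(original)
print(copied)
```

The second URL points into css/, where no map file exists, which is why the Dev Tools lose track of the Sass sources.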

Required Solutions

We need either:

  1. A new parameter for the PMLC p2h option, allowing end users to bypass the CSS-copying stage by having PMLC simply link to the original files specified in the --CSS_files parameter, as they are, without creating copies.
  2. Change the default PMLC p2h behavior when custom CSS files are specified via the --CSS_files parameter — i.e. achieving the same as point (1) but without requiring additional CLI parameters.

Solution (2) is probably a bad idea, since in most cases end users just want to employ the default stylesheets, or some custom stylesheets stored in PMLC data folders. In both cases the idea is to create a copy of these stylesheets, which are just multi-use templates. But even in these cases, it might be useful to be able to control where the CSS file copies end up, i.e. being able to bypass the css/ subfoldering convention by storing them either in the same path as the HTML file or in a custom folder with an arbitrary name.

But it's clearly important to be able to tell PMLC when a set of custom CSS files needs to be copied and linked in the final HTML docs, and when it needs to be linked only. Ideally, PMLC should even allow a combination of both, via different parameters that can be used alongside each other.

Current Workarounds

Currently there are three possible workarounds to this, all of which require post-PMLC interventions:

  1. Tweak the generated HTML files, substituting all CSS paths within their headers, e.g.:

    <link rel="stylesheet" href="css/pml-default.css">
    <link rel="stylesheet" href="css/pml-print-default.css" media="print">

    by stripping the css/ path segment:

    <link rel="stylesheet" href="pml-default.css">
    <link rel="stylesheet" href="pml-print-default.css" media="print">
  2. Tweak the CSS copies within the css/ subfolder, substituting all CSS map paths within their end comment, e.g.:

    /*# sourceMappingURL=pml-default.css.map */

    by adding a ../ to the map path so that it is looked up in the parent directory, where the Sass sources are usually located too, along with the original CSS files:

    /*# sourceMappingURL=../pml-default.css.map */
  3. Copy into the css/ folder all the Sass sources too (including their modules), so that the CSS map paths can be matched to their sources again.
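Workaround (2) could be scripted roughly as follows. This is only a sketch: the function name and the regular expression are assumptions made for illustration, and it handles only the `/*# sourceMappingURL=... */` comment form shown above.

```python
import re

# Matches the trailing comment emitted by Sass, e.g. /*# sourceMappingURL=pml-default.css.map */
MAP_COMMENT = re.compile(r"(/\*#\s*sourceMappingURL=)([^\s*]+)(\s*\*/)")


def point_map_to_parent(css_text: str) -> str:
    """Prefix relative map paths with ../ so a copy inside css/ finds the map one level up."""

    def fix(match: re.Match) -> str:
        target = match.group(2)
        # Leave already-adjusted, absolute, and inline (data:) maps untouched.
        if target.startswith(("../", "/", "data:")) or "://" in target:
            return match.group(0)
        return f"{match.group(1)}../{target}{match.group(3)}"

    return MAP_COMMENT.sub(fix, css_text)


sample = "body{margin:0}\n/*# sourceMappingURL=pml-default.css.map */\n"
print(point_map_to_parent(sample))
```

Because paths that already start with ../ are skipped, running it twice on the same file does no further damage.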

Workaround (1) is usually better, since it then allows viewing changes to the Sass-generated CSS files by simply refreshing the HTML test docs. The only downside is when there are more HTML test documents involved than CSS files.

Workarounds (2) and (3) both result in duplicate files, especially the latter, which creates redundant copies not just of the CSS files but also of all their Sass sources. Workaround (3) also requires rebuilding the HTML test documents via PMLC and then copying all Sass/CSS files again whenever there are significant changes within the Sass sources and/or their modular structure.

As you can imagine, all of these solutions are nothing more than hack-&-slash workarounds for the current limitation imposed by PMLC's native subfoldering conventions. Needless to say, in automated projects any of the above solutions is far from ideal and requires considerable extra scripting work to implement as a post-PMLC conversion fix.

pml-lang commented 1 year ago

We need either:

  1. A new parameter for the PMLC p2h option, allowing end users to bypass the CSS-copying stage by having PMLC simply link to the original files specified in the --CSS_files parameter, as they are, without creating copies.
  2. Change the default PMLC p2h behavior when custom CSS files are specified via the --CSS_files parameter — i.e. achieving the same as point (1) but without requiring additional CLI parameters.

Solution (2) is probably a bad idea

I agree that solution 2 is a bad idea.

I suggest adding a CLI option CSS_output_dir. Its value specifies a directory relative to the target output directory where the target HTML files are located. This gives the user full control over the target directory for CSS files.

The default value for this option is css, which means that the current (hard-coded) behavior is not changed (no breaking change).

If this option is explicitly set to null (i.e. --CSS_output_dir ""), then PMLC will not copy CSS files, but instead link to the original files specified by the CSS_files option. Alternatively we could define a special value to explicitly state that links to the original CSS files should be used (e.g. --CSS_output_dir !no_copy!). Using a special value is better IMO, because it makes the intent more explicit.

If this option is set to ./ then the CSS files will be copied in the same directory as the HTML files (no sub-directory used for CSS files).
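The proposed semantics can be summarized in a few lines. This is a hypothetical sketch of the proposal only, not PMLC code: the function name is invented, and for simplicity the "no copy" case returns the source path unchanged rather than a path relative to the HTML file.

```python
import posixpath

NO_COPY = "!no_copy!"  # special value floated in the proposal above, not an implemented PMLC value


def planned_href(css_source: str, css_output_dir: str = "css") -> tuple[str, bool]:
    """Return (href used in the HTML, whether the file would be copied)."""
    if css_output_dir in ("", NO_COPY):
        # Link to the original file as is, without creating a copy.
        return css_source, False
    target_dir = posixpath.normpath(css_output_dir)  # "./" becomes "."
    name = posixpath.basename(css_source)
    href = name if target_dir == "." else posixpath.join(target_dir, name)
    return href, True


print(planned_href("styles/pml-default.css"))
print(planned_href("styles/pml-default.css", "./"))
print(planned_href("styles/pml-default.css", NO_COPY))
```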

Another suggestion:

Instead of only copying .css files by default (as will be done in the next PMLC version), we could also (by default) copy .css.map files (to avoid broken links in the CSS files). This would change nothing for normal end-users who use only .css files, but could possibly help CSS developers to debug their stylesheets.

tajmone commented 1 year ago

I suggest adding a CLI option CSS_output_dir. Its value specifies a directory relative to the target output directory where the target HTML files are located. [...] The default value for this option is css,

Sounds good.

If this option is explicitly set to null (i.e. --CSS_output_dir ""), then PMLC will not copy CSS files, but instead link to the original files specified by the CSS_files option.

The problem with this solution is that you then can only enforce linking-over-copy when CSS assets are within the same folder as the HTML file, which is not so flexible.

Alternatively we could define a special value to explicitly state that links to the original CSS files should be used (e.g. --CSS_output_dir !no_copy!). Using a special value is better IMO, because it makes the intent more explicit.

If this option is set to ./ then the CSS files will be copied in the same directory as the HTML files (no sub-directory used for CSS files).

It's not clear whether this parameter can be passed multiple times in a single invocation or just once; if the former were the case, it would be possible to specify both a custom CSS folder and the linking-over-copy option.

Ideally, end users should be able to have total control over individual CSS files, both in terms of which ones get just linked and which ones are also copied to target path(s) — indeed, in complex projects there might be multiple folders for CSS files, e.g. to separate shared stylesheets from ad hoc stylesheets, especially in multi-page documents organized into separate subfolders according to chapter (or part, or whichever other partitioning device).

Since in a field as vast as digital publishing it is hard to pinpoint any single "works for all" solution that generalizes all the possible uses and needs, IMO it's better to keep options as flexible and open as possible (along with simple and reasonable defaults).

Probably the solution here should consist in the ability to specify multiple sets of CSS files and, for each set, whether it should also be copied to the target path or just linked. If the current --CSS_output_dir could somehow be tweaked to include the "no copy" option, and assuming it could be passed multiple times in the same invocation, then we'd have full control. The problem is how to combine the two into one, i.e. the file set and its copy/no-copy setting. Most likely the solution lies in using an invalid path character as a separator, different from the file separator used by this parameter, so that it can't be mistaken for a filename.

It's not an easy choice, since the parameter value could be a directory, a file, multiple files (multiple directories too?), and we need to take into account that OSs often differ when it comes to filesystem conventions. Currently this parameter uses the comma character , as a file separator (even though it's not a forbidden character in filenames). And of course, there's the issue of paths containing spaces, and the need to be able to enclose them within quotes.

The point is that we should avoid at all costs any platform-specific traps here, like OS-dependent escaping conventions and rules (Win uses ^, *nix uses \ to escape, etc.). CLI tools of this type usually fall back on Unix conventions when it comes to filesystems and paths (the / separator instead of \, glob pattern support, etc.), and the app then handles all the required OS-specific adjustments internally, so end users have a (somewhat) clear reference and scripts can be kept cross-platform. But if I recall correctly from previous conversations, with native Java things are not so easy when it comes to file paths.

So, assuming multiple --CSS_output_dir params can be passed, each set of paths should have a single "copy/don't copy" option that goes with it, applying to the whole set. End users can always use the parameter again with a different copy option if they need to handle specific files differently. But then, when a directory is passed, how is conflict resolution applied? If one param says to treat my_css/apples.css as copy, but a later param says to treat the entire my_css/ folder as no-copy, how will my_css/apples.css be handled?

All these ramifications need to be taken into consideration (not only for this specific parameter, but in general), especially since end users will ultimately be given the choice to pass options via CLI parameters, settings files, as well as within the PML source [options node, so there will always be the need for clear rules when it comes to applying path settings in a cascading manner. This will be even more so if Git-style wildmatch pattern matching is supported in the future (which is most useful in big projects).

What I'm saying is that since we're looking into how to handle both simple and advanced uses for the --CSS_output_dir parameter, we might just as well take a step back and look at the big picture of the PMLC CLI, and how it handles its options in general. Providing a unified, consistent and intuitive UI is highly desirable in any CLI tool, and in our case we also need to take into account how these same options can be handled via settings files and PML [option nodes.

E.g. above you proposed that the ./ value should be interpreted as the "same folder as the HTML file", which is intuitive. I was fairly surprised when I discovered that passing p2h --output ./ results in an error, instead of being understood simply as a means to bypass the output/ folder convention, and that you must actually specify the full output file name (p2h --output ./myfile.html myfile.pml) for it to work. That's an example of how parameter conventions could be made more intuitive and consistent across the interface, and I'm fairly sure that the question of how to specify a set of paths (directories and/or files) along with one or more boolean options applying to them is not going to be exclusive to the --CSS_output_dir parameter but will concern other params too.

The !no_copy! proposal sounds reasonable, as long as it can be part of the specified set of paths, e.g. passed at the front or tail of the paths list. How would this work in practice? Would it just be a comma-separated entry, either at the beginning or end? E.g.

--CSS_output_dir my_css/apples.css,my_css/pears.css,"shared css/",!no_copy!

Another suggestion:

Instead of only copying .css files by default (as will be done in the next PMLC version), we could also (by default) copy .css.map files (to avoid broken links in the CSS files). This would change nothing for normal end-users who use only .css files, but could possibly help CSS developers to debug their stylesheets.

I don't see the benefits of this, since the .css.map files also expect all the Sass sources to be present in order to parse them and link them in the Dev Tools. Since a Sass source project could contain subfoldered libraries (consisting of hundreds of files, of which maybe just a few are actually used), it could easily result in hundreds of files having to be copied. Not to mention that Sass is just one of the many CSS preprocessing languages, along with LESS, Stylus and other such syntaxes supported by modern browsers' Dev Tools (each syntax with its own set of file extensions).

I just don't see it as a viable solution.

Currently I've solved the problem in the PML Playground by adding to its Rakefile a dedicated function that sanitizes the HTML files generated by PMLC by removing the css/ prefix from CSS paths. Although it's somewhat a hack & slash solution, it works as expected, at the penalty of some extra post-processing time in the toolchain. This morning I should be able to finish polishing the Rakefile comments, update the README docs and then push the Rakefile fix to the repository, so I'll be able to link to the solution within this thread too.

Fortunately, with Ruby and Rake it's possible to come up with such solutions fairly easily, and in a cross-platform manner too. But my guess is that with tools like GNU Make or Gradle such a workaround would have been a nightmare of external dependencies and cross-platform yak shaving, which is why I deem it worth investing energy in empowering PMLC as much as possible, so it can be as self-sufficient as possible, even in complex projects.

tajmone commented 1 year ago

Rake Workaround in Place

I've just pushed to the PML Playground repo the temporary fix for this problem via a new Rakefile function that post-processes the output HTML files generated by PMLC and strips the css/ prefix from CSS paths:

https://github.com/tajmone/pml-playground/blob/dacd5d0/Rakefile#L96

So, anyone in need of a quick and dirty solution to the current limitation can reuse the Ruby/Rake function from the above link, or adapt it according to need.

Probably not the most elegant solution (and I surely don't claim to write idiomatic Ruby, let alone eloquent Ruby), but at least it works. I've tried to keep the time penalty to a minimum by not loading the entire file into memory, working instead one line at a time with a temporary output file alongside the original, which should avoid big heap allocations and result in faster execution times.
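For readers who don't use Rake, the same line-at-a-time idea can be sketched in Python. This is not the code from the linked Rakefile; the function name and the regular expression are assumptions, and it only targets `<link ... href="css/...">` tags like the ones PMLC emits above.

```python
import os
import re
import tempfile

# Captures everything up to the opening quote of href, then drops the css/ prefix.
CSS_LINK = re.compile(r'(<link\b[^>]*\bhref=")css/')


def strip_css_prefix(html_path: str) -> None:
    """Rewrite the file one line at a time via a temporary file next to the original."""
    dir_name = os.path.dirname(os.path.abspath(html_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as out, \
                open(html_path, encoding="utf-8", newline="") as src:
            for line in src:
                out.write(CSS_LINK.sub(r"\1", line))
        # Swap the rewritten file into place only after it was written completely.
        os.replace(tmp_path, html_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Only one line is held in memory at any time, and the original file stays intact if anything fails before the final swap.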

In any case, the added time penalty on the Rake build is barely noticeable so far (but there aren't many stylesheets to build the test docs against either).

pml-lang commented 1 year ago

Ideally, end users should be able to have total control over individual CSS files, both in terms of which ones get just linked and which ones are also copied to target path(s)

To achieve this I suggest also adding a CLI parameter local_CSS_files.

Taking into account all other useful remarks, the final documentation for parameters CSS_files, CSS_output_dir, and local_CSS_files would look like this:

Parameter CSS_files

This optional parameter is used to explicitly specify one or more CSS files that will be copied into the output directory and used in the final HTML document.

The value is a list of directories and/or files. If a directory is specified, then all files with extension 'css' in the directory are used. Other files are ignored. Files in sub-directories are included. Each directory and file path can be absolute or relative. Each file must be a valid CSS file.

All files specified by this parameter are copied into the output directory. The location within the output directory is specified by parameter 'CSS_output_dir' (default is 'css'). Relative paths to sub-directories are kept in the target directory. Example: If this parameter is set to '../shared/css_files', and there is a file with path '../shared/css_files/common/basics.css', and parameter 'CSS_output_dir' is not specified explicitly, then the output directory will contain file 'css/common/basics.css'. If two or more files happen to have the same relative path in the output directory, then a warning is issued and the earlier file(s) are overwritten by the last one in the output directory.
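The path mapping and the collision rule described above can be sketched as follows. This is a hypothetical illustration under simplifying assumptions: only directory entries are handled, the files found under each directory are passed in explicitly instead of being read from disk, and all names are invented.

```python
from pathlib import PurePosixPath


def plan_copies(sources: dict[str, list[str]], css_output_dir: str = "css"):
    """Map each target path (relative to the output directory) to its source file.

    'sources' maps a CSS_files directory entry to the CSS files found under it.
    """
    plan: dict[str, str] = {}
    warnings: list[str] = []
    for root, files in sources.items():
        for source in files:
            # Keep the path relative to the specified directory, e.g. common/basics.css.
            relative = PurePosixPath(source).relative_to(root)
            target = str(PurePosixPath(css_output_dir) / relative)
            if target in plan:
                warnings.append(f"{target}: {plan[target]} is overwritten by {source}")
            plan[target] = source  # the last file wins
    return plan, warnings


plan, warnings = plan_copies(
    {"../shared/css_files": ["../shared/css_files/common/basics.css"]}
)
print(plan)
```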

By default, file(s) in sub-directory 'config/PML_to_HTML/css' of PMLC's shared data directory are used. If these default files are to be used together with other CSS files specified by this parameter, then the location of the default files must also explicitly be specified in the parameter.

The following characters can be used to separate directory/file paths: comma (,), semicolon (;), and colon (:). Therefore separator characters must not be part of directory/file paths. Leading and trailing whitespace surrounding the separator is ignored (e.g. "file1.css,file2.css" is equivalent to "file1.css , file2.css").

Instead of defining a list of elements, this parameter can also be used several times to define several sources for CSS files. Example: Instead of writing '--CSS_files "dir1, dir2"' you can also write '--CSS_files dir1 --CSS_files dir2'. The second syntax solves the (very rare) edge cases of leading/trailing whitespace in directory/file paths.
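Both syntaxes can be handled by one parsing routine. The sketch below uses Python's argparse purely as a stand-in for PMLC's own Java CLI parser; the program name and helper functions are hypothetical.

```python
import argparse
import re

# The separators proposed above, with surrounding whitespace ignored.
SEPARATORS = re.compile(r"\s*[,;:]\s*")


def split_paths(value: str) -> list[str]:
    """Split one parameter value into its directory/file paths."""
    return [path for path in SEPARATORS.split(value.strip()) if path]


parser = argparse.ArgumentParser(prog="pmlc-sketch")
# action="append" collects every occurrence of the parameter.
parser.add_argument("--CSS_files", action="append", default=[])


def css_files_from(argv: list[str]) -> list[str]:
    """Flatten all occurrences of --CSS_files into a single list of paths."""
    args = parser.parse_args(argv)
    return [path for value in args.CSS_files for path in split_paths(value)]


print(css_files_from(["--CSS_files", "dir1, dir2"]))
print(css_files_from(["--CSS_files", "dir1", "--CSS_files", "dir2"]))
```

Both calls yield the same list, which is the equivalence the documentation describes.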

Parameter CSS_output_dir

This optional parameter allows you to explicitly specify the target directory in which CSS files specified by parameter 'CSS_files' are stored.

The value must be a relative directory (relative to the output directory in which the target HTML files are stored).

The default value is 'css'.

Setting this parameter to './' means that CSS files are stored in the output directory (i.e. no dedicated sub-directory is used for CSS files).

Parameter local_CSS_files

This optional parameter is used to specify a set of CSS files that will be used in the final HTML document. However, in contrast to parameter CSS_files, the CSS files will not be copied into the output directory. Instead, the HTML document uses relative paths to the local CSS files specified by this parameter.

The value is a list of directories and/or files. If a directory is specified, then all files with extension 'css' in the directory are used. Other files are ignored. Files in sub-directories are included. Each directory and file path can be absolute or relative. Each file must be a valid CSS file.

The default value for this parameter is 'null', which means that no local CSS files are used.

The following characters can be used to separate directory/file paths: comma (,), semicolon (;), and colon (:). Therefore separator characters must not be part of directory/file paths. Leading and trailing whitespace surrounding the separator is ignored (e.g. "file1.css,file2.css" is equivalent to "file1.css , file2.css").

Instead of defining a list of elements, this parameter can also be used several times to define several sources for CSS files. Example: Instead of writing '--local_CSS_files "dir1, dir2"' you can also write '--local_CSS_files dir1 --local_CSS_files dir2'. The second syntax solves the (very rare) edge cases of leading/trailing whitespace in directory/file paths.
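The "relative paths to the local CSS files" mentioned for this parameter could be computed like this. It is a sketch with an invented function name and a hypothetical project layout, using forward-slash paths throughout.

```python
import posixpath


def local_css_href(html_file: str, css_file: str) -> str:
    """Path from the directory of the HTML file to a CSS file that is not copied."""
    html_dir = posixpath.dirname(html_file) or "."
    return posixpath.relpath(css_file, start=html_dir)


# Hypothetical layout: HTML in project/output/, Sass-generated CSS in project/styles/.
print(local_css_href("project/output/doc.html", "project/styles/pml-default.css"))
```

Since the HTML then references the original file, the map comment inside it keeps resolving next to the Sass output.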

Note 1:

The case of specifying different target directories for the elements of CSS_files is not covered with the above approach.

IMO this is not easy to achieve with CLI parameters, because they are limited to key/value pairs where each value can only be a string. Generally speaking, the string value of a parameter can of course be parsed, and in our case we could do something like this:

Another solution would be to allow a list of directories for parameter CSS_output_dir, requiring that the number of elements in CSS_output_dir must be 1 (i.e. the same value used for all), or equal to the number of elements specified in parameter CSS_files (i.e. a one-to-one relationship of elements in CSS_output_dir and CSS_files).
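The "one value for all, or one value per element" rule just described might look like this in code. It is a hypothetical sketch of the rule only; the function name and error message are invented.

```python
def pair_output_dirs(css_files: list[str], css_output_dirs: list[str]) -> list[tuple[str, str]]:
    """Pair every CSS_files element with its target directory."""
    if len(css_output_dirs) == 1:
        # A single directory applies to all elements.
        return [(source, css_output_dirs[0]) for source in css_files]
    if len(css_output_dirs) == len(css_files):
        # One-to-one relationship, matched by position.
        return list(zip(css_files, css_output_dirs))
    raise ValueError(
        f"CSS_output_dir must have 1 or {len(css_files)} elements, got {len(css_output_dirs)}"
    )


print(pair_output_dirs(["shared.css", "chapter.css"], ["css"]))
print(pair_output_dirs(["shared.css", "chapter.css"], ["css/shared", "css/local"]))
```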

However I think that right now we should not go that far (and confuse normal users), because this option would probably only be required in very advanced/complex environments. CLI parameters have their limits, and we should not try (and will never succeed) to cover all cases. Exceptional requirements like this (in professional environments with technical people) are better fulfilled with OS scripts, IMO.

The challenge is of course to find a good, reasonable balance. Keep it easy for normal users, but also powerful for advanced users.

Note 2:

The problem (edge case) of a separator character (,;:) included in a directory/file path could be eliminated by specifying that parameters CSS_files and local_CSS_files can only contain one directory or file path, and that the parameter must be used several times to define a list of paths. But I'm not sure if this is a good idea, because it's a very rare edge case, and users might expect and want separators to be supported.

Other

I was fairly surprised when I discovered that passing p2h --output ./ results in an error

According to the documentation, parameter output is a file path (not a directory path). Therefore ./ is invalid. We could change this and allow a file or directory path. In case of a directory path, the output file name (without the .html extension) would be the same as the input file name (without the .pml extension).
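The suggested change to the output parameter could work along these lines. This sketch assumes, purely for illustration, that a trailing slash marks a directory path; a real implementation might check the filesystem instead, and the function name is invented.

```python
from pathlib import PurePosixPath


def resolve_output(output: str, input_file: str) -> str:
    """Accept either a file path or a directory path for the output parameter."""
    if output.endswith("/"):
        # Directory path: reuse the input file name, swapping .pml for .html.
        stem = PurePosixPath(input_file).stem
        return str(PurePosixPath(output) / f"{stem}.html")
    return str(PurePosixPath(output))


print(resolve_output("./", "myfile.pml"))
print(resolve_output("docs/", "src/myfile.pml"))
print(resolve_output("./myfile.html", "myfile.pml"))
```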

tajmone commented 1 year ago

I like your proposal and I think that having a separate parameter to handle CSS lists that need to be linked-to only is simpler than having a boolean parameter chained to the list.

Below are some thoughts on things which I'm not entirely convinced of.

Valid CSS

Each file must be a valid CSS file.

You mentioned this twice. But how is PMLC going to validate a CSS file?

And why should it? I might add.

It's the responsibility of the user to ensure that the CSS files which he picks are valid, not the job of PMLC.

Also, during my trial and error, when I realized that PMLC was copying and linking every file from the specified CSS folder, including non-CSS files, this didn't affect the final HTML document. The files which weren't CSS were just silently ignored; only the browser Dev Tools would report errors for them, but they had no negative impact on the document.

Parameter local_CSS_files

The parameter name doesn't fully deliver its intent, IMO. The term "local" is a bit vague and in this context could mean multiple things. After all, even the copied CSS files will be still local files.

The only alternative I can think of is linked_CSS_files, but that's not technically correct I guess. But at least it does convey the idea that these CSS files are being just linked, as opposed to being copied and linked.

I'm not sure which the correct term would be to indicate incorporation of an external (but local) CSS file. In the documentation of the <a> tag, the term used to indicate the role of href= is indeed linking; but I'm not sure whether most people would just think of a link as "clickable redirection link", rather than an "asset inclusion link".

But definitely "local" tends to add confusion, especially since this group of parameters already mentions a lot of relative paths in relation to source- vs target-document, and their subfolders (all of which are local, except the default PML data directories).

List Separators

The following characters can be used to separate directory/file paths: comma (,), semicolon (;), and colon (:).

I tend to prefer not having multiple equivalent tokens in general, but rather have one way to do something. In the future we might end up needing an extra separator, e.g. to delimit sub-lists, and in that case we'd regret having already consumed all the obvious ones, and might end up having to introduce a backward-incompatible change to reclaim one of them.

The comma is the most obvious separator, and the one usually employed in these cases. But the colon has the advantage of being a forbidden character in paths, so there's no risk of mis-interpreting it in edge cases.

Can you confirm to me how quotes can be used to handle paths and files with spaces, i.e. does the entire parameter value have to be enquoted, or can individual values be enquoted according to need? E.g.

--CSS_files "path w/spaces/",unspaced-path/,"path,with,commas/"

Note 2:

The problem (edge case) of a separator character (,;:) included in a directory/file path could be eliminated by specifying that parameters CSS_files and local_CSS_files can only contain one directory or file path, and that the parameter must be used several times to define a list of paths. But I'm not sure if this is a good idea, because it's a very rare edge case, and users might expect and want separators to be supported.

I agree that it's not worth having a single dir or file per parameter call.

But don't underestimate the occurrence of punctuation in filenames when it comes to publishing, because they are more common than you imagine, especially when dealing with eBooks, which often are named according to their full title, commas and all. Most ePubs are named this way, for example.

Although a developer would do their best to avoid spaces and punctuation other than hyphens and underscores, writers might not reason so, especially when dealing with output files (e.g. they might want their final documents and their containing folders to reflect their contents' titles, verbatim).

Other

We could change this and allow a file or directory path. In case of a directory path, the output file name (without the .html extension) would be the same as the input file name (without the .pml extension)

Even though the documentation does mention the parameter taking a file value, I thought that expecting ./ to be treated as above was fairly reasonable.

With many conversion tools, when it comes to output options it's implicit that when users pass just a directory path they want to preserve the original basename, but specify a non-default output dir.

pml-lang commented 1 year ago

I like your proposal

Thanks.

But how is PMLC going to validate a CSS file?

PMLC does not validate CSS files.

It's the responsibility of the user to ensure that the CSS files which he picks are valid, not the job of PMLC.

Yes, that's what I wanted to express with "Each file must be a valid CSS file."

Ok, to eliminate the confusion I'll replace it with: "Each file must be a valid CSS file, but PMLC does not check this."

The term "local" is a bit vague ... The only alternative I can think of is linked_CSS_files

I started with linked_CSS_files, but then changed it, because all CSS files are linked in the HTML document, including the files defined by parameter CSS_files.

Maybe a better name would be unbundled_CSS_files. It's a bit long, but the short version would be ucss.

I tend to prefer not having multiple equivalent tokens in general, but rather have one way to do something.

I totally agree with this basic principle.

The reason I added : and ; is that these characters are used in Linux and Windows as path separators. For example the separator for the PATH environment variable is : in Linux, and ; in Windows. These are also the system-dependent path separators in Java. Some users might expect these characters to be supported.

But the colon has the advantage of being a forbidden character in paths.

Yes, but only in Linux. In Windows you can have C:\temp, for example. Which also means that : as separator is indeed problematic and non-portable.

Ok, let's support only the comma as separator.
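A quick illustration of why the colon is problematic as a separator on Windows, using a hypothetical parameter value and a made-up helper:

```python
def split_on(value: str, separator: str) -> list[str]:
    """Naively split a parameter value on one separator character."""
    return [part.strip() for part in value.split(separator) if part.strip()]


# The drive letter is torn away from the rest of each Windows path.
print(split_on(r"C:\temp\a.css:C:\temp\b.css", ":"))

# With the comma as the only separator, the same two paths survive intact.
print(split_on(r"C:\temp\a.css, C:\temp\b.css", ","))
```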

Can you confirm to me how quotes can be used to handle paths and files with spaces, i.e. does the entire parameter value have to be enquoted, or can individual values be enquoted according to need?

This is OS-dependent and also depends on where the parameter is defined.

In the CLI, a single string value must be assigned to each parameter. If the value contains spaces or other special characters like quotes, then the OS-specific rules must be applied by the user. For example, a parameter containing spaces must be enclosed with quotes on Windows (e.g. --name "value value"). A web search like 'windows cli escape character' quickly reveals that these escape rules can be quite confusing and complex. Moreover they are different on Windows and Linux, and they are context-dependent.
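To see what a POSIX-style shell hands to the program for the example asked about above, Python's shlex module can be used as an approximation of the shell's word splitting (Windows cmd follows different rules, so this covers only one side):

```python
import shlex

command = '--CSS_files "path w/spaces/",unspaced-path/,"path,with,commas/"'
tokens = shlex.split(command)

# The quotes are consumed by the shell: the program receives one single value
# in which quoted and unquoted commas can no longer be told apart.
print(tokens)
```

So quoting individual list elements protects the spaces, but it cannot protect commas inside a path from being treated as separators by the program.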

(Note: Just now I've discovered that I forgot to quote parameter values in the "Note 1" section of my previous comment. It's now fixed.)

If a parameter is defined in a PDML document (e.g. in a PML options node) then the escape rules are simple, straightforward, and portable. However, the rule "separator characters must not be part of directory/file paths." still applies, because we use a single string to define a list. This means that the current solution does not work for paths that contain commas. See also "Note 1" and "Note 2" in my previous comment for possible solutions.

With many conversion tools, when it comes to output options it's implicit that when users pass just a directory path they want to preserve the original basename, but specify a non-default output dir.

Ok. Then let's do what I suggested in my previous comment: "... allow a file or directory path. In case of a directory path, the output file name (without the .html extension) would be the same as the input file name (without the .pml extension)".

If there are no other issues, I'll post an updated description for the three parameters.

tajmone commented 1 year ago

CSS Validation

PMLC does not validate CSS files [...] that's what I wanted to express with "Each file must be a valid CSS file." Ok, to eliminate the confusion I'll replace it with: "Each file must be a valid CSS file, but PMLC does not check this."

I think the best way to avoid confusion is to not say anything beyond "a CSS file" — why on earth would anyone wish to link an invalid CSS file? It should be a fair assumption that PML end users have a propensity to honor standards, and that they expect them to function.

The problem with adding unneeded qualifiers is that the reader is then left to wonder whether in their absence some broader interpretation is due, or if they are missing out on some other key categories.

E.g. in the PML docs I've come across a few places where it mentions "a formal PML node", which left me with the hanging question "which are the informal nodes?" — and honestly, that question was burning so much in the back of my mind that it distracted me from the rest of my reading, because I kept wondering whether I had missed out some basic definitions. Formal compared to what?

In writing in general, but in tech docs in particular, such qualifiers call for a clarification, be it a footnote explaining the difference or a link to a glossary.

local_CSS_files Param

I started with linked_CSS_files, but then changed it, because all CSS files are linked in the HTML document, including the files defined by parameter CSS_files.

That's a dangerous assumption if we consider that in the future PMLC might introduce options to embed all assets in the final document, which is a commonly demanded feature for creating fully standalone documents. In that case, images would be embedded via Data URI, and CSS inlined within the HTML. And of course, this is a further argument against using the linked_ prefix.

Maybe a better name would be unbundled_CSS_files. It's a bit long, but the short version would be ucss.

I'm not sure whether this clarifies things or not.

The idea we're trying to convey with these two different parameters is that one (CSS_files) creates a copy and links the files, whereas the other only links them. Bearing in mind that in the future there might also be an option to embed CSS within the output document, we must take into account how this might add confusion, and how it would override these other two parameters: should it apply to both, or only to CSS_files, since the local_CSS_files parameter seems to obviously indicate that the user wants to use the external (local) file as is, without copying it?

Oftentimes, when something seems tangled, a good approach is to turn it upside down and see how it looks from another perspective. So far, the assumption has been that the default behavior should be to copy the CSS files to a target folder (which is PMLC's default behavior lacking any params dealing with CSS, which is fine). Let's try and turn this assumption upside down, and imagine that by default any user-specified file would just be linked to the document, as is. Here's what these CSS options might look like:

Now, each parameter's role is clear and they don't overlap. And this doesn't necessarily interfere with PMLC's default behavior of adding the default stylesheets into the css/ subfolder, since the default behavior can be overridden when custom CSS parameters are involved, i.e. the assumption being that the default stylesheets are no longer needed.

So my impression is that if, in the presence of custom CSS parameters, we assume that PMLC's default behavior no longer applies, things start to fall into place. The default behavior is for the common user, so it implies sensible choices and assumptions, whereas parameters switch PMLC to "advanced mode", relinquishing assumptions and handing over control to the user. The target stylesheets folder would still be implicitly css/ though, unless the user overrides it, but in terms of how stylesheets are handled PMLC's behavior becomes unassuming.

The question remains regarding a future option to produce a fully standalone document, and whether this should imply inlining all CSS files, or exclude those of the link_CSS_files param, or just apply to images. But that's a question for which there is no clear answer, since some might think that all local assets (CSS and JS files included) should also be embedded, while others assume it only applies to images. So it would probably make sense for such an option to target only images by default. I mean, it wouldn't make sense to embed a movie clip via a base64 Data URI, which would result in a huge ASCII representation. But then again, since the proposed embed_CSS_files parameter is already explicit, it would be intuitive for users to just resort to it when their goal is to embed CSS files, so we won't need a new parameter to handle the exceptions.
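For a sense of the size cost of Data URI embedding, here is a small Python sketch. The helper name `to_data_uri` is made up for illustration and says nothing about how PMLC would implement embedding. Base64 encodes every 3 bytes as 4 ASCII characters, so an embedded asset grows by roughly a third.

```python
import base64


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 Data URI (illustrative helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


payload = bytes(3000)  # stand-in for a small image file
uri = to_data_uri(payload, "image/png")

# Compare the raw size with the size of the embedded representation.
print(len(payload), len(uri))
```

The growth is negligible for icons and stylesheets but quickly adds up for large binary assets such as video.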

Path Separators

But the colon has the advantage of being a forbidden character in paths.

Yes, but only in Linux. In Windows you can have C:\temp, for example.

True, I meant for relative files though, since the : is used only for drives, protocols, etc. at the beginning, but not within the path in terms of folders and files.

Which also means that : as separator is indeed problematic and non-portable.

I didn't think of this. Indeed, we'd be facing problems when working with absolute paths.
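The absolute-path problem is easy to reproduce. The Python snippet below is only an illustration (the example paths are invented): splitting a list of Windows paths on the colon also cuts at the drive letters.

```python
import os

# Two absolute Windows paths joined with ':' as the list separator.
windows_list = r"C:\temp\a.css:D:\styles\b.css"

# Splitting on ':' also splits at the drive letters, yielding four
# fragments instead of two paths.
fragments = windows_list.split(":")
print(fragments)

# The platform's own PATH-style separator: ':' on POSIX, ';' on Windows.
print(os.pathsep)
```

A comma does not collide with drive letters, although it can still appear inside file names.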

The reason I added : and ; is that these characters are used in Linux and Windows as path separators. For example the separator for the PATH environment variable is : in Linux, and ; in Windows. These are also the system-dependent path separators in Java. Some users might expect these characters to be supported.

Is that a pro or a con in our case?

Ok, let's support only the comma as separator.

Seems like the comma is an intuitive choice at this point. It makes sense in that, linguistically, it is what we use to separate items in a list when we write, and although it's a valid file/folder-name character, at least it's consistently so across OSs.

But then, you have more knowledge than me regarding cross-platform issues relating to file naming conventions.

Just bear in mind that in the publishing world it is indeed common to find file names containing all usable punctuation characters — some even contain curly quotes!

As for the OS-dependent problems with handling quoted strings, spaces, escaping, etc., I guess there's no solution. But as long as the CLI interface is usable across all OSs, end users will simply have to apply the usual workarounds for their OS — of course, when using Rake to automate projects none of these problems apply, since it works as expected on all OSs that support Ruby (just saying! :wink:).

pml-lang commented 1 year ago

I think the best way to avoid confusion is to not say anything beyond "a CSS file"

Yes, ok.

I've come across a few places where it mentions "a formal PML node", which left me with the hanging question "which are the informal nodes?"

In the world of programming, the term 'formal argument' or 'formal parameter' is commonly used to refer to its definition in the function. It consists of a name, a type, maybe a default value, description, etc.

Wikipedia puts it like this: "The term parameter (sometimes called formal parameter) is often used to refer to the variable as found in the function definition, while argument (sometimes called actual parameter) refers to the actual input supplied at function call."

See also: this and this.

In a similar way, I use the term 'formal PML node' to refer to its definition (name, supported attributes, etc.). Formal PML nodes are defined in the PMLC source code and documented in the PML Nodes Reference Manual. You can think of a formal node as a definition or specification for a node. Formal nodes define the actual nodes you can use in a PML document.

  Maybe a better name would be unbundled_CSS_files. It's a bit long, but the short version would be ucss. I'm not sure whether this clarifies things or not. deploy_CSS_files ... link_CSS_files ... embed_CSS_files ...

I'm not sure I correctly understand the meaning of these three options (are they lists of paths or boolean values?), but if they are paths:

I prefer to not rename CSS_files to deploy_CSS_files, because IMO (1) the typical (non-technical) user/writer probably doesn't know the meaning of 'deploy' in this context, (2) this will be the most frequently used one among these CSS options and it should therefore have a simple name, and (3) renaming would be a breaking change.

To avoid the ambiguity with 'link', we could use undeployed_CSS_files (short ucss) instead of unbundled_CSS_files.

And for the third type of directory/file list, I suggest embedded_CSS_files instead of embed_CSS_files (short ecss).

Just bear in mind that in the publishing world it is indeed common to find file names containing all usable punctuation characters

If users later ask to support the comma in paths, we could add the rule that two consecutive commas in a path are replaced with one comma (similar to how Excel allows escaping double quotes). Adding this rule would be a non-breaking change, unless somebody used two consecutive commas in a path.
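A minimal Python sketch of how such a doubled-comma rule could work. This is an assumption about one possible implementation, not PMLC code, and `split_paths` is a hypothetical name.

```python
def split_paths(value: str) -> list:
    """Split a comma-separated path list, treating ',,' as a literal comma."""
    paths = []
    current = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == ",":
            if i + 1 < len(value) and value[i + 1] == ",":
                # Doubled comma: keep one literal comma in the current path.
                current.append(",")
                i += 2
                continue
            # Single comma: end of the current path.
            paths.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    paths.append("".join(current).strip())
    # Drop empty entries caused by leading or trailing separators.
    return [p for p in paths if p]


print(split_paths("base.css,theme.css"))
print(split_paths("notes,, draft.css,main.css"))
```

In the second call the doubled comma survives as a single comma inside the first path, so the result holds two paths rather than three.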

tajmone commented 1 year ago

Formal Nodes

In the world of programming, the term 'formal argument' or 'formal parameter' is commonly used to refer [...]

Indeed, these terms are the subject of much confusion even in the "formal" programming jargon, otherwise the Wikipedia page you cited wouldn't use qualifiers like "sometimes called". As a matter of fact, many people prefer to use the terms "parameter" and "argument" to distinguish between the two, rather than resorting to terms like "formal". But then, this is a highly opinionated topic, like tabs vs spaces, which is not worth pursuing.

I think the problem here is that PML is not aimed at software engineers but writers, so any assumptions regarding such specialized jargon are misplaced. I'm also not convinced that the comparison between nodes and parameters stands its ground: a node is more of a reserved keyword/token in PML, and unlike parameters it's not subject to different uses in the documentation and real practice (even attributes, being key-value pairs, don't quite fall in the same category).

In a similar way, I use the term 'formal PML node' to refer to its definition (name, supported attributes, etc.).

In that case why not just say "formally defined node"? I still don't see how this qualification improves the reading though. Unless there's a real need to mention this "formality" to make a clear distinction from other cases/contexts where the feature being documented doesn't apply, it should be just "a node".

Formal PML nodes are defined in the PMLC source code and documented in the PML Nodes Reference Manual. Formal nodes define the actual nodes you can use in a PML document.

The [text node is in the PMLC source code but is not documented, so I guess that's an informal node, and the only one I can think of. So, according to the above definition, the occurrence of "formal node" should mean: every PML node except [text — is that the case?

You can think of a formal node as a definition or specification for a node.

I could, but I still don't see the point nor the benefits of bringing over to PML and the writers' world the semantic problems that afflict the software engineering community — most of which, BTW, exist only because of the different overlapping academic fields where these terms are used, where mathematicians, language engineers, and experts from other fields, all enforce their field-specific argot terms to describe the same things.

PML being a syntax (and not a programming language), and its users being writers and editors, it might be more beneficial to stick to down-to-earth terms — the distinction between node and tag might already be enough of a technical burden.

Here's an example of how the term "formal node" is used in the Ref Man (Lenient Parsing):

  • If a formal node has only attributes (no child nodes) the parenthesis around attributes can be omitted.

The question is how does the "formal" qualifier contribute to this sentence specifically? I underline specifically because since we rarely see this qualifier used elsewhere then we'll assume that in this context it contributes an important distinction.

I don't see how interpreting "formal node" as "a definition or specification for a node" helps me better understand the lenient parsing rules. On the contrary, I might think that I can't understand which nodes they apply to until I understand the distinction between formal and informal nodes.

I'm not trying to "make an issue of small things", but I'm well aware how easy it can be for us developers to forget that our daily-work tech knowledge and jargon might not be within the reach of end users, especially when dealing with a syntax which is aimed at writers, not programmers. Also, what ultimately makes any document a good and flawless read are the small details, and the care that goes into them to avoid unneeded "bumps".

I'm not even sure that it's safe to assume that a writer knows what a tree structure is, so terms like "node" and "attributes" might come as new challenges for someone who had a formal education in literature. E.g. from the PML User Manual:

Anatomy of a PML Document

Document Tree

A PML document is a tree composed of PML nodes.

A good litmus test for the above definition would be to go to a fiction writers convention and, during the lunch break, hand them over the above three lines and ask each person how they interpret it. My bet is that unless they have studied computer data structures or semiotics they won't have the slightest clue about arborific structures and their terminology, let alone that documents can be represented as if "they were trees".

Usually writers think of documents as being organized in parts, chapters, scenes and paragraphs, and that's usually what word processors try to mimic too. The paradigm shift from the "common document model" to that of a Tree structure might require some effort and adjustment from someone who has never touched upon these topics — and, obviously, it's a required shift in order to properly understand how PML nodes work.

I'm just saying that we should keep the documentation simple, remind ourselves that it's written for the benefit of the average user, so we should avoid introducing too many technical terms, except in "to learn more about..." links for the curious or savvy. This might require us to be less formal and more forthcoming in how we write, even if this might mean longer text contents, where the exceptions are fully explained rather than relying on formalism or specialistic jargon.

But I also still fail to see the benefits of using "formal PML node".

CSS Parameters

I'm not sure I correctly understand the meaning of these three options (are they lists of paths or boolean values?)

Yes, they are all intended for paths (dirs and/or files).

I prefer to not rename CSS_files to deploy_CSS_files, because IMO (1) the typical (non-technical) user/writer probably doesn't know the meaning of 'deploy' in this context,

Even if you don't like "deploy" you could use some other term, but it's better having a fully qualified parameter than a non-qualified one — CSS_files, what about them?

True, the typical user/writer might not know what "deploy" means in this context, but when it comes to parameters it's never a guessing game either, it's about useful mnemonics after having read the --help. But at least with the term deploy end users will have some semantically pertinent expectations, unlike "Document Tree", and I would expect any English speaker to know the term.

(2) this will be the most frequently used one among these CSS options and it should therefore have a simple name, and (3) renaming would be a breaking change.

The breaking change can be avoided by keeping a deprecated alias in place until the next MAJOR release, which would actually be better than an abrupt change of a parameter name from one incarnation to the other, since it allows a grace period to adapt scripts.
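As an illustration of the deprecated-alias idea, here is a minimal sketch using Python's argparse. PMLC is a Java tool with its own CLI handling, so this only shows the pattern. The new option name `--deploy_CSS_files` is merely the name proposed in this thread, and the class and `dest` names are hypothetical.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Store the value as usual, but warn that the old option name was used."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use --deploy_CSS_files instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser(prog="example")
# New, fully qualified option (name taken from the proposal in this thread).
parser.add_argument("--deploy_CSS_files", dest="deploy_css")
# Old name kept as a hidden alias that writes to the same destination.
parser.add_argument("--CSS_files", dest="deploy_css",
                    action=DeprecatedAlias, help=argparse.SUPPRESS)

args = parser.parse_args(["--CSS_files", "a.css,b.css"])
print(args.deploy_css)
```

Existing scripts keep working during the grace period, and the alias can be dropped at the next major release.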

To avoid the ambiguity with 'link', we could use undeployed_CSS_files (short ucss) instead of unbundled_CSS_files.

I like it, it's better than using the ambiguous "link". Probably both the terms "undeployed" and "unbundled" are puzzling for the non-tech user, but at least the former is less ambiguous to tech guys, since "bundling" calls to mind complex packaging and delivery, whereas "deploying" probably makes more sense in this context.

And for the third type of directory/file list, I suggest embedded_CSS_files instead of embed_CSS_files (short ecss).

Sure, whatever is more consistent in PMLC parameters.

pml-lang commented 1 year ago

the problem here is that PML is not aimed at software engineers but writers, so any assumptions regarding such specialized jargon are misplaced.

Indeed. Good point!

The [text node is in the PMLC source code but is not documented, so I guess that's an informal node

It is now documented (since version 3 IIRC), and it's a formal node like the other ones.

Here's an example of how the term "formal node" is used in the Ref Man (Lenient Parsing):   If a formal node has only attributes (no child nodes) the parenthesis around attributes can be omitted.

The term "formal node" is used only once in the whole PML website (but it's in the User Manual, not the Ref Man). I've (locally) changed:

If a formal node has only attributes (no child nodes) the parenthesis around attributes can be omitted.

... to:

If a node can only have attributes (no child nodes) the parenthesis around attributes can be omitted.

Hence, the term "formal node" will no longer be used in the next version of the PML website.

  A PML document is a tree composed of PML nodes. A good litmus test for the above definition would be to go to a fiction writers convention and, during the lunch break, hand them over the above three lines and ask each person how they interpret it. ... documents can be represented as if "they were trees".

LOL! That would be a lot of fun (just kidding).

The paradigm shift from the "common document model" to that of a Tree structure might require some effort and adjustment from someone who has never touched upon these topics.

I see what you mean. We should indeed improve the manuals to make them more understandable for non-tech people (e.g. explain technical terms in side notes, or add links to easy-to-understand additional information).

I'm just saying that we should keep the documentation simple, remind ourselves that it's written for the benefit of the average user, so we should avoid introducing too many technical terms, except in "to learn more about..." links for the curious or savvy. This might require us to be less formal and more forthcoming in how we write, even if this might mean longer text contents, where the exceptions are fully explained rather than relying on formalism or specialistic jargon.

I agree 100%. I will keep this in mind. And any PRs to improve the manuals and make them simpler are very welcome.

tajmone commented 1 year ago

A good litmus test for the above definition would be to go to a fiction writers convention and [...]

LOL! That would be a lot of fun (just kidding).

I actually meant it. This is how simple marketing research would be carried out. Indie developers, unlike companies that have a dedicated marketing research division, have to face the problem of staying in touch with their target audience.

When software developers create tools for developers it's easier, since they are members of the target audience themselves. Video game developing is easier, since we all tend to understand games and fun, but when targeting specific age groups you'd also need to carry out some on-the-ground research to get real feedback.

But when it comes to developing tools dedicated to specific fields of application or jobs, it's really important to be in touch with the target audience and receive constant feedback. E.g. I've seen many editors for novelists, and couldn't avoid noticing that those developed by non-writer indie developers didn't seem to match the real needs and work process of fiction writers, whereas the good ones were created by developers who were married to a novelist, and the novelists' editor which is currently considered the best one was created by a software engineer who is a fiction writer in his free time.

For some reason, developers often choose to create tools for fields which they don't operate in, maybe because it's a good market niche, or for fun, or other reasons. Another such example would be pixel art drawing tools, which is a niche small enough not to catch the interest of big corporations (like Adobe, etc.) so these tools are mostly developed by independent programmers. I was surprised to discover that in almost all cases the developers of these tools are not into computer graphics themselves, nor into drawing in general, which explains why these tools lack features related to the drawing process and focus mainly on technical aspects of pixel rendition and manipulation, but there's much more to drawing than that.

In our case, the target users for PML probably fall into different categories, but once we have clearly identified them it would be a good idea to find a way to get in touch with them in order to have live feedback on how they use the tool, and work with writing in general. Community feedback via product forums, Issues, etc., only gives you insights into actual users of the product, but not of the target user base.

If our target users were fiction writers (for the sake of example) then finding a way to get in contact with them would be most useful. Since novelists work independently, from home, the only way would be to go to conventions, meetings, etc., which would give us an opportunity to ask them to show us how they work on their laptops, which tools they use and how they use them, and to see how they relate to PML, its documentation, etc.

Alternatively, we could try to find some online communities where they gather and share knowledge, and present PML to them and ask for feedback. But live encounters are far better IMO, since as a developer you can learn a lot by watching how a user works, how he/she struggles with interfaces, etc. Developers can see "missing things" which users might not be aware of, simply because they have never seen features available in other tools, which might be borrowed and implemented to fulfill such needs.

pdml-lang commented 1 year ago

it's really important to be in touch with the target audience and receive constant feedback

In our case, the target users for PML probably fall into different categories, but once we have clearly identified them it would be a good idea to find a way to get in touch with them in order to have live feedback on how they use the tool, and work with writing in general.

I totally agree. PML should evolve based on the real needs reported by real users. Reminds me of some good quotes by relevant people, which I mentioned in my article Fundamental Pragmatics for Successful Software Developers (also published on codeproject). Look for sub-chapter "Listen to the users!" in chapter "General Guidelines".

For example, Joel Spolsky puts it like this: "Nothing works better than just improving your product. Make great software that people want and improve it constantly. Talk to your customers (users) and listen. Find out what they need."

we could try to find some online communities where they gather and share knowledge, and present PML to them and ask for feedback.

Yes! Definitely.