AsciiDoc Cleanup Command

polyglot-jones commented 3 years ago

One of these days, I'll write a command to automatically clean up my ADOC files. I'll certainly make it available on demand with a key map, but I'll probably also add the option to have it run every time you save.

- Enforce an exact number of consecutive blank lines above each heading (zero above the title, three above all others, except if a header immediately precedes a sub-header, then just one). Also, if there are attributes or comments just above the heading line, then the spacing goes above those. -- Actually I'll have it first scan to see what the deepest heading level used is (say, ====), then ==== lines would only be preceded by one blank line, while == and === would get 3 blank lines.
- Remove trailing spaces
- Add [] after http links that don't already have them
- Reformat tables so that the delimiters align -- unless any one line is longer than 80(?), 120(?), then leave the whole table alone.
- Convert old-style typographical curly quotes to new syntax.
What else?
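
To make a few of the proposed operations concrete, here is a minimal sketch in Python of three of them: removing trailing spaces, adding [] after bare http links, and finding the deepest heading level in use. The function names are hypothetical and are not taken from the package. The regular expressions are deliberately naive: they do not know about code blocks, passthroughs or `link:` macros, so treat this as an illustration of the idea rather than a finished command.

```python
import re

# Matches a bare http(s) URL; the match stops at whitespace or a square
# bracket, so an existing "[...]" suffix stays outside the match.
_BARE_URL = re.compile(r"https?://[^\s\[\]]+")

# A section title line: one to six "=" followed by a space and some text.
# Delimiter lines such as "====" have no space and are not matched.
_HEADING = re.compile(r"^(={1,6}) \S", re.MULTILINE)


def strip_trailing_spaces(text):
    """Remove trailing spaces and tabs from every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def add_empty_link_brackets(text):
    """Append [] to bare http(s) URLs that are not already followed by '['."""
    def fix(match):
        url = match.group(0)
        if text[match.end():match.end() + 1] == "[":
            return url  # already has link text or an empty []
        # Keep sentence punctuation outside the link.
        trailing = ""
        while url and url[-1] in ".,;:!?)":
            trailing = url[-1] + trailing
            url = url[:-1]
        return url + "[]" + trailing
    return _BARE_URL.sub(fix, text)


def deepest_heading_level(text):
    """Return the largest number of '=' used by any heading, or 0 if none."""
    return max((len(m.group(1)) for m in _HEADING.finditer(text)), default=0)
```

The heading-spacing rule could then use `deepest_heading_level` to decide which headings get one blank line and which get three.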

tajmone commented 3 years ago

That's an interesting idea, but also consider that different people have different styling and formatting preferences when it comes to AsciiDoc.

Remove trailing spaces

That's already achievable via ST syntax-specific settings, and also via EditorConfig.

Reformat tables so that the delimiters align -- unless any one line is longer than 80(?), 120(?), then leave the whole table alone.

That's tricky, since users can specify an alternative delimiter (e.g. in nested tables), and because of the need to watch out for nested tables (which would break this type of alignment), not to mention cell and row spanning. Also, how are you going to deal with CSVs and other types of tables? The command script would have to be very careful in detecting the right type of table, e.g. by determining the exact number of columns specified, and the absence of cell or row spans.

I believe this is going to be hard to achieve without a full-fledged parser, except for very simple and predictable tables.

Many people simply write one column per line, even with short cell contents.
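
A conservative version of the table idea can be sketched as follows: align only tables that are obviously simple, and return the rows untouched at the first sign of trouble. This is a hypothetical helper, not code from the package. It assumes the caller passes only the row lines between the `|===` delimiters of a table that uses the default `|` separator, and its span detection is a rough heuristic rather than a real parser.

```python
import re

# A cell ending in something like "2+" or ".3+" probably carries a span
# or duplication specifier for the next cell, so the table is not "simple".
_SPAN_HINT = re.compile(r"[\d.]+[+*]$")


def align_simple_table(rows, max_width=120):
    """Align the | delimiters of a simple table; return rows unchanged when unsure."""
    parsed = []
    for line in rows:
        # Bail out on long lines, blank or continuation lines, and escaped pipes.
        if len(line) > max_width or not line.startswith("|") or "\\|" in line:
            return list(rows)
        cells = [cell.strip() for cell in line[1:].split("|")]
        if any(_SPAN_HINT.search(cell) for cell in cells):
            return list(rows)
        parsed.append(cells)
    # Every row must have the same number of cells.
    if not parsed or len({len(cells) for cells in parsed}) != 1:
        return list(rows)
    widths = [max(len(cells[i]) for cells in parsed) for i in range(len(parsed[0]))]
    return [
        ("| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths))).rstrip()
        for cells in parsed
    ]
```

Tables written one cell per line, CSV tables and nested tables all fall through to the "leave it alone" path, which matches the caution expressed above.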

Convert old-style typographical curly quotes to new syntax.

Not sure what you mean here.

What else?

Definitely: splitting paragraphs one-line-per sentence! I do this via a custom RegEx when I start working on converted documents, but I always supervise each match and replacement, for there are always edge cases even when considering question marks and exclamation marks.
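
As an illustration of the kind of regular expression involved (not the actual one mentioned above, which is not shown in this thread), here is a small Python sketch. It splits after `.`, `!` or `?` when the next word starts with a capital letter or an opening quote or bracket, and re-joins pieces that were split after a few common abbreviations. The abbreviation list is an assumption and far from complete, so the result still needs to be reviewed by hand.

```python
import re

# Hypothetical, incomplete list of abbreviations that should not end a line.
_ABBREVIATIONS = ("e.g.", "i.e.", "etc.", "vs.", "Mr.", "Mrs.", "Dr.")

# Whitespace that follows sentence punctuation and precedes a capital letter,
# an opening quote or an opening bracket.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'(\[])")


def split_sentences(paragraph):
    """Reflow one paragraph so that each sentence sits on its own line."""
    text = " ".join(paragraph.split())  # collapse existing line breaks
    sentences = []
    for piece in _BOUNDARY.split(text):
        if sentences and sentences[-1].endswith(_ABBREVIATIONS):
            # The previous split happened after an abbreviation: undo it.
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return "\n".join(sentences)
```

A sentence that genuinely ends in "etc." is wrongly merged with the next one, which is exactly the sort of edge case that makes supervised replacement worthwhile.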

Other useful functionality would be in the area of handling footnotes and their definitions; applying consistent formatting of same constructs which allow syntactic variations, etc. (basically, a linter).

polyglot-jones commented 3 years ago

Definitely: splitting paragraphs one-line-per sentence!

Guess what? I already have such code in a BASH script. I totally forgot about that. Give me a day to convert it to Python. I'll add it to the command palette as "AsciiDoc: Clean Up Converted Prose"

Other useful functionality would be in the area of handling footnotes and their definitions

What about them?

applying consistent formatting of same constructs which allow syntactic variations, etc. (basically, a linter).

I can do that. Why don't you start a new issue with some specifics? Start with how you would add it to the command palette. (My suggestions would be: "AsciiDoc: Standardize Syntax" -- assuming I understand you correctly.)

tajmone commented 3 years ago

Other useful functionality would be in the area of handling footnotes and their definitions

What about them?

E.g. converting inline footnotes to externalized footnotes. I personally find this method very useful in books with many footnotes, for it keeps the inline notation shorter and places all footnote definitions at the top of the document.

:fn1: footnote:[Some note.]

Some text.{fn1}
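
A rough sketch of that conversion in Python, under some assumptions: the function name and the `fn` prefix are hypothetical, only anonymous `footnote:[...]` macros are handled, and footnote text that contains unescaped square brackets is not supported. The caller is expected to pass the document body (without any existing `:fnN:` attribute lines) and to place the returned definitions in the header.

```python
import re

# An anonymous inline footnote: "footnote:[" + text without a bare "]" + "]".
_INLINE_FOOTNOTE = re.compile(r"footnote:\[((?:[^\]\\]|\\.)*)\]")


def externalize_footnotes(body, prefix="fn"):
    """Replace inline footnotes with {fnN} references.

    Returns (definitions, new_body), where definitions is a list of
    attribute entries such as ":fn1: footnote:[Some note.]".
    """
    definitions = []

    def replace(match):
        name = f"{prefix}{len(definitions) + 1}"
        definitions.append(f":{name}: footnote:[{match.group(1)}]")
        return "{" + name + "}"

    new_body = _INLINE_FOOTNOTE.sub(replace, body)
    return definitions, new_body
```

For example, `externalize_footnotes("Some text.footnote:[Some note.]")` yields the definition and the `Some text.{fn1}` line shown above.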

Other possible functionality would be to process all footnotes and assign them an auto-generated target ID, e.g. with progressive numbering (replacing any previous target ID found).

footnote:[First note.] → footnote:fn1[First note.]
footnote:old_id[Second note.] → footnote:fn2[Second note.]

Again, enforcing a consistent formatting, notation and/or methodology.
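
The renumbering idea could look roughly like this in Python. The function name and the `fn` prefix are hypothetical. The sketch assumes that a footnote ID consists of word characters and hyphens, and that a later `footnote:old_id[]` with the same ID is a reference back to the same note, so it maps each old ID to a single new one.

```python
import re

# "footnote:" followed by an optional ID and the opening bracket.
_FOOTNOTE = re.compile(r"footnote:([\w-]*)\[")


def renumber_footnotes(text, prefix="fn"):
    """Give every footnote a progressive ID, replacing any existing ID."""
    new_ids = {}  # old ID -> new ID, so repeated references stay consistent
    count = 0

    def replace(match):
        nonlocal count
        old_id = match.group(1)
        if old_id and old_id in new_ids:
            return f"footnote:{new_ids[old_id]}["
        count += 1
        new_id = f"{prefix}{count}"
        if old_id:
            new_ids[old_id] = new_id
        return f"footnote:{new_id}["

    return _FOOTNOTE.sub(replace, text)
```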

Why don't you start a new issue with some specifics? Start with how you would add it to the command palette.

I don't really have any specifics in mind, and the approach proposed by the contributor is likely to be better than any suggestion from my side, since it would stem from practical needs and day-to-day experience.

(My suggestions would be: "AsciiDoc: Standardize Syntax" -- assuming I understand you correctly.)

I would personally suggest creating many individual functions, which end users are then free to either use one after the other, or to group into custom single commands they create on a per-need basis. The problem is that there isn't a standard syntax when it comes to AsciiDoc; there are multiple choices instead, and personal approaches.
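
One possible shape for this, sketched in plain Python with hypothetical names: each fix-up is a small function from text to text, and a helper chains any selection of them into a single operation. In the actual package each function would presumably be wrapped in its own Sublime Text command; that wiring is left out here.

```python
import re


def normalize_line_endings(text):
    """Convert Windows line endings to Unix line endings."""
    return text.replace("\r\n", "\n")


def strip_trailing_spaces(text):
    """Remove trailing spaces and tabs from every line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def compose(*fixers):
    """Chain individual text -> text fixers into one function, applied in order."""
    def run(text):
        for fixer in fixers:
            text = fixer(text)
        return text
    return run


# A user-defined "single command" built from the individual pieces.
my_cleanup = compose(normalize_line_endings, strip_trailing_spaces)
```

Users who disagree with one of the fix-ups simply leave it out of their own composition.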

When it comes to creating an AsciiDoc linter, I think the best approach would be to create a dedicated linter for an already existing linter package, like SublimeLinter, rather than include it in this package:

I'm convinced that it's very hard to handle the AsciiDoc syntax without a proper parser that creates an AST, and I think that the only viable solution would be to create a Language Server package for AsciiDoc, which would not only be portable across all editors that support LSP, but also allow tailoring all sorts of functionality into the package as well (linting, refactoring, etc.).

But that would be quite hard to achieve, for such an LSP package would have to be fast enough to allow real-time highlighting while editing, and also be a "fault tolerant" parser which can recover from incomplete constructs without breaking the whole document (which is the main problem with the current ST package, which often breaks down mid-document and is unable to properly highlight and index the rest of the doc).

polyglot-jones commented 3 years ago

OK guys, check out pull request #14. Start by reading FIXUP_CONVERTED.adoc. How'd I do?

As you can see, this new command is not exactly as originally proposed above (something to be run repeatedly). Instead, the idea is to run it just once, right after a document has been converted into AsciiDoc.

FYI: This is based on a bunch of shell scripts that I used while publishing three books for my friends. I didn't just write it all from scratch.

polyglot-jones commented 3 years ago

E.g. converting inline footnotes to externalized footnotes.

I added issue #15 for this.

process all footnotes and assign them an auto-generated target ID

Would that be in addition to the externalized method, or instead of? I think the externalized method is perfectly elegant.

I would personally suggest creating many individual functions

OK. I see that. I went ahead and batched up 20 operations in the one FIXUP_CONVERTED command because they really go together and it's a one-and-done. But going forward, I agree with the individualized approach.

When it comes to creating an AsciiDoc linter, ...

All good points. I'm moving your comments on this to its own issue for further discussion after we close this one.

tajmone commented 3 years ago

What if the linter only promised to handle a small subset of AsciiDoc? -- i.e. it just looks for the most common lint.

I think that this would be a good idea, and more realistic to achieve than a "full linter" (i.e. one claiming to cover every possible AsciiDoc construct and context).
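
To show how small such a subset linter could start out, here is a line-based sketch in Python. The rule names, messages and the `Finding` type are all hypothetical. It checks only two kinds of lint mentioned earlier in this thread, and it makes no attempt to understand block context such as listings or passthroughs.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    line: int      # 1-based line number
    rule: str      # short rule identifier
    message: str   # human-readable description


# (rule id, pattern, message). The bare-link pattern only matches a URL that
# is followed by whitespace, "]" or the end of the line, i.e. not by "[".
RULES = [
    ("trailing-space", re.compile(r"[ \t]+$"), "trailing whitespace"),
    ("bare-link", re.compile(r"https?://[^\s\[\]]+(?![^\s\]])"),
     "http link without []"),
]


def lint(text):
    """Return a list of findings, in document order."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern, message in RULES:
            if pattern.search(line):
                findings.append(Finding(number, rule, message))
    return findings
```

New rules can be added one at a time, which keeps the promised coverage explicit.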

I've been thinking of putting forth the notion of establishing an "AsciiDoc-Lite" standard.

That could be beneficial, as long as its goals are clearly stated to be an "opinionated standard" for practical use, which doesn't claim to be official nor an alternative to the AsciiDoc standard efforts. Probably calling it a convention rather than a standard would be better (or wiser); I'm sure that such a project, or calling it an "AsciiDoc standard", would raise criticism, just like it happens whenever you mention terms like "GitHub-Flavored AsciiDoc".

You could start a repository proposing a draft for this convention, so that AsciiDoc users might join the discussion with their proposals and needs, in order to find out what minimum coverage of the syntax would be considered useful, how it would play out nicely with all available AsciiDoc implementations, etc.

I'm convinced that less than 5% of AsciiDoc notation handles more than 95% of the use cases out there.

That's a good estimate IMO.

Having such a known-quantity specification could make life easier on everyone (developers and users).

I'm not sure about "everybody", since some authors (especially old timers coming from Emacs and Vim) tend to have their own strongly established traditions when it comes to formatting (e.g. ALWAYS two spaces separating sentences on the same line, etc.), and because opinionated choices are unlikely to make everyone happy (e.g. should the "one-sentence-per-line" rule also apply to list items? to sentences like "Sentence. E.g.", and so on).

But it is an interesting challenge, especially when it comes to non-English documentation, since other locales have their own formatting needs (which were not always taken into account in AsciiDoc's general design).

tajmone / ST4-Asciidoctor

AsciiDoc Cleanup Command #10