microsoft / tsdoc

A doc comment standard for TypeScript
https://tsdoc.org/
MIT License

Add line wrapping / breaking to emitter #200

Open iansan5653 opened 4 years ago

iansan5653 commented 4 years ago

Currently, the emitter seems to have no option or support for breaking lines at a certain length. If a parameter description has a 500-character block of text, all of that text will be on the same line. It would be really useful to have some sort of parameter or setting to automatically wrap text when a certain number of characters is reached in a line.

octogonz commented 4 years ago

👍 Yes, we should definitely implement this! Doing so would enable a tool like API Extractor to normalize the comment format when it writes a .d.ts file for release.

Another idea would be to emit the DocNodeKind.SoftBreak as a newline. If I remember right, the reason this wasn't implemented is that trimSpacesInParagraphNodes() discards the soft breaks when it is normalizing the text. We would need to pass them along somehow as hints to the emitter.

octogonz commented 4 years ago

When I started working on this, I realized that the emitter should probably support three separate modes:

  1. verbatim: Emit the AST "as-is" without any reformatting. This would be used e.g. by a refactoring tool that wants to rename a @param name without disturbing anything else. It should perhaps be the default. It's simple to support since the AST already captures all whitespace; we simply need to disable the trimSpacesInParagraphNodes() transform.

  2. trim spaces: This is the current behavior, where unnecessary newlines are discarded, and the lines are unwrapped. The main value is that it makes it easier to emit Markdown correctly, since Markdown engines tend to misinterpret extra spaces/newlines. But when emitting .ts comments (instead of .md files), it produces ugly long lines as you pointed out.

  3. trim spaces and word wrap: This would add some extra logic to trimSpacesInParagraphNodes() that re-wraps the paragraphs to a specified column. It could be most useful for a code prettifier, or to prettify generated output.
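As a rough illustration of mode 3, here is a minimal sketch of a greedy word wrapper applied after whitespace has been collapsed. The names `wrapParagraph` and `emitParamBlock` are hypothetical and are not part of the TSDoc API; the real change would live in or near `trimSpacesInParagraphNodes()` and operate on AST nodes rather than plain strings.

```typescript
// Hypothetical helper, not part of the TSDoc library.
// Collapses all whitespace runs (including newlines), then greedily
// re-wraps the words so that no line exceeds maxWidth where possible.
function wrapParagraph(text: string, maxWidth: number): string[] {
  const words: string[] = text.split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    if (current.length === 0) {
      // A single word longer than maxWidth still gets its own line.
      current = word;
    } else if (current.length + 1 + word.length <= maxWidth) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current.length > 0) {
    lines.push(current);
  }
  return lines;
}

// Hypothetical helper: renders a @param block as a wrapped doc comment.
function emitParamBlock(name: string, description: string, maxWidth: number): string {
  const prefix = " * ";
  const body = wrapParagraph(`@param ${name} - ${description}`, maxWidth - prefix.length);
  return ["/**", ...body.map((line) => prefix + line), " */"].join("\n");
}
```

Because the input is normalized before wrapping, two differently formatted inputs with the same words produce the same output, which is the property a prettifier would want.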

@iansan5653 I'm wondering, would verbatim be more appropriate for your application than trim spaces and word wrap?

octogonz commented 4 years ago

Also note that word-wrap probably cannot be applied to other sections such as DocCodeSpan or (in the future) Markdown headings, unless we introduce a comment-wrapping operator like the one proposed in RFC #166.
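To make the code span constraint concrete, here is a minimal sketch of a wrapper that treats each backtick-delimited span as an unbreakable token. It works on plain strings and assumes balanced single-backtick spans; `wrapKeepingCodeSpans` is a hypothetical name, and a real implementation would presumably inspect DocCodeSpan nodes in the AST instead of scanning for backticks.

```typescript
// Hypothetical helper, not part of the TSDoc library.
// Assumption: code spans use single, balanced backticks.
function wrapKeepingCodeSpans(text: string, maxWidth: number): string[] {
  // A token is a run of non-whitespace characters in which any
  // `code span` is kept whole, including the spaces inside it.
  // A stray unmatched backtick becomes its own token.
  const tokens: string[] = text.match(/(?:`[^`]*`|[^\s`])+|`/g) ?? [];
  const lines: string[] = [];
  let current = "";
  for (const token of tokens) {
    if (current.length === 0) {
      current = token;
    } else if (current.length + 1 + token.length <= maxWidth) {
      current += " " + token;
    } else {
      lines.push(current);
      current = token;
    }
  }
  if (current.length > 0) {
    lines.push(current);
  }
  return lines;
}
```

A code span that is longer than the target width simply overflows onto its own line, since splitting it would change its content.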

iansan5653 commented 4 years ago

> would verbatim be more appropriate for your application than trim spaces and word wrap

I don't think so, unless I misunderstand what you're asking - I'd like to make a tool that, no matter how the input content is formatted, always produces the same standardized output. Lines wrapped, tags in the same order, newlines where they should be and not where they shouldn't, etc.

octogonz commented 4 years ago

> I'd like to make a tool that, no matter how the input content is formatted, always produces the same standardized output. Lines wrapped, tags in the same order, newlines where they should be and not where they shouldn't, etc.

👍 Got it. I'll see if I can implement this. I started work on it over the weekend, but ran into some deeper architectural questions that I need to think about before I write too much code.

rbuckton commented 4 years ago

I don't think this comment is entirely accurate:

> [...] Markdown engines tend to misinterpret extra spaces/newlines. [...]

Markdown doesn't "misinterpret" extra spaces/newlines. Rather, Markdown is a whitespace-significant language and has very specific behavior with regard to the amount of whitespace and the number of newlines that it encounters. Here are a few common examples:

In my opinion, the best thing to do is to parse out the TSDoc specific syntax (@ tags and {} inlines, etc.) and trim the leading * from each line in a doc comment, but preserve the rest essentially verbatim.
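A minimal sketch of the "trim the leading `*`, preserve the rest" step might look like the following. The function name `stripCommentFraming` is hypothetical, and the sketch deliberately does not touch TSDoc tags; it only shows that interior whitespace (such as the indentation of an indented code block) survives.

```typescript
// Hypothetical helper, not part of the TSDoc library.
// Removes the /** and */ delimiters and the leading " * " of each line,
// leaving all other whitespace exactly as written.
function stripCommentFraming(comment: string): string[] {
  const inner = comment.replace(/^\s*\/\*\*/, "").replace(/\*\/\s*$/, "");
  const lines = inner
    .split(/\r?\n/)
    // Strip optional indentation, the star, and at most one following space.
    .map((line) => line.replace(/^[ \t]*\* ?/, ""));
  // Drop blank lines left over from the opening and closing delimiters.
  while (lines.length > 0 && lines[0]?.trim() === "") {
    lines.shift();
  }
  while (lines.length > 0 && lines[lines.length - 1]?.trim() === "") {
    lines.pop();
  }
  return lines;
}
```

Only one space after the `*` is removed, so a line such as ` *     code` keeps its four-space indentation.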

You can find the latest specification for CommonMark (the Markdown spec that GitHub-Flavored Markdown is based on) here: https://spec.commonmark.org/0.29/

rbuckton commented 4 years ago

Note that whitespace is also significant inside a blockquote:

> line 1
>
>     code
>
> line 2

line 1

code

line 2

octogonz commented 4 years ago

> In my opinion, the best thing to do is to parse out the TSDoc specific syntax (@ tags and {} inlines, etc.) and trim the leading * from each line in a doc comment, but preserve the rest essentially verbatim

@rbuckton We started with this idea. However, it conflicts with two of TSDoc's overarching goals:

I originally thought that CommonMark would address these concerns. But it doesn't. CommonMark has many gotchas where an expression gets parsed unexpectedly. And as a unifying "standard", CommonMark turned out to be a standard that nobody actually implements: Every single Markdown engine adds its own proprietary grammar extensions that, when used, can cause an entire input to be misinterpreted by other engines.

As evidence, consider this code:

| Col1  | Col2 |
| --- | --- |
| `{@link X | Y}` |
| {@link X | Y}  |

How it gets parsed:

(There are endless examples like this. If you put ``` on the first line, some engines treat the whole file as code, others treat the whole file as not code.)

TSDoc's concern: Is there a @link tag in this comment or not?

In the above situation, this question has no clear answer. We considered mitigating this by modeling TSDoc as a preprocessor that grabs its tags in a simple-minded way and then passes along the remaining content uninterpreted, with no attempt at consistency between tools. But even if consistency doesn't matter (I believe it does), we found that the resulting grammar was highly counterintuitive. It wasn't a pleasant authoring experience.
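To illustrate what "simple-minded" could mean here, the sketch below scans for `{@link ...}` with a regular expression and ignores all Markdown context. The function `grabLinkTags` is hypothetical and is not how the TSDoc parser works; it only shows that a preprocessor of this kind reports a tag in both table rows, including the one written inside backticks.

```typescript
// Hypothetical preprocessor step, not part of the TSDoc library.
// Finds every {@link ...} occurrence without looking at the
// surrounding Markdown (tables, code spans, and so on).
function grabLinkTags(comment: string): string[] {
  const pattern = /\{@link\s+([^}]*)\}/g;
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(comment)) !== null) {
    results.push(match[1].trim());
  }
  return results;
}

// The two table rows from the example above.
const tableRows = "| `{@link X | Y}` |\n| {@link X | Y}  |";
console.log(grabLinkTags(tableRows));
```

Whether a downstream Markdown engine would agree that either of these is a link, a code span, or two table cells split at the `|` is exactly the inconsistency described above.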

Thus, after a very long discussion we came to the opinion that TSDoc should have its own "TSDoc-flavored-Markdown" with the following properties:

In practice this has worked very well. I still haven't gotten around to adding basic features like boldface, headers, bullets, etc. -- which we do want to support -- but already people have written A LOT of very good documentation with relatively few complaints about missing constructs. Doc comments embedded in source code really don't need a whole lot of bells and whistles, it seems.

So, to recap: When you use API Documenter for example, your TSDoc-flavored-Markdown gets fully parsed into an AST. Later, when the MarkdownEmitter writes the .md/.yml output file, it is very thoroughly escaped to ensure the emitted Markdown correctly captures TSDoc's interpretation. We really do not want any Markdown extensions to work unless they are part of the TSDoc-flavored-Markdown grammar.
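As a very rough sketch of the escaping idea, the function below backslash-escapes characters that commonly carry meaning in Markdown before plain text is written out. This is an assumption about the general approach, not the actual MarkdownEmitter logic; `escapePlainText` and its character set are hypothetical, and a real emitter would likely be more selective about context.

```typescript
// Hypothetical helper, not the actual MarkdownEmitter implementation.
// Backslash-escapes characters that many Markdown engines treat as
// syntax, so that plain text is rendered literally.
function escapePlainText(text: string): string {
  return text.replace(/[\\`*_{}\[\]<>|#]/g, (ch) => "\\" + ch);
}
```

With escaping like this, text that merely looks like an engine-specific extension (a table pipe, for instance) is emitted as literal characters rather than being reinterpreted.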