Parsing issue with multi-line links

spautz commented 3 years ago

Scenario

I'm using both keep-a-changelog and Prettier, with a CHANGELOG.md file, and I've hit an issue between them:

keep-a-changelog parses CHANGELOG.md and generates links at the bottom of the file, as expected. Those links are longer than the printWidth we've set for Prettier.

Prettier runs and wraps long lines -- including the links at the bottom

[2.0.0]:
https://my-very-long-url/v1.2.3..v2.0.0
[1.2.3]:
https://my-very-long-url/v1.2.2..v1.2.3

The next time keep-a-changelog parses the changelog, it doesn't recognize those links because they aren't on a single line: they're tokenized as paragraphs and indented appropriately. At that point it looks like the document has no links at all, so all h2's in the changelog get de-linked.

Possible solution

Adding our changelog to .prettierignore did avoid this, as did disabling it with proseWrap (suggested in their issue #4709), but the line-wrap is handy for the rest of the document so I'd prefer to keep it.

Locally, I was able to resolve this by having keep-a-changelog's parser recognize multi-line links:

// src/parser.js:137
if (line.match(/^\[.*\]\:$/)) {
  const nextLine = allLines[index + 1]
  if (nextLine && nextLine.match(/\s*http.*$/)) {
    // We found a multi-line link: treat it like a single line
    allLines[index + 1] = ''
    return ['link', [line.trim() + nextLine.trim()]];
  }
}

(I just split the 'link' regex into two parts, and updated the loop to .map((line, index, allLines))

That worked very well for my use case. My questions are:

Is there a better way to do this, while keeping Prettier? I played around with prettier-ignore and a few other options, but the above solution worked best for me.
Would this change work for other projects, generally? I'm happy to submit a PR if you think it'd be beneficial.

oscarotero commented 3 years ago

Hi, I didn't know multi-line links. Yep, your workaround LGTM, so I appreciate the PR. Thank you!

Just a couple of questions:

Do you think we should keep the multiline links? I guess if we don't do the nextLine.trim() (or only do a nextLine.trimEnd()) and insert a \n, we can keep it. This would prevent conflicts between prettier and this library.
The url indentation is required? If so, I'd change the regex to /\s+http.*$/ (to force at least one space).

spautz commented 3 years ago

Do you think we should keep the multiline links?

Good idea, I'll include that

The url indentation is required?

Based on the full spec at https://spec.commonmark.org/0.29/#link-reference-definitions , it looks like indentation is not required -- although Prettier always adds it. The full spec also allows link titles (which can also be multiline), but the keepachangelog.com spec doesn't seem to have or need them.

Unless there's a strong need, I think requiring indentation in the regex is easiest: it fixes the Prettier case and it reduces the risk that new lines will be matched when they weren't matched before. Supporting every multiline case would probably require using a full Markdown parser, and that feels like a ton of work with no new benefit, to me. 😄

spautz commented 3 years ago

Thanks!

oscarotero / keep-a-changelog

Parsing issue with multi-line links #16

Scenario

Possible solution