syntax-tree / mdast-util-gfm-autolink-literal

mdast extension to parse and serialize GFM autolink literals
https://unifiedjs.com
MIT License

Parsing `www.` doesn't return positions #6

Closed ocavue closed 1 year ago

ocavue commented 1 year ago

Affected packages and versions

mdast-util-gfm-autolink-literal@1.0.2

Steps to reproduce

Run the following script:

import { fromMarkdown } from 'mdast-util-from-markdown';
import { gfmAutolinkLiteral } from 'micromark-extension-gfm-autolink-literal';
import { gfmAutolinkLiteralFromMarkdown } from 'mdast-util-gfm-autolink-literal';

// Parse `text` with the GFM autolink literal extensions enabled and print
// the children of the first block (the paragraph).
function log(text) {
  const tree = fromMarkdown(text, {
    extensions: [gfmAutolinkLiteral], // micromark syntax extension
    mdastExtensions: [gfmAutolinkLiteralFromMarkdown], // mdast extension
  });
  const tokens = tree.children[0].children;
  console.log(`parsed result for "${text}":`);
  console.dir(tokens, { depth: null });
}

log('www.example.com.');
log('www.');

The script prints the following:

parsed result for "www.example.com.":
[
  {
    type: 'link',
    title: null,
    url: 'http://www.example.com',
    children: [
      {
        type: 'text',
        value: 'www.example.com',
        position: {
          start: { line: 1, column: 1, offset: 0 },
          end: { line: 1, column: 16, offset: 15 }
        }
      }
    ],
    position: {
      start: { line: 1, column: 1, offset: 0 },
      end: { line: 1, column: 16, offset: 15 }
    }
  },
  {
    type: 'text',
    value: '.',
    position: {
      start: { line: 1, column: 16, offset: 15 },
      end: { line: 1, column: 17, offset: 16 }
    }
  }
]
parsed result for "www.":
[
  {
    type: 'link',
    title: null,
    url: 'http://www',
    children: [ { type: 'text', value: 'www' } ]
  },
  { type: 'text', value: '.' }
]

Expected behavior

Parsing an autolink literal should return link and text nodes that each have a `position` property.

Actual behavior

Parsing `www.` returns a link node and text nodes without a `position` property.

Affected runtime and version

node@16.17.0

ocavue commented 1 year ago

It seems that mdast-util-gfm-autolink-literal uses mdast-util-find-and-replace to handle "www.". I'm not sure whether this case is an expected limitation of mdast-util-find-and-replace.
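
To illustrate why a tree-level find-and-replace step can end up without positions, here is a simplified sketch. It is not the actual implementation of mdast-util-find-and-replace: `replaceInText`, the `/www/g` pattern, and the position on the input node are all hypothetical and chosen only to mirror the output reported above. The point is that the replacement nodes are built from string slices, so nothing carries the source offsets over unless the transform computes them itself.

```javascript
// Hypothetical, simplified stand-in for a find-and-replace transform.
// It splits one text node into new nodes around each regex match.
function replaceInText(textNode, pattern, toNode) {
  const result = [];
  let last = 0;
  for (const match of textNode.value.matchAll(pattern)) {
    if (match.index > last) {
      // Plain text before the match: a fresh node, no `position`.
      result.push({ type: 'text', value: textNode.value.slice(last, match.index) });
    }
    result.push(toNode(match[0]));
    last = match.index + match[0].length;
  }
  if (last < textNode.value.length) {
    // Plain text after the last match: also a fresh node.
    result.push({ type: 'text', value: textNode.value.slice(last) });
  }
  return result;
}

// Assumed input: the text node the parser produced for "www.".
const original = {
  type: 'text',
  value: 'www.',
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 1, column: 5, offset: 4 },
  },
};

const nodes = replaceInText(original, /www/g, (value) => ({
  type: 'link',
  title: null,
  url: 'http://' + value,
  children: [{ type: 'text', value }],
}));

console.dir(nodes, { depth: null });
```

The two resulting nodes have the same shape as the `www.` output in the report: a link and a trailing text node, neither with `position`.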

wooorm commented 1 year ago

See https://github.com/remarkjs/remark-gfm/issues/16. Providing positions here is currently impossible, so this is expected. GitHub also does this as a transform afterwards.

I’m currently working on a markdown parser in a different language. There I implemented it as a sort of transform that still runs “on the parser”, which means positional info is available. I plan to port these things back at some point, but for now this behavior is expected.
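
Until positions are available for these nodes, downstream code probably should not assume every node has `position`. A minimal sketch of a guard, using a hypothetical helper `findPositionless` and a hand-written tree shaped like the `www.` output above (the paragraph's own position is an assumption, since the report does not show it):

```javascript
// Hypothetical helper: collect every node in an mdast-like tree that has
// no `position` field, in depth-first order.
function findPositionless(node, found = []) {
  if (!node.position) found.push(node);
  for (const child of node.children || []) {
    findPositionless(child, found);
  }
  return found;
}

// Hand-written tree mirroring the reported result for "www.".
const paragraph = {
  type: 'paragraph',
  // Assumed: the paragraph itself comes from the parser and has a position.
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 1, column: 5, offset: 4 },
  },
  children: [
    {
      type: 'link',
      title: null,
      url: 'http://www',
      children: [{ type: 'text', value: 'www' }],
    },
    { type: 'text', value: '.' },
  ],
};

const missing = findPositionless(paragraph);
console.log(missing.map((node) => node.type)); // [ 'link', 'text', 'text' ]
```

Code that needs source offsets can use such a check to fall back to the nearest ancestor that does have a position.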

ocavue commented 1 year ago

Thanks for your explanation.

wooorm commented 1 year ago

Closing this for now. It’s currently expected. I will likely devise something in the future though!
