yuin / goldmark

A Markdown parser written in Go. Easy to extend, standards-compliant (CommonMark), well structured.
MIT License

Provide a way to get position information for link/image destinations #138

Status: Closed (cespare closed this issue 4 years ago)

cespare commented 4 years ago

Hey, thanks for this module. So far it has been very useful and works well for me.

I'm trying to figure out how to get position information out of the parsed AST for link destinations. I think it's not possible, but I'd love to be corrected.

At the bottom I attached an example program which does something similar to what I want, but not quite the same. The idea is: parse a markdown document and print out the starting position (line, column) of each link or image destination. So if the first line of the document is

[link1](/abc)

it should print

1:9: /abc

because the destination (/abc) starts at column 9 of line 1.

I don't think the destination position is preserved in the parsed AST. Is that correct?

The closest I'm able to get is the link text position, which is itself a bit difficult to extract. My demo code walks down into the link until it finds the first *ast.Text node, and then assumes that the position one byte before the beginning of the text is the opening [ of the link. So here my demo code prints 1:1 rather than 1:9.
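As a stopgap, the destination offset for inline links can be approximated by scanning the source text forward from the opening `[`. The sketch below is only a heuristic and is not part of goldmark: `inlineDestOffset` is a hypothetical helper, it handles only the inline `[text](dest)` form, and it ignores code spans and other constructs that a real CommonMark parser accounts for. Reference-style links return -1.

```go
package main

import "fmt"

// inlineDestOffset is a hypothetical helper (not a goldmark API). Given the
// byte offset of a link's opening '[', it returns the byte offset where the
// inline destination starts, or -1 if the link is not of the inline
// [text](dest) form. It is a heuristic: it tracks bracket nesting and
// backslash escapes, but it does not understand code spans or raw HTML.
func inlineDestOffset(source []byte, open int) int {
	if open < 0 || open >= len(source) || source[open] != '[' {
		return -1
	}
	depth := 0
	for i := open; i < len(source); i++ {
		switch source[i] {
		case '\\':
			i++ // skip the escaped byte
		case '[':
			depth++
		case ']':
			depth--
			if depth != 0 {
				continue
			}
			// Found the matching ']'; an inline link requires '(' next.
			j := i + 1
			if j >= len(source) || source[j] != '(' {
				return -1
			}
			j++
			// Skip optional whitespace and an optional '<' wrapper.
			for j < len(source) && (source[j] == ' ' || source[j] == '\t' || source[j] == '\n') {
				j++
			}
			if j < len(source) && source[j] == '<' {
				j++
			}
			if j >= len(source) {
				return -1
			}
			return j
		}
	}
	return -1
}

func main() {
	source := []byte("[link1](/abc)\n[link2][1]\n")
	fmt.Println(inlineDestOffset(source, 0))  // prints 8 (column 9 of line 1)
	fmt.Println(inlineDestOffset(source, 14)) // prints -1 (reference-style link)
}
```

The offset passed in would be the same `text.Segment.Start-1` value the demo program already computes, so the same caveats about finding the opening `[` apply.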

As a secondary, less important issue, it would be nice to be able to easily jump from any node to its starting position in the source text. (For comparison, consider token.Pos in the go/token package: every node in the AST has at least one token.Pos associated with it, and from a token.Pos plus the input file you can get back to a line/column.)
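To illustrate the go/token comparison: given only a byte offset and the input, the standard library can already do the offset-to-line/column conversion. This is a minimal sketch using `go/token` on its own, with no goldmark involvement; `positionAt` and the `doc.md` file name are made up for the example.

```go
package main

import (
	"fmt"
	"go/token"
)

// positionAt converts a byte offset in source into a token.Position
// (1-based line and column) using go/token's line table.
// The filename is only a label; nothing is read from disk.
func positionAt(filename string, source []byte, offset int) token.Position {
	fset := token.NewFileSet()
	file := fset.AddFile(filename, -1, len(source))
	file.SetLinesForContent(source)
	return file.Position(file.Pos(offset))
}

func main() {
	source := []byte("[link1](/abc)\n[link2][1]\n")
	// Offset 8 is the '/' that begins the destination on line 1.
	pos := positionAt("doc.md", source, 8)
	fmt.Printf("%d:%d\n", pos.Line, pos.Column) // prints 1:9
}
```

If every node exposed a byte offset, a helper like this would be all a caller needs to report positions.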

To provide some context, my use case here is to create a linting tool for internal markdown documents. For example, I'd like to flag any relative link inside a document which points to a nonexistent file.
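For concreteness, the kind of check I have in mind looks roughly like the sketch below. It uses only the standard library; `brokenRelativeLink` is a hypothetical name, and the rules it applies (skip anything with a scheme or host, skip fragment-only links, skip paths starting with `/`) are assumptions about what "relative" should mean for this linter.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"net/url"
	"os"
	"path/filepath"
	"strings"
)

// brokenRelativeLink reports whether dest is a relative file link that does
// not resolve to an existing file when interpreted relative to docDir.
// Links with a scheme or host, fragment-only links, and paths starting with
// '/' are not checked (an assumption made for this sketch).
func brokenRelativeLink(docDir, dest string) bool {
	u, err := url.Parse(dest)
	if err != nil || u.Scheme != "" || u.Host != "" || u.Path == "" || strings.HasPrefix(u.Path, "/") {
		return false
	}
	// u.Path is already percent-decoded; any query or fragment is ignored.
	_, err = os.Stat(filepath.Join(docDir, filepath.FromSlash(u.Path)))
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	for _, dest := range []string{"def.png", "https://example.com/x", "#section"} {
		fmt.Printf("%s broken=%v\n", dest, brokenRelativeLink(".", dest))
	}
}
```

The missing piece is the position: without the destination's line and column, the linter can flag the link but cannot point at it precisely.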

Thanks for your consideration of these issues!

package main

import (
    "fmt"

    "github.com/yuin/goldmark"
    "github.com/yuin/goldmark/ast"
    "github.com/yuin/goldmark/text"
)

const input = `
[link1](/abc)
[link2][1]
[link2]

![image](def.png)

[1]: /ghi
[link2]: /jkl
`

func main() {
    source := []byte(input)
    root := goldmark.DefaultParser().Parse(text.NewReader(source))
    ast.Walk(root, func(n ast.Node, entering bool) (ast.WalkStatus, error) {
        if entering {
            walk(source, n)
        }
        return ast.WalkContinue, nil
    })
}

func walk(source []byte, n ast.Node) {
    var dest string
    switch n := n.(type) {
    case *ast.Link:
        dest = string(n.Destination)
    case *ast.Image:
        dest = string(n.Destination)
    default:
        return
    }
    line, col := lineColForNode(source, n)
    fmt.Printf("%d:%d: %s\n", line, col, dest)
}

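// lineColForNode approximates the position of a link or image by locating its
// first *ast.Text descendant and stepping back one byte, which is assumed to be
// the opening '['. It returns -1, -1 when the node has no text descendant
// (for example, a link with empty text).
func lineColForNode(source []byte, node ast.Node) (line, col int) {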
    line, col = -1, -1
    ast.Walk(node, func(n ast.Node, entering bool) (ast.WalkStatus, error) {
        if entering {
            if text, ok := n.(*ast.Text); ok {
                line, col = lineColFromOffset(source, text.Segment.Start-1)
                return ast.WalkStop, nil
            }
        }
        return ast.WalkContinue, nil
    })
    return line, col
}

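// lineColFromOffset converts a byte offset into a 1-based line and column.
// Columns count bytes, not characters.
func lineColFromOffset(source []byte, offset int) (line, col int) {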
    line = 1
    col = 1
    for i := 0; i < offset; i++ {
        if source[i] == '\n' {
            line++
            col = 1
        } else {
            col++
        }
    }
    return line, col
}
yuin commented 4 years ago

> I don't think the destination position is preserved in the parsed AST. Is that correct?

Correct.

> As a secondary, less important issue, it would be nice to be able to easily jump from any node to its starting position in the source text. (For comparison, consider token.Pos in the go/token package: every node in the AST has at least one token.Pos associated with it, and from a token.Pos plus the input file you can get back to a line/column.)

This makes sense. I'm going to go with this approach. I'll give it a try later.

All nodes will have a *text.Segment.
