serokell / xrefcheck

Check cross-references in repository documents
Mozilla Public License 2.0

Try to implement copy-paste protection checks #64

Open Martoon-00 opened 3 years ago

Martoon-00 commented 3 years ago

Clarification and motivation

Imagine the following list of links:

It is easy to make a mistake here when copy-pasting: the text gets updated but the link does not. I think we can use some heuristics to spot such mistakes (but avoid false positives at all costs):

then report an error at the position of [T2](L1), mentioning that it could be a bad copy-paste of [T1](L1). Perform a similar check for [T1](L2).
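To make the idea concrete, here is a minimal Python sketch of one possible heuristic. The exact condition is an assumption on my side, not something settled in this issue: among links that share a target, a link whose text appears in the target is treated as the likely original, and a link whose text does not appear there is reported as a suspected bad copy. The names `Link`, `normalize`, `text_matches_target`, and `find_suspects` are hypothetical and are not part of xrefcheck.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    text: str    # link text, e.g. "T1"
    target: str  # link target, e.g. "L1"
    line: int    # line number where the link appears


def normalize(s: str) -> str:
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", s.lower())


def text_matches_target(link: Link) -> bool:
    """Assumed criterion: the normalized text occurs inside the normalized target."""
    key = normalize(link.text)
    return bool(key) and key in normalize(link.target)


def find_suspects(links):
    """Return (suspect, probable_original) pairs for links sharing a target."""
    by_target = defaultdict(list)
    for link in links:
        by_target[link.target].append(link)

    suspects = []
    for group in by_target.values():
        originals = [l for l in group if text_matches_target(l)]
        if not originals:
            # No link in the group looks like the original; stay silent
            # rather than risk a false positive.
            continue
        for l in group:
            if not text_matches_target(l):
                suspects.append((l, originals[0]))
    return suspects
```

The sketch reports nothing unless some link in the group clearly matches its target, which is one way to lean towards fewer false positives; a real implementation would likely need a stricter notion of "matches".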

Acceptance criteria

Martoon-00 commented 1 year ago

If possible, we should check reference-style links ([text][link-id]) too.

However, I'm not sure how feasible this is: AFAIR, such links are automatically inlined by our markdown parser.

YuriRomanowski commented 1 year ago

Do we want to check only links within one list? How about checking all the links within a given file?

Martoon-00 commented 1 year ago

That's a good question. On the one hand, this increases the probability of getting a false positive. On the other, checking through the entire file may be more useful and will be a more transparent behaviour for the user.

Let's go with checking across the entire file.

Over time we will collect some statistics on how this check works on real-life repositories and will revise the behaviour then.
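For illustration, a rough Python sketch of what collecting all links of a file could look like, covering both inline links and reference-style links (`[text][link-id]`). It is regex-based and deliberately simplistic, whereas a real implementation would rely on the markdown parser; the function name `extract_links` and the patterns are assumptions made for this example only. Note that it would also pick up image links, since `![alt](src)` contains an inline link pattern.

```python
import re

# Simplified patterns; they do not cover the full markdown grammar.
INLINE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")         # [text](target)
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")       # [text][link-id]
DEFINITION = re.compile(r"^\s{0,3}\[([^\]]+)\]:\s*(\S+)")  # [link-id]: target


def extract_links(markdown: str):
    """Return (text, target, line_number) for every link found in the file."""
    lines = markdown.splitlines()

    # First pass: collect reference definitions (ids are case-insensitive).
    definitions = {}
    for line in lines:
        m = DEFINITION.match(line)
        if m:
            definitions[m.group(1).lower()] = m.group(2)

    # Second pass: collect links, resolving reference-style ones.
    links = []
    for number, line in enumerate(lines, start=1):
        if DEFINITION.match(line):
            continue
        for m in INLINE.finditer(line):
            links.append((m.group(1), m.group(2), number))
        for m in REFERENCE.finditer(line):
            # An empty id ([text][]) falls back to the link text.
            link_id = (m.group(2) or m.group(1)).lower()
            if link_id in definitions:
                links.append((m.group(1), definitions[link_id], number))
    return links
```

The resulting list spans the whole file, so any duplicate-target check can then run over it without caring which list a link came from.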