icann / rfc-annotations

proposed regexps to match contents of "@@" pairs #57

Closed reschke closed 2 years ago

reschke commented 2 years ago

Plan:

  1. find pairs of "@@"
  2. match contained text against three regexps, sequentially
  3. regexp groups will give us the kind of reference ("Section", "Appendix", or "Line"), the section number or line number, and the RFC number
  4. if nonmatch, continue (potentially using second "@@" as start)
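
The plan above can be sketched roughly as follows. This is only an illustration, not the code from the eventual PR: the function name `find_references` is made up, and requiring the whole text between the markers to match (`fullmatch`) is an assumption, since the issue does not say whether a partial match is acceptable.

```python
import re

# The three regexps proposed in this issue, tried in order (most specific first).
PATTERNS = [
    re.compile(r"((Section|Appendix|Line)\s*([0-9A-Z\.]+))(\s*(of|in)\s*\[?)(RFC\s*([0-9]+))(\]?)", re.IGNORECASE),
    re.compile(r"(\[?)(RFC\s*([0-9]+))(\]?,\s*)((Section|Appendix|Line)\s*([0-9A-Z\.]+))", re.IGNORECASE),
    re.compile(r"(\[?)(RFC\s*([0-9]+))(\]?)", re.IGNORECASE),
]

def find_references(text):
    """Return (pattern_index, inner_text) for every "@@...@@" pair that matches.

    Hypothetical helper; assumes the whole inner text must match a pattern.
    """
    results = []
    start = text.find("@@")
    while start != -1:
        end = text.find("@@", start + 2)
        if end == -1:
            break  # unpaired "@@": nothing left to match
        inner = text[start + 2:end]
        hit = None
        for index, pattern in enumerate(PATTERNS):
            hit = pattern.fullmatch(inner)
            if hit:
                results.append((index, inner))
                break
        # On a match, resume after the closing "@@"; otherwise retry with
        # the second "@@" as a potential opening marker (step 4).
        start = text.find("@@", end + 2) if hit else end
    return results

sample = "See @@Section 3.1 of RFC 7230@@ and @@RFC 2119@@, but not @@ this @@RFC 8174@@."
references = find_references(sample)
```

In the sample, `@@ this @@` does not match any pattern, so its closing marker is reused as the opening marker for `RFC 8174`.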

The regexps match a sequence of groups; for each I'll specify the generated HTML.

Regexps, tested at https://regex101.com/:

"((Section|Appendix|Line)\s*([0-9A-Z\.]+))(\s*(of|in)\s*\[?)(RFC\s*([0-9]+))(\]?)"gmi

https://regex101.com/r/ZHUsaQ/3
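
For orientation, here is how the numbered groups of this first regexp come out in Python. The mapping of group numbers to the `$...` placeholder names used in the template is my reading of the pattern, not something spelled out in this issue, and `kind`, `number`, and `rfc-number` are labels invented for the example.

```python
import re

# First regexp from this issue: section/appendix/line reference followed by the RFC.
SECTION_FIRST = re.compile(
    r"((Section|Appendix|Line)\s*([0-9A-Z\.]+))(\s*(of|in)\s*\[?)(RFC\s*([0-9]+))(\]?)",
    re.IGNORECASE | re.MULTILINE,
)

match = SECTION_FIRST.search("Section 4.2 of [RFC 9110]")

# Assumed mapping from group numbers to the template placeholders.
fields = {
    "section-or-line-text": match.group(1),  # "Section 4.2"
    "kind": match.group(2),                  # "Section", "Appendix", or "Line"
    "number": match.group(3),                # section, appendix, or line number
    "plain-text-1": match.group(4),          # separator, including an optional "["
    "rfc-text": match.group(6),              # "RFC 9110"
    "rfc-number": match.group(7),            # digits only
    "plain-text-2": match.group(8),          # optional closing "]"
}
```

Group 5 (`of` or `in`) is captured but not needed on its own, because it is already part of group 4.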

Extract:

Compute:

Generate:

<a href="$section-or-line-target" target="_blank">$section-or-line-text</a>$plain-text-1<a href="$rfc-target" target="_blank">$rfc-text</a>$plain-text-2

"(\[?)(RFC\s*([0-9]+))(\]?,\s*)((Section|Appendix|Line)\s*([0-9A-Z\.]+))"gmi

https://regex101.com/r/KqSWTL/2

Extract:

Generate:

$plain-text-1<a href="$rfc-target" target="_blank">$rfc-text</a>$plain-text-2<a href="$section-or-line-target" target="_blank">$section-or-line-text</a>

"(\[?)(RFC\s*([0-9]+))(\]?)"gmi

https://regex101.com/r/THdxye/2
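
A minimal sketch of how this simplest regexp could drive the HTML generation, using `re.sub` with a callback. The helper names `rfc_target` and `link_rfc` are invented for the example, and the URL scheme is an assumption: the issue does not say how `$rfc-target` is computed.

```python
import html
import re

# Third regexp from this issue: a bare RFC reference, optionally bracketed.
RFC_ONLY = re.compile(r"(\[?)(RFC\s*([0-9]+))(\]?)", re.IGNORECASE)

def rfc_target(number):
    # Hypothetical URL scheme; the real tool may link somewhere else.
    return f"https://www.rfc-editor.org/rfc/rfc{int(number)}.html"

def link_rfc(match):
    # Groups: optional "[", the RFC text, the RFC number, optional "]".
    plain_1, rfc_text, rfc_number, plain_2 = match.groups()
    return (
        f'{html.escape(plain_1)}'
        f'<a href="{rfc_target(rfc_number)}" target="_blank">{html.escape(rfc_text)}</a>'
        f'{html.escape(plain_2)}'
    )

result = RFC_ONLY.sub(link_rfc, "[RFC 2119]")
```

The brackets stay outside the link, which mirrors the `$plain-text-1` and `$plain-text-2` placeholders in the template.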

Extract:

Generate:

$plain-text-1<a href="$rfc-target" target="_blank">$rfc-text</a>$plain-text-2

mboe commented 2 years ago

ok, I'll try to implement these

reschke commented 2 years ago

and, if possible, hyperlink the section/line number separately from the RFC number...
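
To illustrate what separate hyperlinks could look like for the "RFC first" regexp, here is a rough sketch. The helper names and both URL schemes are assumptions made for the example, in particular the fragment naming (`#section-4.2`, `#appendix-A`, `#line-120`); the issue does not define how the targets are computed.

```python
import re

# Second regexp from this issue: the RFC first, then section/appendix/line.
RFC_FIRST = re.compile(
    r"(\[?)(RFC\s*([0-9]+))(\]?,\s*)((Section|Appendix|Line)\s*([0-9A-Z\.]+))",
    re.IGNORECASE,
)

def rfc_target(number):
    # Hypothetical URL scheme for the RFC itself.
    return f"https://www.rfc-editor.org/rfc/rfc{int(number)}.html"

def section_or_line_target(rfc_number, kind, number):
    # Hypothetical fragment naming, e.g. "#section-4.2" or "#line-120".
    return f"{rfc_target(rfc_number)}#{kind.lower()}-{number}"

def link_both(match):
    plain_1, rfc_text, rfc_number, plain_2, part_text, kind, number = match.groups()
    rfc_link = f'<a href="{rfc_target(rfc_number)}" target="_blank">{rfc_text}</a>'
    part_link = (
        f'<a href="{section_or_line_target(rfc_number, kind, number)}" '
        f'target="_blank">{part_text}</a>'
    )
    # Two independent anchors, with the separator left as plain text.
    return f"{plain_1}{rfc_link}{plain_2}{part_link}"

result = RFC_FIRST.sub(link_both, "RFC 9110, Section 4.2")
```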

mboe commented 2 years ago

the PR https://github.com/icann/rfc-annotations/pull/58 should contain all necessary changes

mboe commented 2 years ago

@reschke please close this if the current implementation is sufficient, or add more information otherwise. Thanks.

reschke commented 2 years ago

Updates to use named groups:

https://regex101.com/r/ZHUsaQ/4 https://regex101.com/r/KqSWTL/3 https://regex101.com/r/THdxye/3
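
For readers who do not want to follow the links, a named-group version of the first regexp might look like the sketch below in Python syntax. The group names here are invented for illustration and are not necessarily the ones used in the linked regex101 entries; the `of|in` alternative is also made non-capturing, which is a small deviation from the numbered version.

```python
import re

# Hypothetical named-group rewrite of the first regexp (names are illustrative).
SECTION_FIRST_NAMED = re.compile(
    r"(?P<part>(?P<kind>Section|Appendix|Line)\s*(?P<number>[0-9A-Z\.]+))"
    r"(?P<sep>\s*(?:of|in)\s*\[?)"
    r"(?P<rfc>RFC\s*(?P<rfcnum>[0-9]+))"
    r"(?P<tail>\]?)",
    re.IGNORECASE,
)

match = SECTION_FIRST_NAMED.search("Appendix B in RFC 5234")
named = match.groupdict()
```

With names, the generating code no longer depends on counting parentheses, so adding or removing a group does not silently shift the others.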