nodemules / wiki-hole


better parsing for invalid links #1

Open brenthaertlein opened 6 years ago

brenthaertlein commented 6 years ago

Language links especially are still tricky.

Currently, if a link sits "inside" a pair of parentheses (after an opening parenthesis and before its closing one), it is flagged as a "language link" and excluded.

https://en.wikipedia.org/wiki/Cambodia has a nested "(listen)" parenthetical that breaks this rule and causes Khmer to be registered as the first valid link.
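One possible way to handle the nested case is to track parenthesis depth with a counter instead of a single inside/outside flag. The sketch below is not this project's code: the function name `first_link_outside_parens`, the `[[Target|label]]` link syntax, and the sample string are all assumptions made for illustration (the sample is a simplified stand-in for the Cambodia lead sentence, not the actual page text).

```python
import re

# Assumed link syntax for this sketch: [[Target]] or [[Target|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def first_link_outside_parens(text):
    """Return the target of the first link at parenthesis depth 0, or None."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Clamp at zero so a stray ")" cannot push the depth negative.
            depth = max(depth - 1, 0)
        elif text.startswith("[[", i):
            match = LINK_RE.match(text, i)
            if match:
                if depth == 0:
                    return match.group(1)
                # Skip the whole link so parentheses inside its target
                # (e.g. "Foo (bar)") do not change the depth.
                i = match.end()
                continue
        i += 1
    return None


# Hypothetical, simplified sample with a nested "(listen)" parenthetical.
sample = (
    "'''Cambodia''' ([[Khmer language|Khmer]]: Kampuchea (listen)), "
    "is a [[country]] in [[Southeast Asia]]."
)
print(first_link_outside_parens(sample))  # prints "country"
```

With a counter, the inner "(listen)" only takes the depth from 2 back to 1, so the Khmer link is still treated as parenthesized and the first accepted link is the one after the outer closing parenthesis.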

johngreedjr commented 6 years ago

see id 22939