spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

Disambiguation check fails for https://en.wikipedia.org/wiki/22 #485

Closed. p-himik closed this issue 2 years ago.

p-himik commented 2 years ago

The root cause is that the pattern at https://github.com/spencermountain/wtf_wikipedia/blob/master/src/01-document/isDisambig.js#L17 puts a space on both sides of (also)?, but neither space is inside the parentheses. As a result, "may[space]refer to" doesn't match, while "may[space][space]refer to" does.
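A minimal sketch of the bug and one possible fix. The exact pattern in isDisambig.js is not quoted in this thread, so the `buggy` regex below is a hypothetical reconstruction based on the description (a mandatory space on each side of the optional `(also)?` group). The `fixed` variant moves one space inside the group. It is not necessarily what was released in 10.0.1.

```javascript
// Hypothetical reconstruction of the pattern described above: both spaces
// around the optional "(also)?" group are mandatory, so when "also" is
// absent the text must contain two spaces in a row.
const buggy = /may (also)? refer to/i;

// One possible fix: move the trailing space inside the optional group,
// so both "may refer to" and "may also refer to" match.
const fixed = /may (also )?refer to/i;

const samples = [
  "22 may refer to:",      // normal disambiguation wording
  "22 may also refer to:", // variant with "also"
  "22 may  refer to:",     // double space, which only the buggy pattern accepts
];

for (const s of samples) {
  console.log(JSON.stringify(s), "buggy:", buggy.test(s), "fixed:", fixed.test(s));
}
```

With the buggy pattern, only the "also" and double-space samples match. With the fixed pattern, the first two match and the double-space sample does not.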

spencermountain commented 2 years ago

good catch - fixed in 10.0.1 thanks!

p-himik commented 2 years ago

Just found a false positive: https://en.wikipedia.org/wiki/Taraxacum. It matches because its hatnote says:

"Dandelion" redirects here. It may refer to any species of the genus Taraxacum or specifically to Taraxacum officinale. For similar plants, see False dandelion. For other uses, see Dandelion (disambiguation)

Not sure what the best way to handle it would be. Do proper or improper disambiguation pages ever have an infobox?

Update: another false positive: https://en.wikipedia.org/wiki/Introspection. It matches because the first sentence of the main section says:

in a spiritual context it may refer to the examination of one's soul

In the second case, it seems that checking for the absence of {{About|...}} templates works. I'm not sure whether that could lead to any false negatives, though.
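A rough sketch of the heuristic suggested for the second case: treat a page as a disambiguation candidate only if it contains the "may refer to" wording and has no {{About|...}} hatnote. The function name `looksLikeDisambig` and both regexes are hypothetical illustrations, not wtf_wikipedia's actual isDisambig logic. It works on raw wikitext and does not address the Taraxacum hatnote case.

```javascript
// Hypothetical heuristic, not the library's real implementation.
// Matches "may refer to" and "may also refer to".
const mayReferTo = /may (also )?refer to/i;

// An {{About|...}} hatnote suggests a regular article that points to a
// disambiguation page elsewhere. Case-insensitive, since Wikipedia treats
// the first letter of a template name as case-insensitive.
const aboutTemplate = /\{\{\s*about\s*\|/i;

function looksLikeDisambig(wikitext) {
  // Rule out pages carrying an {{About|...}} hatnote first.
  if (aboutTemplate.test(wikitext)) return false;
  // Otherwise fall back to the "may refer to" wording.
  return mayReferTo.test(wikitext);
}

// An Introspection-like article: prose says "may refer to", but the page has an About hatnote.
const articleLike =
  "{{About|the psychological process|other uses}}\n" +
  "In a spiritual context it may refer to the examination of one's soul.";

// A disambiguation-like page.
const disambigLike = "'''22''' may refer to:\n* [[22 (number)]]";

console.log(looksLikeDisambig(articleLike));  // false
console.log(looksLikeDisambig(disambigLike)); // true
```

As noted above, this could produce false negatives if a genuine disambiguation page ever carries an {{About|...}} template.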

spencermountain commented 2 years ago

thanks, good catch

spencermountain commented 2 years ago

Thanks Eugene, both have been fixed now in 10.0.2, but please let me know if you see any others. We may find that the 'may refer to' heuristic is too sloppy in the end. Cheers