wikimedia-gadgets / afc-helper

A tool for reviewing Articles for Creation submissions on the English Wikipedia
https://en.wikipedia.org/wiki/Wikipedia:AFCH
GNU General Public License v3.0

fix regex catastrophic backtracking #269

Closed NovemLinguae closed 11 months ago

NovemLinguae commented 1 year ago

Fix #245

This new code can delete the wrong heading if someone puts a category in the wrong place (anywhere other than the bottom of the article), but I think that's an acceptable tradeoff for now. If it actually affects someone, we can make the solution more complex in a future patch.

NovemLinguae commented 1 year ago

Note to self. Algorithm idea to fix this last case ("someone puts a category in the wrong place (not at the bottom of the article)"). Will code this up later and add to patch (and will add some more test cases):


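// [\s\S]* rather than .* so the match spans newlines through to the end of the wikitext
const match = wikitext.match(/\[\[:?Category:[\s\S]*$/);
// match() returns an array or null, so take element 0 and fall back to an empty string
let textBetweenFirstCategoryAndEndOfFile = match ? match[0] : '';
// delete categories from sampled text
textBetweenFirstCategoryAndEndOfFile = textBetweenFirstCategoryAndEndOfFile.replace(/\[\[:?Category:[^\]]+\]\]/g, '');
// does the non-category sample text have anything except whitespace?
const hasNonWhitespace = /\S/.test(textBetweenFirstCategoryAndEndOfFile);
if ( hasNonWhitespace ) {
    return;
}
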
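
To try the idea on sample wikitext outside the gadget, the check could be wrapped in a standalone function. This is a minimal sketch: the name `onlyCategoriesFollowFirstCategory` is hypothetical and not part of AFCH, and treating "no categories at all" as `true` is an assumption.

```javascript
// Hypothetical helper, not part of AFCH: returns true when everything from the
// first category link to the end of the wikitext is categories or whitespace.
function onlyCategoriesFollowFirstCategory( wikitext ) {
    // [\s\S]* instead of .* so the match runs across newlines to the end of the text
    const match = wikitext.match( /\[\[:?Category:[\s\S]*$/ );
    if ( !match ) {
        // assumption: with no categories at all, nothing is out of place
        return true;
    }
    // delete categories from the sampled text
    const remainder = match[ 0 ].replace( /\[\[:?Category:[^\]]+\]\]/g, '' );
    // anything left besides whitespace means a category sits mid-article
    return !/\S/.test( remainder );
}
```

With this sketch, `'Text\n\n[[Category:A]]\n[[Category:B]]\n'` gives `true`, while `'[[Category:A]]\nMore text\n'` gives `false` because text follows the category.
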
NovemLinguae commented 1 year ago

Note to self: maybe I can fix this by tweaking the existing regex. Look into some of the ideas in this article, in the "Possessive Quantifiers and Atomic Grouping to The Rescue" section:

https://www.regular-expressions.info/catastrophic.html
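
One caveat for this route: JavaScript regexes have no native possessive quantifiers or atomic groups. A common workaround emulates an atomic group with a lookahead plus a backreference. A minimal sketch on a toy pattern (not the AFCH regex) to show the shape of it:

```javascript
// Toy example only: these patterns are illustrative, not taken from AFCH.

// Greedy version: after \d+ grabs "123", the engine backtracks to "12"
// so that the final \d can match "3".
const greedy = /^\d+\d$/;

// Emulated atomic group: the lookahead captures the digits and \1 consumes exactly
// that capture. The engine does not backtrack into a lookahead once it has matched,
// so the captured digits cannot be given back.
const atomic = /^(?=(\d+))\1\d$/;

console.log( greedy.test( '123' ) ); // true
console.log( atomic.test( '123' ) ); // false, no digit is left for the final \d
```
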

NovemLinguae commented 12 months ago

OK, I rewrote this and solved all the issues. Ready for review. This algorithm should behave identically to the old one, but it is iterative instead of regex-based, so there are no catastrophic backtracking problems.
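
For readers who want a feel for what an iterative approach can look like, here is a hypothetical sketch. It is not the code from this patch. It assumes the goal is to drop a final heading that is followed only by blank lines and category links, and the function names are made up for illustration. It uses plain string operations for the heading check and one simple regex (no nested quantifiers) for category links.

```javascript
// Hypothetical sketch, not the code from this patch.

// True when the line is empty or contains nothing but category links and whitespace.
function isBlankOrCategories( line ) {
    return line.replace( /\[\[:?Category:[^\]]+\]\]/g, '' ).trim() === '';
}

// True when the line looks like a wikitext heading such as "== References ==".
function isHeading( line ) {
    const trimmed = line.trim();
    return trimmed.length > 4 && trimmed.startsWith( '==' ) && trimmed.endsWith( '==' );
}

// Assumed goal: remove a trailing heading that has no content under it,
// keeping any categories that follow it.
function removeEmptyTrailingHeading( wikitext ) {
    const lines = wikitext.split( '\n' );
    let i = lines.length - 1;
    // walk up from the bottom past blank lines and category-only lines
    while ( i >= 0 && isBlankOrCategories( lines[ i ] ) ) {
        i--;
    }
    // only act if the first non-blank, non-category line from the bottom is a heading
    if ( i >= 0 && isHeading( lines[ i ] ) ) {
        lines.splice( i, 1 );
    }
    return lines.join( '\n' );
}
```

Because the walk stops at the first line from the bottom that is neither blank nor categories, a category placed higher up in the article does not change which heading is considered.
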