jvdburgh / AmputatorBot

Remove AMP from your URLs. AmputatorBot is a highly specialised Reddit and Twitter bot that automatically replies to comments, submissions and tweets containing AMP URLs with the canonical link(s). It's also available as a website and REST API. See also: https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot/.
https://www.amputatorbot.com/
GNU General Public License v3.0

False Positive when URL ends in 'amp' and has query params #16

Open haganbmj opened 1 year ago

haganbmj commented 1 year ago

Observed here: https://www.reddit.com/r/mtgcube/comments/103dc4d/is_it_just_me_or_do_people_seem_to_be/j2y91j1/

I then attempted to verify the behavior using https://www.amputatorbot.com/. It appears to be a false positive that occurs when the URL path ends in 'amp' and is followed by query params.

Flagged:

https://scryfall.com/card/clb/870/skullclamp?utm_source=mtgcardfetcher

Not Flagged:

https://scryfall.com/card/clb/870/skullclamp
https://scryfall.com/card/clb/870/skullclamps?utm_source=mtgcardfetcher

And then interestingly this errors out with a 500 on the website:

https://scryfall.com/card/clb/870/skullclamp?

cls commented 1 year ago

Also observed here: https://www.reddit.com/r/spikes/comments/10w6odm/standard_phyrexia_all_will_be_one_whats_working/j7m673l/

Flagged:

https://scryfall.com/card/one/147/sawblade-scamp?utm_source=mtgcardfetcher

The matching substrings look to be listed in static/static.txt:

AMP_KEYWORDS = ["/amp", "amp/", ".amp", "amp.", "?amp", "amp?", "=amp",
                "amp=", "&amp", "amp&", "%amp", "amp%", "_amp", "amp_"]

So basically, a URL matches whenever amp is preceded or followed by any one of the characters /.?=&%_. It doesn't surprise me that there are false positives; I would expect the check to at least look at both sides of the amp.
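
To make the difference concrete, here is a minimal sketch, not the bot's actual code. It assumes the keywords are applied as plain substring checks (the reports above suggest this, but it is not confirmed from the source), and compares that with a hypothetical stricter check that requires a non-alphanumeric character, or the start or end of the string, on both sides of amp. The names naive_is_amp, strict_is_amp and AMP_TOKEN are made up for illustration.

```python
import re

# Keyword list as quoted above from static/static.txt.
AMP_KEYWORDS = ["/amp", "amp/", ".amp", "amp.", "?amp", "amp?", "=amp",
                "amp=", "&amp", "amp&", "%amp", "amp%", "_amp", "amp_"]

# Hypothetical stricter pattern: "amp" with no letter or digit directly
# before or after it, so "skullclamp?" and "scamp?" no longer match.
AMP_TOKEN = re.compile(r"(?<![a-z0-9])amp(?![a-z0-9])")


def naive_is_amp(url: str) -> bool:
    """Assumed current behavior: flag if any keyword is a substring."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in AMP_KEYWORDS)


def strict_is_amp(url: str) -> bool:
    """Sketch of a two-sided check: "amp" must stand alone as a token."""
    return AMP_TOKEN.search(url.lower()) is not None


if __name__ == "__main__":
    urls = [
        "https://scryfall.com/card/clb/870/skullclamp?utm_source=mtgcardfetcher",
        "https://scryfall.com/card/one/147/sawblade-scamp?utm_source=mtgcardfetcher",
        "https://example.com/amp/article",    # made-up AMP-style URL
        "https://example.com/article?amp=1",  # made-up AMP-style URL
    ]
    for url in urls:
        print(naive_is_amp(url), strict_is_amp(url), url)
```

Under the substring assumption, the naive check flags both Scryfall URLs because they contain amp?, while the two-sided check only flags the example.com URLs. This is only a sketch: it would still match things like an HTML-escaped &amp; inside a URL, so a real fix would likely need more context than a single pattern.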