Censor rules should run in order of decreasing length

mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction

https://alaveteli.org

Censor rules should run in order of decreasing length #358

Open bjh21 opened 12 years ago

bjh21 commented 12 years ago

At the moment, if I'm trying to remove a user's name from their requests, I might start by creating a censor rule for "Joe Bloggs", and then one for "Mr Bloggs" and then one for "Bloggs", as I find different ways the name might be written. This turns out not to work, though, because the "Bloggs" rule bites first, leaving "Joe [name removed]", which the first rule then ignores. This can be worked around by careful ordering of rules, but I think it would be better if rules were applied in order of decreasing length, so that "Joe Bloggs" and "Mr Bloggs" would be replaced before "Bloggs". This would generally have the right behaviour, and would mean that the order of creation of censor rules wouldn't matter.
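The ordering problem described above can be reproduced with a minimal Python sketch. It assumes censor rules are plain literal strings applied one after another; `apply_rules` and the sample text are hypothetical and are not taken from Alaveteli's code.

```python
REPLACEMENT = "[name removed]"

def apply_rules(text, rules):
    """Apply each literal rule in turn, in exactly the order given."""
    for rule in rules:
        text = text.replace(rule, REPLACEMENT)
    return text

text = "Joe Bloggs wrote; Mr Bloggs replied."

# An order in which the shortest rule happens to run first.
short_first = ["Bloggs", "Joe Bloggs", "Mr Bloggs"]
leaky = apply_rules(text, short_first)
# "Joe" and "Mr" survive because the longer rules no longer match.

# The proposal: sort by decreasing length before applying.
longest_first = sorted(short_first, key=len, reverse=True)
clean = apply_rules(text, longest_first)

print(leaky)  # Joe [name removed] wrote; Mr [name removed] replied.
print(clean)  # [name removed] wrote; [name removed] replied.
```

With the sort in place, the result no longer depends on the order in which the rules were created.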

hsenag commented 12 years ago

That's not strictly true as overlapping rules might still go wrong (and ordering by length would remove the possibility of using ordering to get the desired behaviour). E.g. "wibblewobble", with rules for "wibble", "wobble" and "ibblewobbl", is still a problem. I can't think of a plausible example of how this could actually bite in the wild, though!
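The overlapping case can be illustrated the same way. This is only a sketch under the assumption that rules are literal strings replaced sequentially; `censor` and the `[REDACTED]` marker are hypothetical names.

```python
REPLACEMENT = "[REDACTED]"

def censor(text, rules):
    """Apply each literal rule in turn, in the order given."""
    for rule in rules:
        text = text.replace(rule, REPLACEMENT)
    return text

rules = ["wibble", "wobble", "ibblewobbl"]

# Longest first: "ibblewobbl" matches in the middle of the word, so the
# outer "w" and "e" are left behind and the shorter rules cannot match.
by_length = censor("wibblewobble", sorted(rules, key=len, reverse=True))

# A hand-picked order that runs the two short rules first covers the word.
hand_ordered = censor("wibblewobble", ["wibble", "wobble", "ibblewobbl"])

print(by_length)     # w[REDACTED]e
print(hand_ordered)  # [REDACTED][REDACTED]
```

Here sorting by length gives the worse result, and a fixed sort takes away the option of reordering the rules by hand.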

bjh21 commented 12 years ago

That's why I said "generally". I think in your case, the problem could be worked around by defining another rule for "wibblewobble", which would apply the correct censorship for both "wibble" and "wobble". The censor rule system is intrinsically imperfect, so I think making it more predictable at the expense of flexibility is a sensible move.
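A sketch of that workaround, again assuming literal string rules applied longest first (`censor_longest_first` and the sample text are hypothetical):

```python
REPLACEMENT = "[REDACTED]"

def censor_longest_first(text, rules):
    """Sort rules by decreasing length, then apply each one in turn."""
    for rule in sorted(rules, key=len, reverse=True):
        text = text.replace(rule, REPLACEMENT)
    return text

# The three overlapping rules plus an extra rule for the whole word.
rules = ["wibble", "wobble", "ibblewobbl", "wibblewobble"]

result = censor_longest_first("See wibblewobble and wibble.", rules)
print(result)  # See [REDACTED] and [REDACTED].
```

Because the extra rule is the longest, it runs before "ibblewobbl" can split the word, and the shorter rules still catch standalone occurrences.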

hsenag commented 12 years ago

Fair enough. No doubt there's a completion algorithm one could use in general to find the overlapping rule :-)

garethrees commented 3 years ago

Linking to https://github.com/mysociety/alaveteli/issues/2761 – a slightly different take.