openstates / issues

Having trouble? Looking to contribute? Issues live here!
15 stars 2 forks source link

Should the bill action categorizer default to case insensitive? #1243

Open showerst opened 1 month ago

showerst commented 1 month ago

We sometimes run into issues where bill actions using the BaseCategorizer class stop matching because the state changes the case of their actions. (e.g. "Passed" vs "passed") https://github.com/openstates/openstates-scrapers/pull/5060/files

Is there any reason not to make this default to being case insensitive, with an allowed override to turn it off?

I feel like there's one state where ALL CAPS actions are one chamber and all lower are another, but that shouldn't effect the action classifier.

@jessemortenson tagging you for feedback since it's a global scraper change, in case you know of a reason not to do it. Otherwise i'll change that class to default to [re.I] and allow for flags to be passed to override.

jessemortenson commented 1 month ago

Hmm good question. It does seem like at least 99% of the time that case insensitivity makes sense. Glancing through regex strings passed into Rule instances, there is a lot of cased-text, and a number of cases where someone included a (?i) prefix to specify case-insensitivity. So the current pattern seems to be assuming case sensitivity (but specifying otherwise in the regex if necessary).

But I also haven't thought about this part of the system for a while - @alexobaseki does this ring any warning bells for you?

I don't think I have a particularly strong opinion between switching the global default and continuing to sprinkle in (?i) liberally in client code. But if we do switch globally, probably makes sense to remove the instances of (?i) so that isn't confusing to folks in the future.

alexobaseki commented 1 month ago

I don't know any red flags for changing the global default to case-insensitivity.

jessemortenson commented 1 month ago

OK, that's probably a thumbs up from our side then :)