Open bionicles opened 9 months ago
Follow-up: for easy wins in pandas performance, a review of all the logical conditions that trigger warnings in pandas might be warranted. If we're compiling stuff just to check whether we ought to warn people, then we could remove these warnings where possible to speed everything up and simplify the codebase.
Thanks for the report. However, there can be a performance hit on the actual operation when using capturing rather than non-capturing groups. I think this might be another good case for #55385.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
@phofl wrote re: PR 56763 to remove this warning
Sorry, my bad, here is an issue to discuss the warning I proposed to remove.
PRE-TL;DR:
result of running reproducible example:
This warning is annoying and discourages the use of named capturing groups (which are more maintainable), because users must either switch to extracting the groups (not always necessary) or replace their named groups with "(?:". Non-capturing groups are harder to maintain because their purpose within a larger regex pattern is less clear.
If users need to specialize their regex patterns for each command, they have to maintain multiple copies, some with non-capturing groups and some with capturing groups, just to silence a warning. And if they remove the groups, then later, when they want to use them, they may have to work out how to restore the groups they removed, which is frustrating.
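To illustrate the maintenance point, here is a minimal sketch using only Python's standard `re` module. The pattern and sample strings are made up for illustration. A single pattern with a named group can serve both a containment check and a later extraction, whereas the non-capturing variant only answers the yes/no question:

```python
import re

# Hypothetical pattern: an order ID such as "ORD-1234".
NAMED = re.compile(r"ORD-(?P<order_id>\d+)")
NON_CAPTURING = re.compile(r"ORD-(?:\d+)")

samples = ["shipped ORD-1234 today", "no order here"]

# Both variants agree on containment.
named_hits = [bool(NAMED.search(s)) for s in samples]
plain_hits = [bool(NON_CAPTURING.search(s)) for s in samples]

# Only the named variant can also be reused for extraction.
match = NAMED.search(samples[0])
order_id = match.group("order_id") if match else None
```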
The logical condition for the warning also compiles the pattern and then discards the compiled result, so this warning slows down every "contains" in pandas just to check whether we should annoy people who probably know what they're doing.
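The check described above can be sketched roughly as follows. This is an approximation for illustration, not the pandas source: `warn_if_groups` is a hypothetical name and the warning text is paraphrased. It relies only on the standard-library fact that a compiled pattern exposes its number of capturing groups as `.groups`.

```python
import re
import warnings


def warn_if_groups(pat: str) -> bool:
    """Hypothetical stand-in for the warning condition (not pandas code)."""
    # The pattern is compiled only to count its capturing groups;
    # the compiled object is not kept for the actual search.
    has_groups = re.compile(pat).groups > 0
    if has_groups:
        warnings.warn(
            "pattern has match groups (paraphrased message)",
            UserWarning,
            stacklevel=2,
        )
    return has_groups


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    named = warn_if_groups(r"(?P<word>foo)")  # capturing group: warns
    plain = warn_if_groups(r"(?:foo)")        # non-capturing group: silent
```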
If we remove this unnecessary warning, then we no longer discourage users who use named capturing groups, thus facilitating readability of the patterns, and portability to other contexts, such as debuggers or the "extract" method mentioned in the removed warning.
TL;DR: This warning doesn't need to exist, discourages best practices, and slows performance of every string contains query in pandas (a super "hot path!"), so I suggest we remove it.
here is a permalink to the line of code to check for the warning condition
just compile the pattern and use the compiled pattern:
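A minimal sketch of that idea, assuming a hypothetical helper `contains` that works on a plain list of strings rather than on pandas internals. The pattern is compiled once and the same compiled object does the matching, with no separate group check:

```python
import re
from typing import Iterable, List, Union


def contains(values: Iterable[str], pat: Union[str, re.Pattern]) -> List[bool]:
    """Hypothetical helper: compile once, then reuse for every element."""
    # Accept either a raw pattern string or an already compiled pattern.
    compiled = re.compile(pat) if isinstance(pat, str) else pat
    return [compiled.search(v) is not None for v in values]


# A named capturing group is accepted without any warning.
result = contains(["apple pie", "banana"], r"(?P<fruit>apple)")
```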
Expected Behavior
no warning for containment queries using groups
Installed Versions