unt-libraries / pycallnumber

Parse, model, and manipulate any type of call number string.
BSD 3-Clause "New" or "Revised" License

Issue with regexes in Python 2.7.5 causes tests to fail #15

Closed jthomale closed 6 years ago

jthomale commented 6 years ago

While working on #14, I found another Python 2.7.5 bug. 40 tests fail while a Unit's template is trying to compile a regular expression, with the error:

error: nothing to compile

Here's an example of a regex that leads to this error:

(?P<parts>(?:[A-Za-z]+|[^A-Za-z0-9]*)+)

From what I understand, the problem is that the expression in the innermost parentheses can match an empty string (through the [^A-Za-z0-9]* alternative), yet it's followed by a +, which requires at least one match. I think Python versions 2.7.6 and later allow this, while 2.7.5 doesn't. (These tests only fail on 2.7.5.)
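A minimal sketch for checking this on a given interpreter, using only the standard `re` module. The helper name `try_compile` is made up for illustration and is not part of pycallnumber. It tries to compile the regex above and separately confirms that the inner alternation can match an empty string:

```python
import re

# The regex from this issue, copied verbatim.
PATTERN = r'(?P<parts>(?:[A-Za-z]+|[^A-Za-z0-9]*)+)'

# The inner alternation on its own, anchored at the end with \Z.
INNER = r'(?:[A-Za-z]+|[^A-Za-z0-9]*)\Z'


def try_compile(pattern):
    """Hypothetical helper: return (compiled, None) or (None, message)."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)


compiled, error = try_compile(PATTERN)

# True when the inner group can succeed without consuming anything,
# which is the condition described above.
inner_matches_empty = re.match(INNER, '') is not None

if compiled is None:
    print('compile failed: %s' % error)
else:
    print('compiled OK; parts = %r' % compiled.match('AB--CD').group('parts'))
print('inner group can match empty string: %s' % inner_matches_empty)
```

On interpreters that accept the pattern, the compile step succeeds and the whole of `AB--CD` lands in the `parts` group. On an affected interpreter the helper returns the error message instead of raising.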

All failing tests are for the Units AlphaSymbol, NumericSymbol, and AlphaNumericSymbol, which are all CompoundUnit types that use the simple Formatting unit as a component. I've traced the underlying problem to the fact that the Formatting unit allows 0 or more matches. In most call numbers formatting is treated as optional, so originally this seemed desirable. But, in retrospect, having it match nothing by default seems counterintuitive, and it's unnecessary, since you can still set formatting components as optional on an individual basis. Changing the default min_length from 0 to 1 does solve the nothing to compile error, but it breaks some of the tests that cover the current behavior. Fixing it will be a matter of untangling that web and making sure none of the more complex call number types rely on that behavior to function.
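To illustrate the difference at the regex level only, here is a rough sketch. It does not use pycallnumber's actual classes or template code. The functions `formatting_pattern`, `compound_pattern`, and `can_match_empty` are hypothetical, and the mapping from `min_length` to a quantifier is an assumption made for the example:

```python
import re


def formatting_pattern(min_length):
    """Hypothetical: build a formatting regex for a given minimum length."""
    quantifier = '*' if min_length == 0 else '{%d,}' % min_length
    return r'[^A-Za-z0-9]' + quantifier


def compound_pattern(min_length):
    """Hypothetical: combine alphabetic and formatting parts, as in the issue."""
    return r'(?P<parts>(?:[A-Za-z]+|%s)+)' % formatting_pattern(min_length)


def can_match_empty(pattern):
    """Return True if the pattern can match the empty string."""
    return re.match(r'(?:%s)\Z' % pattern, '') is not None


# With min_length=0 the formatting part can match nothing at all.
print(can_match_empty(formatting_pattern(0)))  # True
# With min_length=1 it must consume at least one character.
print(can_match_empty(formatting_pattern(1)))  # False

# Formatting can still be made optional case by case by wrapping it.
optional_formatting = r'(?:%s)?' % formatting_pattern(1)
print(can_match_empty(optional_formatting))  # True

# The compound pattern built with min_length=1 still matches mixed input.
match = re.match(compound_pattern(1), 'AB--CD')
print(match.group('parts'))  # AB--CD
```

With a minimum of 1, neither branch of the alternation can succeed on an empty string, so the repeated group always consumes input, and optionality becomes an explicit choice at each place formatting is used.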