salesking / sepa_king

Ruby gem for creating SEPA XML files
MIT License
149 stars 118 forks source link

Erroneous regular expressions #94

Closed sos4nt closed 3 years ago

sos4nt commented 3 years ago

I've noticed errors in two regular expressions used in the validators:

In SEPA::CreditorIdentifierValidator:

REGEX = /\A[a-zA-Z]{2,2}[0-9]{2,2}([A-Za-z0-9]|[\+|\?|\/|\-|\:|\(|\)|\.|,|']){3,3}([A-Za-z0-9]|[\+|\?|\/|\-|:|\(|\)|\.|,|']){1,28}\z/
#                                             ^   ^  ^  ^  ^  ^  ^  ^  ^ ^
#                                             A   B  B  B  ...

Here, | is used as a meta-character for alternation to combine character classes (A) which is correct. But | is also used within a character class to combine its characters (B) which is wrong. Within a character class, | is treated as a literal character. You can't use it as a meta-character here.


In SEPA::MandateIdentifierValidator:

REGEX = /\A([A-Za-z0-9]|[\+|\?|\/|\-|\:|\(|\)|\.|\,|\']){1,35}\z/

• Again, | inside the character class (| is not allowed per spec) • The "space" character is missing, which according to the spec is a valid character


Apart from the above errors, the regular expressions seem overly complex and are quite hard to read. For example, instead of defining two separate character classes which are joined via |, you can define the allowed characters in one character class which allows you to omit the capture group. You can also remove all of the escape characters by using %r|...| and by moving - to the end. Applied to MandateIdentifierValidator's regex:

REGEX = %r|\A[A-Za-z0-9 +?/:().,'-]{1,35}\z|
ledermann commented 3 years ago

Thanks for this great input, @sos4nt! I've made a fix in #95, what do you think?