jamesaoverton / cmi-pb-terminology

CMI-PB Controlled Terminology

Regex `fullmatch` hangs on trailing whitespace in `validate.py` #66

Open beckyjackson opened 2 years ago

beckyjackson commented 2 years ago

When a condition requires a string with no trailing whitespace, such as `match(/\S(.*\S)*/)` or `match(/\S([^\n]*\S)*/)`, and the `primary_dt_condition_func` that uses `re.fullmatch` is run over a string that does have trailing whitespace (so it does not match), the process hangs here: https://github.com/jamesaoverton/cmi-pb-terminology/blob/next/src/script/validate.py#L357

I believe this is due to catastrophic backtracking and not an issue with the validation code, but I am unable to load datasets with invalid matches because of this.
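For anyone hitting the same thing, here is a minimal sketch of why the pattern blows up and one possible rewrite. It is not taken from `validate.py`; the names `NESTED`, `LINEAR`, and `is_trimmed` are made up for illustration. The idea is that `(.*\S)*` nests one quantifier inside another, so on a failing input the engine can retry a very large number of ways to split the same text. Because any run of `.*\S` chunks is itself a single `.*\S` chunk, making the group optional instead of repeated should accept the same strings while failing in roughly linear time.

```python
import re

# Pattern from the report: the inner `.*` and the outer `*` can divide the
# same characters between them in many ways, which is what makes a failing
# fullmatch so slow on long inputs.
NESTED = re.compile(r"\S(.*\S)*")

# Assumed rewrite (hypothetical, not from validate.py): a single optional
# group instead of a repeated one, so there is only one quantifier to backtrack.
LINEAR = re.compile(r"\S(?:.*\S)?")


def is_trimmed(value: str) -> bool:
    """Return True if value is non-empty, single-line, and has no
    leading or trailing whitespace."""
    return LINEAR.fullmatch(value) is not None
```

With this version a long value with a trailing space, for example `"a" * 5000 + " "`, is rejected quickly instead of hanging. This only helps for this particular pattern; other user-supplied patterns with nested quantifiers could still backtrack badly.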

In the meantime, I can use `exclude(/^\s+|\s+$/)` to fit my use case, but it would be good to implement a workaround, as this may happen with other patterns.
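In plain Python terms, the `exclude` workaround amounts to something like the sketch below. This assumes `exclude` rejects a value whenever the pattern is found anywhere in it (that is, `re.search` semantics); I have not checked how `validate.py` actually implements it, and `passes_exclude` is a hypothetical name.

```python
import re

# Whitespace at the very start or the very end of the value.
EDGE_WHITESPACE = re.compile(r"^\s+|\s+$")


def passes_exclude(value: str) -> bool:
    """Return True if the value has no leading or trailing whitespace."""
    # search (not fullmatch) is assumed here: the value is rejected if the
    # pattern occurs anywhere in it.
    return EDGE_WHITESPACE.search(value) is None
```

There are no nested quantifiers here, so a failing value cannot trigger the same kind of backtracking. One difference from the `match` patterns above: the empty string passes this check, whereas `\S(.*\S)*` requires at least one non-whitespace character.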