phHartl closed 4 years ago
Since the given example does not cover every special case of paragraph numbers that are not preceded by a full stop, a generic regular expression should be used instead. It improves the consistency of the data and also increases performance, at the cost of a negligible increase in paragraph numbers remaining in the text.
Proposal:
(?<=[.’']\s)(\d+\.*\s)(?=[A-Z])
Translation: Match one or more digits, followed by any number of dots (including none) and a single whitespace character. The match must be followed by a capital letter and preceded by a full stop or a single quote (’ or ') plus a whitespace character.
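A minimal sketch of how the proposed pattern could be applied, assuming Python's built-in `re` module (the issue does not say which language the project uses). The function name `strip_paragraph_numbers` and the sample text are hypothetical and only serve as illustration.

```python
import re

# Pattern copied from the proposal above.
# The lookbehind has a fixed width of two characters (one of . ’ ' plus one
# whitespace character), so Python's `re` module accepts it.
PARAGRAPH_NUMBER = re.compile(r"(?<=[.’']\s)(\d+\.*\s)(?=[A-Z])")


def strip_paragraph_numbers(text: str) -> str:
    """Remove paragraph numbers that follow a sentence end (hypothetical helper)."""
    return PARAGRAPH_NUMBER.sub("", text)


# Made-up sample: 12, 13. and 14 are paragraph numbers, 1999 is not.
sample = (
    "He left the room. 12 Then he returned. 13. She said 'no.' "
    "14 Later in 1999 they met."
)
cleaned = strip_paragraph_numbers(sample)
print(cleaned)
# He left the room. Then he returned. She said 'no.' Later in 1999 they met.
```

Note that the year 1999 is left alone because it is not preceded by a full stop or quote. A paragraph number at the very start of the text would also be left in place, since there is nothing before it for the lookbehind to match.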
Best bet atm:
https://regex101.com/r/2AgRRW/1