By default, the library does not use any protected patterns, such as `WEB_PROTECTED_PATTERNS`, which contains patterns for URLs and e-mail addresses, among others. I suggest using `WEB_PROTECTED_PATTERNS` and `BASIC_PATTERNS` by default when the user does not specify protected patterns. This would let users avoid problems with URL tokenization when they call the `tokenize` function with its default arguments. Users could still specify different protected patterns, or disable protected patterns entirely by setting the `protected_patterns` parameter to an empty list.
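A minimal, self-contained sketch of the proposed default behaviour follows. It is not the library's actual code: the `tokenize` function here is a hypothetical stand-in, and the contents of `WEB_PROTECTED_PATTERNS` and `BASIC_PATTERNS` are simplified placeholders, not the library's real pattern lists. The point it illustrates is the distinction between `protected_patterns=None` (not specified, so fall back to the suggested defaults) and `protected_patterns=[]` (explicitly disable protection).

```python
import re
from typing import List, Optional

# Hypothetical placeholder patterns; the library's real lists are more complete.
BASIC_PATTERNS: List[str] = [r"<\/?\S+\/?>"]  # e.g. XML/HTML-like tags
WEB_PROTECTED_PATTERNS: List[str] = [
    r"https?://\S+",                    # URLs
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",    # e-mail addresses
]


def _split(chunk: str) -> List[str]:
    # Naive tokenization: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", chunk)


def tokenize(text: str, protected_patterns: Optional[List[str]] = None) -> List[str]:
    # None means "not specified": use the suggested defaults.
    # An explicit empty list means "do not protect anything".
    if protected_patterns is None:
        protected_patterns = WEB_PROTECTED_PATTERNS + BASIC_PATTERNS

    tokens: List[str] = []
    pos = 0
    if protected_patterns:
        protected = re.compile("|".join(f"(?:{p})" for p in protected_patterns))
        for match in protected.finditer(text):
            tokens.extend(_split(text[pos:match.start()]))  # text before the protected span
            tokens.append(match.group())                     # protected span kept whole
            pos = match.end()
    tokens.extend(_split(text[pos:]))  # remaining text after the last protected span
    return tokens


print(tokenize("Visit https://example.com now"))
print(tokenize("Visit https://example.com now", protected_patterns=[]))
```

With the defaults, the URL stays a single token (`['Visit', 'https://example.com', 'now']`); with an empty list it is split into pieces such as `'https'`, `':'`, `'/'`, `'example'`. Using `None` as the sentinel, rather than a mutable default list, is what lets the function tell "not specified" apart from "explicitly empty".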