trinker / qdapRegex

qdapRegex is a collection of regular expression tools associated with the qdap package that may be useful outside of the context of discourse analysis.

qdapRegex::rm_nchar_words returns different results when non English letters involved? #29

Closed stevesolun closed 2 years ago

stevesolun commented 5 years ago

https://stackoverflow.com/questions/56546973/qdapregexrm-nchar-words-returns-different-results-when-non-english-letters-inv#comment99676381_56546973

Please help me with the following confusion:

qdapRegex::rm_nchar_words("è ûé", "1,2")
[1] "è ûé"

qdapRegex::rm_nchar_words('k ku ppp d', "1,2")
[1] "ppp"

Why doesn't the first call return "", while the second one works as expected? What am I missing here? The only explanation I can think of is that the string in the first call is built from non-English letters.

Any solution?

trinker commented 5 years ago

It uses \w to define word characters, and \w is treated as [A-Za-z0-9_], so accented letters such as è, û and é are not matched. You would need to write your own custom regex to handle the non-ASCII letters.
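A minimal sketch of such a custom regex, assuming base R with `perl = TRUE` and a PCRE build that supports Unicode properties. The helper name `rm_short_unicode_words` and its `min_len`/`max_len` arguments are hypothetical and not part of qdapRegex; the pattern is only an approximation of what `rm_nchar_words` does, with `\p{L}` and `\p{N}` standing in for `\w`.

```r
# Hypothetical helper (NOT part of qdapRegex): remove words that are
# min_len to max_len characters long, counting any Unicode letter,
# digit, or underscore as a word character.
rm_short_unicode_words <- function(x, min_len = 1, max_len = 2) {
  # Make sure the input is UTF-8 so PCRE runs in Unicode mode.
  x <- enc2utf8(x)

  # A "word" is a run of word characters that is not preceded or
  # followed by another word character or an apostrophe.
  pattern <- sprintf(
    "(?<![\\p{L}\\p{N}_'])[\\p{L}\\p{N}_]{%d,%d}(?![\\p{L}\\p{N}_'])",
    min_len, max_len
  )
  out <- gsub(pattern, "", x, perl = TRUE)

  # Collapse the whitespace left behind and trim both ends.
  trimws(gsub("\\s+", " ", out))
}

# Unicode escapes keep the example independent of the file encoding.
rm_short_unicode_words("\u00e8 \u00fb\u00e9")  # the "è ûé" input from the question
rm_short_unicode_words("k ku ppp d")
```

Under these assumptions the accented input is treated like the ASCII one: both of its words are 1 to 2 characters long, so they are removed.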

stevesolun commented 5 years ago

@trinker But note that it works for others in the Stack Overflow question linked above (akrun says it works for him).

trinker commented 2 years ago

Thanks for bringing this to my attention. This is now added per your suggestion on Stack Overflow.