hanggrian / socialview

Android TextView and EditText with hashtag, mention, and hyperlink support
http://hanggrian.com/socialview/
Apache License 2.0

Hashtag Pattern UTF (Feature) #8

Closed jeffersonlicet closed 7 years ago

jeffersonlicet commented 7 years ago

Thank you very much for this project.

I was looking at your implementation, and I think you could update the hashtag regular expression to accept non-English words.

Example: #(\w+) fails with #Mañana

Solution: ([##]+)([0-9A-Z_]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*)
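To make the difference concrete, here is a minimal sketch of both patterns on a standard JVM. The class name `HashtagRegexSketch` and the helper `firstMatch` are made up for this illustration and are not part of the library. Non-ASCII characters are written as `\u` escapes so the file compiles regardless of source encoding. It assumes the JDK's `java.util.regex`, where `\w` is ASCII-only unless `UNICODE_CHARACTER_CLASS` is set; other regex engines may behave differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of socialview.
final class HashtagRegexSketch {
    // The simple pattern from the example above.
    static final Pattern SIMPLE = Pattern.compile("#(\\w+)");

    // The proposed pattern, with the accented ranges written as escapes:
    // \u00FC is u-umlaut; \u00C0-\u00D6, \u00D8-\u00F6 and \u00F8-\u00FF
    // are the Latin-1 accented letters.
    static final Pattern PROPOSED = Pattern.compile(
        "([##]+)([0-9A-Z_]*[A-Z_]+"
        + "[a-z0-9_\u00FC\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*)");

    // Returns the first full match in text, or null if nothing matches.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "#Ma\u00F1ana"; // the hashtag from the example, n-tilde escaped
        System.out.println(firstMatch(SIMPLE, text));
        System.out.println(firstMatch(PROPOSED, text));
    }
}
```

Under these assumptions, `SIMPLE` stops at the first accented letter and returns only `#Ma`, while `PROPOSED` returns the whole `#Mañana`.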

Hope to help you work on it soon. Jeff.

hanggrian commented 7 years ago

@jeffersonlicet You're very welcome. Thanks for the regex input. I will need to write some instrumented tests to make sure that the pattern works in every case.

In the meantime, do you have more test cases? I'm sorry I'm not familiar with Spanish naming customs.

jeffersonlicet commented 7 years ago

We use accent marks everywhere.

For better internationalization, I allowed accent marks at the start of the word: ([##]+)([0-9A-Z_À-ÖØ-öø-ÿ]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*)

I think all Spanish words will work, for example:

Mañana

CreaciónDivina

TúVendrás

CuatroArtículos

MásNadaQueda

SeFueÉste

ÉsteNoEra

Ñame

You can play with it here: Regexr

hanggrian commented 7 years ago

I've added RegexTest, which for some reason passes with #(\\w+), while ([##]+)([0-9A-Z_À-ÖØ-öø-ÿ]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*) only captures Creación out of CreaciónDivina.

I honestly don't know why the result from Regexr differs from the JUnit test. Would you kindly run the test on your computer and see if it passes?

jeffersonlicet commented 7 years ago

Sorry, I used the i flag on Regexr, which makes the match case-insensitive. Change it to: static final Pattern PATTERN_HASHTAG = Pattern.compile("(?i)([##]+)([0-9A-Z_À-ÖØ-öø-ÿ]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*)");

Here (?i) turns on case-insensitive matching.
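As a rough check of what `(?i)` changes, the sketch below compiles the same pattern with and without the flag and runs it over the sample words listed earlier in the thread. The class `CaseFlagSketch` and its helpers are hypothetical names for this example only. Accented characters are written as `\u` escapes to keep the source ASCII-safe, and the behavior described is that of the JDK's `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of socialview.
final class CaseFlagSketch {
    // Pattern body from the thread, accented ranges written as escapes.
    static final String BODY =
        "([##]+)([0-9A-Z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*[A-Z_]+"
        + "[a-z0-9_\u00FC\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*)";

    static final Pattern CASE_SENSITIVE = Pattern.compile(BODY);
    static final Pattern CASE_INSENSITIVE = Pattern.compile("(?i)" + BODY);

    // Sample words from the thread, accents escaped.
    static final String[] SAMPLES = {
        "Ma\u00F1ana", "Creaci\u00F3nDivina", "T\u00FAVendr\u00E1s",
        "CuatroArt\u00EDculos", "M\u00E1sNadaQueda", "SeFue\u00C9ste",
        "\u00C9steNoEra", "\u00D1ame"
    };

    // Returns capture group 2 (the tag text without the hash sign),
    // or null if the pattern does not match anywhere in text.
    static String tagText(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    // True only if every sample word is captured in full.
    static boolean allSamplesMatchFully(Pattern pattern) {
        for (String word : SAMPLES) {
            if (!word.equals(tagText(pattern, "#" + word))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String tag = "#Creaci\u00F3nDivina";
        System.out.println(tagText(CASE_SENSITIVE, tag));
        System.out.println(tagText(CASE_INSENSITIVE, tag));
        System.out.println(allSamplesMatchFully(CASE_INSENSITIVE));
    }
}
```

Without the flag, the trailing character class has no ASCII uppercase letters, so the match ends at the `D` of `CreaciónDivina` and only `Creación` is captured. With `(?i)`, `a-z` also matches `A-Z`, and all eight sample words are captured in full.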

hanggrian commented 7 years ago

Okay, that works, but I'm thinking of using (?i)[##]([0-9A-Z_À-ÖØ-öø-ÿ]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*) to prevent matching multiple consecutive hash signs. What do you think?
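A small sketch of the difference between the two variants, assuming the JDK's `java.util.regex`. The class `SingleHashSketch` and its `firstMatch` helper are illustrative names only, and accented characters are again written as `\u` escapes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of socialview.
final class SingleHashSketch {
    // Shared tag body from the thread, accented ranges written as escapes.
    static final String TAG =
        "([0-9A-Z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*[A-Z_]+"
        + "[a-z0-9_\u00FC\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*)";

    // Accepts a run of one or more hash signs before the tag.
    static final Pattern MANY_HASHES = Pattern.compile("(?i)([##]+)" + TAG);

    // Accepts exactly one hash sign directly before the tag.
    static final Pattern ONE_HASH = Pattern.compile("(?i)[##]" + TAG);

    // Returns the first full match in text, or null if nothing matches.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "##Ma\u00F1ana"; // two hash signs before the tag
        System.out.println(firstMatch(MANY_HASHES, text));
        System.out.println(firstMatch(ONE_HASH, text));
    }
}
```

For the input `##Mañana`, the first variant matches the whole `##Mañana`, while the second matches only `#Mañana`, starting at the second hash sign. Note that dropping the parentheses around the hash also shifts the tag text from capture group 2 to capture group 1.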

jeffersonlicet commented 7 years ago

Ohh. Yes. That's perfect.

hanggrian commented 7 years ago

Sorry for the late response. Version 0.12.0 has been published with that regex pattern. The patterns are now also customizable through static methods.

Thank you for your help and let me know if you have any other improvement or fix! :)