pid / speakingurl

Generate a slug – transliteration with a lot of options
http://pid.github.io/speakingurl/
BSD 3-Clause "New" or "Revised" License
1.12k stars, 84 forks

Add option to include non-latin characters #80

Open grmmph opened 8 years ago

grmmph commented 8 years ago

First of all, awesome library!

I would like to open a discussion about adding an option to support non-Latin characters. As far as I see it, 99% of browsers would parse these examples correctly without messing up the route:

http://www.my-site.com/article/עוגת-בננה-זה-טעים-מאוד
http://www.my-site.com/article/香蕉蛋糕都不错
http://www.my-site.com/article/كعكة-الموز-جيدة

If there's any reason this shouldn't be allowed, I would really like to learn why.
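For context on what browsers do with such URLs: they typically display the native characters but send them percent-encoded as UTF-8 bytes. The following is a minimal sketch of that round trip using the standard `encodeURI` and `decodeURI` functions; the URLs are the example ones from this comment, and nothing here is part of speakingurl.

```javascript
// Example URLs from the comment above (my-site.com is a placeholder host).
const urls = [
  'http://www.my-site.com/article/עוגת-בננה-זה-טעים-מאוד',
  'http://www.my-site.com/article/香蕉蛋糕都不错',
  'http://www.my-site.com/article/كعكة-الموز-جيدة',
];

// encodeURI percent-encodes each non-ASCII character as its UTF-8 bytes
// and leaves URL delimiters such as ":" and "/" untouched.
const encoded = urls.map((url) => encodeURI(url));

// decodeURI reverses the encoding, so the round trip is lossless.
const decoded = encoded.map((url) => decodeURI(url));

for (const url of encoded) {
  console.log(url);
}
```

The encoded form is pure ASCII, so the route itself survives; the trade-off is that the URL becomes long and unreadable wherever it is shown in encoded form, for example when pasted into plain text.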

Thanks!

leocaseiro commented 8 years ago

speakingurl already supports Arabic and Burmese characters as well.

I believe a pull request adding support for the other languages you need would be very welcome in this repo.

leocaseiro commented 8 years ago

PS: The answer in #61, which mentions the limax library, may be relevant to this issue.

grmmph commented 8 years ago

@leocaseiro This converts Arabic characters to Latin characters, which is not so good for SEO.

pid commented 8 years ago

> As far as I see it, 99% of browsers would parse these examples correctly without messing up the route: [...] If there's any reason this shouldn't be allowed, I would really like to learn why.

If you don't need transliterated URLs, go with it :-) Otherwise you can use speakingurl. Just because speakingurl is available doesn't mean you have to transliterate your URLs ;-)

> I would like to open a discussion about adding an option to support non-Latin characters.

That's a good point: support non-Latin characters as-is and only replace special characters. I will add this to the todo list, thanks.
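One way the idea of keeping letters as-is and replacing only special characters could look is sketched below. It assumes a runtime with Unicode property escapes (ES2018 or later). The function name `slugifyKeepLetters` and its `separator` parameter are hypothetical and are not part of the speakingurl API.

```javascript
// Hypothetical helper, NOT part of speakingurl: keeps letters, combining
// marks and digits of any script and collapses everything else into a
// single separator.
function slugifyKeepLetters(input, separator = '-') {
  return input
    .normalize('NFC')
    // Split on every run of characters that is not a letter, mark or digit.
    .split(/[^\p{L}\p{M}\p{N}]+/u)
    // Drop the empty pieces produced by leading or trailing special characters.
    .filter((part) => part.length > 0)
    .join(separator);
}

console.log(slugifyKeepLetters('  Banana cake: is it good?! '));
console.log(slugifyKeepLetters('كعكة الموز جيدة!'));
console.log(slugifyKeepLetters('香蕉蛋糕, 都不错'));
```

This sketch deliberately does no lowercasing or transliteration; it only shows the "replace special characters" step in isolation.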

grmmph commented 8 years ago

Thanks @pid!

Off the top of your head, can you think of a regex that might do the trick?

pid commented 8 years ago

> Off the top of your head, can you think of a regex that might do the trick?

Obviously... but I will check.

grmmph commented 8 years ago

Perhaps something like this:

[\u0590-\u05FF\u0600-\u06FF\u0400-\u04FF\w\s\d]

This covers Hebrew, Arabic, and Cyrillic.
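To see what the proposed character class accepts, here is a small check that anchors it so it must match a whole string. The sample strings and the `allowed` and `samples` names are illustrative assumptions, not something taken from speakingurl.

```javascript
// The character class proposed above, anchored to match an entire string.
const allowed = /^[\u0590-\u05FF\u0600-\u06FF\u0400-\u04FF\w\s\d]+$/;

// Illustrative sample inputs (assumptions made for this sketch).
const samples = {
  hebrew: 'עוגת בננה',
  arabic: 'كعكة الموز',
  cyrillic: 'банановый торт',
  chinese: '香蕉蛋糕',
  hyphenated: 'banana-cake',
};

for (const [name, text] of Object.entries(samples)) {
  // Logs true when every character of the sample is inside the class.
  console.log(name, allowed.test(text));
}
```

Under these assumptions, the Hebrew, Arabic and Cyrillic samples pass, while the Chinese sample does not, because CJK ideographs lie outside the three listed ranges. The hyphenated sample also fails, since `-` is not in the class. The `\d` is redundant because `\w` already includes digits.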