Add unicode_pair parameter.

mozilla / unicode-slugify

A slugifier that works in unicode

BSD 3-Clause "New" or "Revised" License


Add unicode_pair parameter. #12

Closed ebsaral closed 9 years ago

ebsaral commented 9 years ago

Example

>>> slugify.slugify(u'This is \xe9 test', unicode_pairs={u'\xe9': u'a'})
u'this-is-a-test'
instead of u'this-is-\xe9-test'

davedash commented 9 years ago

So I'm not 100% sure I understand the reasoning for this.

It almost seems like you want to do some overrides, but I'm not sure if this belongs in the slugify tool or if it should exist outside this method.

e.g. (not very efficient, but drives the point)

from slugify import slugify

text = u"my unicode string"
# Apply the overrides first, then slugify the result.
for k, v in replacement_dict.items():
    text = text.replace(k, v)
slugify(text)

ebsaral commented 9 years ago

Sometimes you want the output to drop certain Unicode characters but still keep the letter through a representation of it, for search compatibility across different languages. You can see my fork as an example: https://github.com/eminbugrasaral/unicode-slugify-turkish

davedash commented 9 years ago

What happens now? Do they get turned into '-'? It seems like your Turkish replacement should be the default (in fact, I have a hard time seeing where I wouldn't want Turkish letters replaced).

What are your thoughts? I realize that changes the direction of this pull request a bit, but hopefully in a direction that benefits you.

On Sun, Jan 11, 2015 at 12:50 PM, Emin Buğra Saral notifications@github.com wrote:

Sometimes you want the output to drop certain Unicode characters but still keep the letter through a representation of it, for search compatibility across different languages. You can see my fork as an example: https://github.com/eminbugrasaral/unicode-slugify-turkish


ebsaral commented 9 years ago

They get replaced with their values in the dictionary (â -> a). There are more letters than the common ones, so I thought that if we add an extra parameter, a dictionary mapping each Unicode character to its representation, people can replace any character they need. The Turkish alphabet does not have 'â', but people who type 'a' should still be able to match something like 'âpple'.
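The matching argument above can be sketched as follows. This is only an illustration in Python 3: `apply_pairs` and `simple_slugify` are hypothetical helpers written for this example, and `simple_slugify` is a simplified stand-in, not the library's implementation.

```python
import re


def apply_pairs(text, pairs):
    """Replace every key of `pairs` that occurs in `text` with its value."""
    for char, replacement in pairs.items():
        text = text.replace(char, replacement)
    return text


def simple_slugify(text):
    """Stand-in slugifier (an assumption, not unicode-slugify itself):
    lowercase the text, keep runs of Unicode word characters, and join
    them with single hyphens."""
    words = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    return "-".join(words)


pairs = {u"\xe2": u"a"}  # â -> a

# The stored title contains 'â'; the search query was typed with a plain 'a'.
stored = simple_slugify(apply_pairs(u"\xe2pple pie", pairs))
query = simple_slugify(apply_pairs(u"apple pie", pairs))

print(stored == query)  # True: both are 'apple-pie'
```

Without the pair, the stand-in would keep the 'â' in the stored slug, so a query typed with 'a' would not match it.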

ebsaral commented 9 years ago

By the way, it's like this:

text = u"my unicode string"
slug = slugify(text)
# Apply the replacements to the slug, after slugifying.
for k, v in replacement_dict.items():
    slug = slug.replace(k, v)
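One thing the order changes, sketched here in Python 3 under the assumption that the slugifier lowercases its input (as the u'this-is-a-test' example suggests): replacing after slugifying means the dictionary keys have to match the already-lowercased characters. `replace_pairs` is a hypothetical helper, and `.lower()` stands in for the lowercasing step only.

```python
def replace_pairs(text, pairs):
    """Replace every key of `pairs` that occurs in `text` with its value."""
    for char, replacement in pairs.items():
        text = text.replace(char, replacement)
    return text


pairs = {u"\xc2": u"a"}  # key is the uppercase letter Â
title = u"\xc2pple Pie"

# Replace first, then lowercase: the uppercase key is found and replaced.
replaced_first = replace_pairs(title, pairs).lower()

# Lowercase first, then replace: 'Â' has already become 'â', so the key
# no longer matches and the character stays in the result.
lowered_first = replace_pairs(title.lower(), pairs)

print(replaced_first)  # apple pie
print(lowered_first)   # âpple pie
```

So with replacement after slugifying, the dictionary would need lowercase keys, or both cases.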

ebsaral commented 9 years ago

We could also switch this to a boolean like 'smart_replace' that replaces all common letters of the Latin alphabet with their ASCII representations. It's up to you.
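The thread does not say how a 'smart_replace' flag would be implemented. One possible approach, shown here as a Python 3 sketch with a hypothetical `strip_accents` helper, is to decompose the text with the standard library's `unicodedata.normalize` and drop the combining marks. It also shows why an explicit dictionary could still be useful: letters without a decomposition, such as the Turkish dotless 'ı', pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks, so that
    accented Latin letters fall back to their base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents(u"\xe2pple"))       # apple
print(strip_accents(u"\xe7\u0131\u011f"))  # 'çığ' becomes 'cıg': the dotless ı is kept
```

A boolean flag built this way would cover the common accented letters, while a pair dictionary would still be needed for cases like 'ı' -> 'i'.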