gosimple / slug

URL-friendly slugify with multiple languages support.
Mozilla Public License 2.0

No slug-representation of emojis/pictograms #49

Open remnestal opened 4 years ago

remnestal commented 4 years ago

This seems like a silly use case at first, but when creating a slug from a string containing emojis or pictograms, there is no representation of those characters. For example:

slug.Make("🐛")
slug.Make("☺")
slug.Make("𝕗𝕒𝕟𝕔𝕪 𝕥𝕖𝕩𝕥")

all yield empty strings.

I'm not sure how such a character would best be represented in a slug, but simply removing them could be problematic in some cases. Is this intentional?

remnestal commented 4 years ago

@dalu Let's say I have a blog platform where I let my customers set the title of their posts. I want the title of their posts to be turned into a slug for the URL. For example, let's say there's a post titled "No slug-representation of emojis/pictograms", then I expect the URL to look something like example.com/posts/no-slug-representation-of-emojis-and-pictograms. No problem.

But let's then say that a user has created two posts, whose titles contain more than just the "standard" ASCII characters:

Then both of those blog posts would have the slug "", which is problematic. Don't focus too much on the :bug: emoji in my previous example; there is a lot of Unicode not covered by this package that can make URLs collide.

I realize that there is no obvious solution to this problem, in fact I said so in the last sentence of my original post, but forcing every platform to implement a huge custom substitution map for all of these characters is hardly a satisfying solution.

alex-dodich commented 3 years ago

I'm facing the same problem, and it looks like there is no way to make a slug from forbidden symbols :( For this case you can do something like this:

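import (
    "fmt"
    "time"

    "github.com/gosimple/slug"
)

func createSlug(title string) string {
    // Generate a non-empty slug: fall back to a placeholder when every
    // character of the title was stripped.
    pSlug := slug.Make(title)
    if pSlug == "" {
        pSlug = "untitled"
    }

    // Add a "random" part to keep the slug unique. This is the microsecond
    // offset within the current second, so collisions are still possible.
    return fmt.Sprintf("%s-%d", pSlug, time.Now().Nanosecond()/1000)
}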

matrixik commented 3 years ago

Thank you for this report and sorry it took so long, burnout is not nice...

First: https://github.com/rainycape/unidecode, which the slug package uses underneath, has a test showing that it strips emojis:

https://github.com/rainycape/unidecode/blob/cb7f23ec59bec0d61b19c56cd88cee3d0cc1870c/unidecode_test.go#L30-L33

I forked it to https://github.com/gosimple/unidecode. It's true that it's missing a lot of characters that could be properly converted to ASCII, and everyone is welcome to provide more updates (at some point I'll also merge additions from forks, like https://github.com/cuilun/unidecode).

Second: from the beginning I designed slug to be on the safe side. For example, I also used it for generating file names, so chars like / should not be in the output.

Third: I will not change the default behavior (I don't want to break anyone), but it's possible to add a flag like AllPrintableASCII, set to false by default (to allow all chars from https://en.wikipedia.org/wiki/ASCII#Printable_characters, although spaces will still be replaced with -).

Or maybe just export:

https://github.com/gosimple/slug/blob/a0807d1a7cb54c4ab98007090109c9574a68a91c/slug.go#L35

so everyone could configure it themselves? I'm open to your ideas.
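
To make the AllPrintableASCII idea a bit more concrete, here is a rough sketch of what such a mode might do. The function name `keepPrintableASCII` and the trimming of leading and trailing hyphens are assumptions made for illustration; no such flag or function exists in the slug package today.

```go
package main

import (
	"fmt"
	"strings"
)

// keepPrintableASCII is a hypothetical illustration of the proposed
// AllPrintableASCII option (an assumption, not an existing API).
// It keeps the printable ASCII characters '!' through '~', replaces each
// space with a hyphen, drops everything else, and trims hyphens at the edges.
func keepPrintableASCII(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == ' ':
			b.WriteByte('-')
		case r > ' ' && r <= '~':
			b.WriteRune(r)
		}
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	fmt.Println(keepPrintableASCII("C++ & Go/Rust?")) // prints "C++-&-Go/Rust?"
	fmt.Println(keepPrintableASCII("🐛 bug"))          // prints "bug"
}
```

Note that characters like / and ? survive in this mode, which is exactly why it could not be the default for file names, and that an emoji-only title would still come out empty.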