Open remnestal opened 4 years ago
@dalu Let's say I have a blog platform where I let my customers set the titles of their posts, and I want each title to be turned into a slug for the URL. For example, for a post titled "No slug-representation of emojis/pictograms", I expect the URL to look something like example.com/posts/no-slug-representation-of-emojis-and-pictograms. No problem.
But let's then say that a user has created two posts, whose titles contain more than just the "standard" ascii characters:
Then both of those blog posts would have the slug "", which is problematic. Don't focus too much on the :bug: emoji in my previous example; there's lots of Unicode not covered by this package that can make URLs collide.
I realize that there is no obvious solution to this problem (in fact, I said so in the last sentence of my original post), but forcing every platform to implement a huge custom substitution map for all of these characters is hardly a satisfying solution.
I'm facing the same problem, and it looks like there is no way to make a slug from an unsupported symbol :( For this case you can do something like this:
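import (
	"fmt"
	"time"

	"github.com/gosimple/slug"
)

func createSlug(title string) string {
	// Generate a non-empty slug: fall back to a placeholder when
	// every character of the title is stripped.
	pSlug := slug.Make(title)
	if pSlug == "" {
		pSlug = "untitled"
	}
	// Add a "random" part (microseconds within the current second)
	// to make collisions between slugs unlikely.
	return fmt.Sprintf("%s-%d", pSlug, time.Now().Nanosecond()/1000)
}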
Thank you for this report and sorry it took so long, burnout is not nice...
So first: https://github.com/rainycape/unidecode, which the slug package uses underneath, has a test showing that it strips emojis:
I forked it to https://github.com/gosimple/unidecode. It's true that it's missing a lot of characters that could be properly converted to ASCII, and everyone is welcome to provide more updates (at some point I'll also merge additions from forks, like https://github.com/cuilun/unidecode).
Second: from the beginning I designed slug to be on the safe side. For example, I also used it for generating file names, so chars like / should not be in the output.
Third: I will not change the default behavior (I don't want to break anyone), but it's possible to add a flag like AllPrintableASCII, set to false by default, to allow all chars from https://en.wikipedia.org/wiki/ASCII#Printable_characters (a space would still be replaced with -).
Or maybe just export https://github.com/gosimple/slug/blob/a0807d1a7cb54c4ab98007090109c9574a68a91c/slug.go#L35 so everyone could configure it themselves? I'm open to your ideas.
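To make the flag idea concrete, here is a rough sketch of how such an option might behave. Everything in it is an assumption for illustration: the `Options` type and its `Make` method do not exist in the slug package, only the flag name `AllPrintableASCII` comes from the proposal above, and the filter is a simplified stand-in that does no transliteration and simply drops disallowed characters.

```go
package main

import (
	"fmt"
	"strings"
)

// Options is hypothetical; the slug package does not expose it.
type Options struct {
	// AllPrintableASCII borrows the flag name floated above. When true,
	// every printable ASCII character (0x21 to 0x7E) is kept; when false,
	// only a-z, 0-9, '-' and '_' survive.
	AllPrintableASCII bool
}

// allowed reports whether a (already lowercased) rune may appear in the output.
func (o Options) allowed(r rune) bool {
	if o.AllPrintableASCII {
		return r > ' ' && r <= '~'
	}
	return (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' || r == '_'
}

// Make lowercases s, replaces each space with '-', drops every character
// that is not allowed, and trims leading and trailing hyphens.
func (o Options) Make(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		switch {
		case r == ' ':
			b.WriteByte('-')
		case o.allowed(r):
			b.WriteRune(r)
		}
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	strict := Options{}
	loose := Options{AllPrintableASCII: true}
	fmt.Println(strict.Make("Hello World/2")) // hello-world2
	fmt.Println(loose.Make("Hello World/2"))  // hello-world/2
}
```

Note that neither mode helps with emojis: they fall outside printable ASCII, so a title made only of emojis still comes out empty and needs a fallback like the one discussed earlier in the thread.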
This seems like a silly use case at first, but when creating a slug from a string containing emojis or pictograms, there is no representation of those characters. For example:
all yield empty strings.
I'm not sure how such a character would best be represented in a slug, but simply removing them could be problematic in some cases. Is this intentional?