simov / slugify

Slugifies a string
MIT License

Chinese support #178

Status: Open. Kostanos opened this issue 1 year ago.

Kostanos commented 1 year ago

Hey, first of all, thanks for the library!

I tried multiple languages, and they work very well.

Is there any plan to add support for Chinese?

thaisfaria commented 1 year ago

I am also interested in Chinese character support.

Trott commented 1 year ago

Chinese character support would make the package very large. I think the best thing to do might be to create a second package that adds Chinese characters to the charmap. People can opt in with the second package.
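A minimal sketch of what such an opt-in package might contain, assuming pinyin transliteration. The names chineseCharmap, transliterate and toAsciiSlug are hypothetical, and the four entries are only for illustration; a real package would need thousands of entries and a policy for characters with more than one reading. The sketch does not use slugify itself, so it runs with plain Node.

```javascript
// Hypothetical opt-in charmap: a tiny pinyin sample, NOT a real package.
// A usable one would need thousands of entries and would have to pick one
// reading for characters that have several.
const chineseCharmap = {
  '早': 'zao',
  '上': 'shang',
  '好': 'hao',
  '安': 'an',
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Replace each mapped character with its pinyin, padded with spaces so that
// every syllable becomes its own word in the slug.
function transliterate(input, charmap) {
  return Array.from(input)
    .map((ch) => (hasOwn(charmap, ch) ? ` ${charmap[ch]} ` : ch))
    .join('');
}

// Hypothetical helper: transliterate, then keep only [a-z0-9] runs joined by '-'.
function toAsciiSlug(input, charmap) {
  return transliterate(input, charmap)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

console.log(toAsciiSlug('早上好/早安', chineseCharmap)); // 'zao-shang-hao-zao-an'
```

Since slugify already exposes slugify.extend() for merging extra entries into its charmap, the same object could in principle be handed to it; the sketch keeps its own toAsciiSlug only so that it runs without installing anything. Unmapped characters are still dropped, so the package stays opt-in and the core charmap stays small.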

If you don't really care about meaningful transliteration and just want something that outputs ASCII characters, the slug package will do that.

const slugify = require('slugify');
const slug = require('slug');

const goodMorning = '早上好/早安';

console.log(slugify(goodMorning)); // ''
console.log(slug(goodMorning));  // '5pep5lik5aw9laxqewuiq'

slug works very similarly to slugify. One difference, though, is that when processing an input string yields an empty output string, slug has a fallback step: it base64-encodes the input string (as UTF-8 bytes) and slugifies that instead, which is why the output above is all lowercase and contains no +, / or = characters.
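The fallback can be sketched in a few lines of Node. This is not slug's actual source: asciiSlug and slugWithFallback are hypothetical names, asciiSlug is a simplified stand-in for the normal ASCII pass, and the fallback simply lowercases the base64 string and drops every character outside a-z and 0-9. It does, however, reproduce the output shown above.

```javascript
// Simplified, hypothetical stand-in for an ASCII-only slug pass:
// strip accents, lowercase, and join runs of [a-z0-9] with '-'.
function asciiSlug(input) {
  return input
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// If the ASCII pass leaves nothing, fall back to the base64 of the UTF-8
// bytes, keeping only characters that are safe in a slug.
function slugWithFallback(input) {
  const result = asciiSlug(input);
  if (result !== '') return result;
  return Buffer.from(input, 'utf8')
    .toString('base64')
    .toLowerCase()
    .replace(/[^a-z0-9]/g, '');
}

console.log(slugWithFallback('Crème Brûlée')); // 'creme-brulee'
console.log(slugWithFallback('早上好/早安')); // '5pep5lik5aw9laxqewuiq'
```

The fallback result is deterministic and ASCII-only, but it carries no readable meaning, which is the trade-off described above.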

ausir0726 commented 3 months ago

I'd like to find a way to keep Chinese characters in the output, because currently they are removed. Chinese slugs are actually quite browser-friendly, but I can't find any way to retain Chinese characters from the input.

Input: '早上好/早安'
Expected: '早上好-早安'
Actual: ''

Trott commented 3 months ago

> I hope to find a way to keep the output of Chinese characters

slugify is designed to output ASCII characters. I don't think this is the right package for what you're trying to do, and I don't think slugify.extend() works in this case.

If you use slug instead of slugify, you will still be fighting against the grain of the design, but you can add Chinese characters to slug.multicharmap to get the result you are looking for.

> slug.multicharmap['早'] = '早'
'早'
> slug('早')
'早'
> 

If there are thousands of these characters, as I suspect there are, you can put them all in a separate module that extends slug.multicharmap and even publish it to npm for other people to use.
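A rough sketch of such a module, assuming the goal is to keep Chinese characters as-is and that "Chinese characters" means the CJK Unified Ideographs block U+4E00 to U+9FFF (the extension blocks are left out). buildIdentityMap and hanIdentityMap are hypothetical names; the final Object.assign onto slug.multicharmap mirrors the REPL session above and is left as a comment so the snippet runs without slug installed.

```javascript
// Hypothetical helper: map every code point in [start, end] to itself,
// e.g. '早' -> '早', in the same shape as the multicharmap entry above.
function buildIdentityMap(start, end) {
  const map = {};
  for (let cp = start; cp <= end; cp++) {
    const ch = String.fromCodePoint(cp);
    map[ch] = ch;
  }
  return map;
}

// Assumption: CJK Unified Ideographs block only (no extension blocks).
const hanIdentityMap = buildIdentityMap(0x4e00, 0x9fff);

console.log(Object.keys(hanIdentityMap).length); // 20992
console.log(hanIdentityMap['早']); // '早'

// With slug installed, the module would then do something like:
// const slug = require('slug');
// Object.assign(slug.multicharmap, hanIdentityMap);
module.exports = hanIdentityMap;
```

Even this single block is about twenty thousand entries, which illustrates why it makes sense as a separate, opt-in package rather than part of the core charmap.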

There might be a similar way to accomplish this with slugify but I don't see anything in the exposed API for it. Correction welcome!

Trott commented 3 months ago

Following up on my last comment: a better approach might be to determine the character code range of safe Chinese characters and then either bypass processing those characters in the first place and re-add them to the slug afterwards, or add only the characters that appear in the user's string to multicharmap at run time. Or skip slug and slugify entirely and use a completely different method to slugify things. (The temptation would be to make a comprehensive list of unsafe characters, remove them, and leave everything else. That would probably be error-prone, because what counts as an unsafe character depends on context. That's why slug and slugify go the other way and restrict output to a very narrow set of characters, at least by default.)
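A minimal sketch of the first idea (bypass the Chinese characters, slugify the rest, then put them back), assuming the safe range is the CJK Unified Ideographs block U+4E00 to U+9FFF. slugKeepingHan and asciiSlug are hypothetical helpers, and asciiSlug is a simplified stand-in for slug or slugify, not either package's algorithm.

```javascript
// Assumption: the "safe" Chinese characters are the CJK Unified Ideographs
// block U+4E00-U+9FFF. The capturing group makes split() keep those runs.
const HAN_RUN = /([\u4E00-\u9FFF]+)/;

// Simplified stand-in for slug/slugify on the non-Chinese parts.
function asciiSlug(text) {
  return text
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

function slugKeepingHan(input) {
  return input
    .split(HAN_RUN)
    // split() with a capturing group puts the captured Han runs at odd indices;
    // pass them through untouched and slugify everything else.
    .map((part, i) => (i % 2 === 1 ? part : asciiSlug(part)))
    .filter((part) => part !== '')
    .join('-');
}

console.log(slugKeepingHan('早上好/早安')); // '早上好-早安'
console.log(slugKeepingHan('Hello 世界 2024!')); // 'hello-世界-2024'
```

This only bypasses the range it is told about: anything outside it, such as full-width punctuation like ， or 。, still goes through the ASCII pass and is dropped or turned into a separator, which keeps the conservative allow-list behavior described above.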

ausir0726 commented 3 months ago

@Trott Thank you for your response. I think you are right. In the end, I used https://www.npmjs.com/package/github-slugger as my slug solution. It can keep my multilingual text in the slug. It's also SEO-friendly (but not so friendly when copied and URL-encoded).
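To make the URL-encoding caveat concrete: once a non-ASCII slug is URL-encoded (for example, when the link is copied or sent in a request path), each Chinese character becomes three percent-encoded UTF-8 bytes. The standard encodeURIComponent and decodeURIComponent globals show this; the slug value below is a hand-written example, not actual github-slugger output.

```javascript
// Hand-written example of a non-ASCII slug (not real github-slugger output).
const cjkSlug = '早上好-早安';

// Each Chinese character is 3 UTF-8 bytes, i.e. 9 characters once encoded.
const encoded = encodeURIComponent(cjkSlug);
console.log(encoded); // '%E6%97%A9%E4%B8%8A%E5%A5%BD-%E6%97%A9%E5%AE%89'
console.log(encoded.length); // 46
console.log(decodeURIComponent(encoded) === cjkSlug); // true
```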

Anyway, if I want to keep non-ASCII characters in the slug, I will use github-slugger. If I want an all-ASCII slug, I will use slugify.

Thank you.