Closed emoutisheva closed 1 year ago
I think another way to solve this is by extending slugify with your own translation of the ь
character that is something other than an empty string ''
.
Or maybe add it as a new locale in this module. As for the reason why ь
was set to empty string initially I'm not sure. I think I set it initially in reference to the Bulgarian language, but now thinking about it that doesn't make much sense either because in that case it has to be set to i
- ьо
-> io
maybe.
Yes, if it's set to something else likeio
would work as it will be replaced. So there are different ways of solving this issue, on our end we've chosen the above, although we could have added a new location for 'ru' or simply replaced the "" in the charmap.json. Anyway just wanted to make you aware of this as it could cause url encoding errors and you can best decide how to/if to change it.
I'm a little confused. I'm unable to repro the issue here, unless I'm misunderstanding. Maybe you're using an option that isn't the default somewhere? When I use the latest version of slugify, it removes the character.
const slug = require('slug')
const slugify = require('slugify')
console.log(`SLUG "${slug('ь')}"`)
console.log(`SLUGIFY "${slugify('ь')}"`)
console.log(`SLUG "${slug('aь')}"`)
console.log(`SLUGIFY "${slugify('aь')}"`)
Output:
SLUG "0yw"
SLUGIFY ""
SLUG "a"
SLUGIFY "a"
slugify removes the character in both cases.
slug does too but takes a different approach if the resulting slug is an empty string in its entirety. It generates a base64 encoding of the string as a slug. This way, you at least get something useable (if not ideal) if you use many strings consisting entirely of characters that are discarded.
Also note that using ??
is only available in Node.js 14 and newer. Since slugify
purports to support Node.js version back to 8, this would be a breaking change. (However, Node.js earlier than 14 is EOL, so I would support/encourage such a breaking change.)
Actually you are correct, @Trott, I was not able to reproduce that either. I thought I know about the issue because I remember someone reporting it in the past. @emoutisheva can you provide a sample that reproduces the issue? I tried with anglijskaya-premьer-liga
and the ь
character was removed.
To clarify it a bit, we are passing translation "Английская Премьер-лига" . We are using: return slugify(name, { locale:'ru', lower: true, remove: /#|@|\?/ });
. which results in "anglijskaya-premьer-liga". But if I test it your way, without the remove regex, it will remove the character and result it "anglijskaya-premer-liga". So digging a bit deeper it looks like the default remove regex you use in slugify.js line 42: .replace(options.remove || /[^\w\s$*_+~.()'"!\-:@]+/g, '')
removes that characterь
.
Yes, that is because ь
falls outside of A-Za-z
range of the \w
class. Since you are already using a locale for the Russian language then why not define the translation for that character there?
Decided to add the locale. Thank you for your suggestions and time.
Hi! 👋
Firstly, thanks for your work on this project! 🙂
Today I used patch-package to patch
slugify@1.6.5
for the project I'm working on.We use it to convert translations from other languages to english for url slugs for example : /anglijskaya-premьer-liga/ The character
ь
is mapped to an empty string in your charmap.json and it does not get converted or removed:var appendChar = locale[ch] || charMap[ch] || ch;
resulted invar appendChar = undefined || "" || ь.
This however was causing some encoding issues resulting in a redirect loop and not being able to open the urls. Changing the || to ?? will remove the character.Here is the diff that solved my problem:
This issue body was partially generated by patch-package.