Closed iliazeus closed 1 year ago
This introduces a new way to remove characters, but there is already the remove
option.
slugify('cat', {remove: /a/ig}) // 'ct'
Not necessarily saying we shouldn't do this, but we might want to pause before introducing multiple ways to do the same thing. If this is a more ergonomic way, consider deprecating remove
? If it's a less ergonomic way, don't implement it?
@Trott they're not the same semantically.
The "mapping into empty string" is related to the transliteration rules. It is a way to specify in the locale and/or charmap that we don't want a specific letter to be translated into anything. This is already in the charmap for ь
and Ь
, for example; this should also be in the Russian locale for ъ
and Ъ
, and there are probably more cases.
The remove
argument is a way for the user to specify that they don't want specific characters in the final result. It does not depend on locale and charmap, and is usually used for things like punctuation. The user will probably not be aware of all the transliteration rules, and they won't put things like ь
in there when specifying a custom regex.
@Trott they're not the same semantically.
Thanks for the explanation. That makes sense and I understand now.
This is only useful for the twenty six letters of the Latin alphabet, right? For everything else, it already works as expected? Or am I wrong about that? (EDIT: Maybe it's useful for certain common symbols too?)
@Trott my own usecase was actually similar to the one described in #172: the Cyrillic characters ь
and Ь
are (correctly) mapped into empty strings in charmap.json
. It should also be done for ъ
and Ъ
when the Russian locale is added (right now they use Bulgarian transliterations in charmap.json
). Thing is, those are proper letters, but since (in Russian, at least) they don't have their own proper sound, they are usually just omitted when transliterating (but in Bulgarian, Ъ
isn't!). This feature is useful for these kinds of letters. I don't know enough non-latin and non-cyrillic scripts to offer other examples, though :)
The test uses latin characters for the sake of simplicity.
@Trott my own usecase was actually similar to the one described in #172: the Cyrillic characters
ь
andЬ
are (correctly) mapped into empty strings incharmap.json
. It should also be done forъ
andЪ
when the Russian locale is added (right now they use Bulgarian transliterations incharmap.json
). Thing is, those are proper letters, but since (in Russian, at least) they don't have their own proper sound, they are usually just omitted when transliterating (but in Bulgarian,Ъ
isn't!). This feature is useful for these kinds of letters. I don't know enough non-latin and non-cyrillic scripts to offer other examples, though :)The test uses latin characters for the sake of simplicity.
I'm sorry if I'm being foolish here and missing something obvious, but doesn't that already work without the change in this pull request?
const slugify = require('slugify');
const str = 'ъaъ';
console.log(slugify(str)); // 'uau'
slugify.extend({ 'ъ': ''})
console.log(slugify(str)) // 'a'
Maybe all that needs to happen is to add a Russian locale to config/locales.json
?
@Trott sorry for being so unclear :)
The tricky part is: in your case, ъ
is removed not by the extended charmap, but by the default remove
regexp, which matches [^\w]
(among others). Here is a quick test with a dummy remove
:
> var slugify = require('slugify')
undefined
> slugify('ъяъ', { remove: /[]/ })
'uyau'
> slugify.extend({ 'ъ': '' })
undefined
> slugify('ъяъ', { remove: /[]/ })
'ъyaъ'
The order of steps for the current behavior is:
ъ
is mapped by the charmap to '''' || 'ъ' === 'ъ'
remove
If someone that is not aware of language-specific rules specifies a custom remove
, then mapping to empty string will stop working - it only did because of the default remove
. This PR fixes that.
@Trott we could instead just document that the remove
regexp must contain the [^\w]
class, but this would still be counter-intuitive, since it looks like it's mostly meant for punctuation and other non-letter, non-locale-dependent characters. The documented example does actually not match [^\w]
:)
I've updated the test to be a more relevant example of bugged behavior.
I think I see now. FWIW, the slug
module already works this way.
const slug = require("slug");
const slugify = require("slugify");
const str = "ъяъ";
console.log(slug(str)); // 'uyau'
console.log(slugify(str)); // 'uyau'
slug.extend({ ъ: "" });
slugify.extend({ ъ: "" });
console.log(slug(str)); // 'ya'
console.log(slugify(str)); // 'ya'
console.log(slug("ъяъ", { remove: /[]/g })); // 'ya'
console.log(slugify("ъяъ", { remove: /[]/g })); // 'ъyaъ'
Published in v1.6.6
@Trott I also added this https://github.com/simov/slugify/commit/3f0b3f594a06d7b51e774e25d5dc288f7247e8bb
Resolves https://github.com/simov/slugify/issues/172
Resolves https://github.com/simov/slugify/issues/95