simov / slugify

Slugifies a string
MIT License
1.51k stars 129 forks source link

Allow to remove dash at the end of the slug #112

Closed popod closed 3 years ago

popod commented 3 years ago

With

const slug = slugify('test !!!', {
  strict: true
})

// slug => 'test-'

Is it possible to allow to remove this latest dash ?

Trott commented 3 years ago

slug(' test !!!', { strict: true }).replace(/^-|-$/g, '') will strip leading and trailing dashes, but I imagine that's not the answer you were looking for.

Removing any leading or trailing dash is the default behavior in the slug module, so one option would be to switch to that module. (Full disclosure: I maintain that module.)

If there's interest in that being the default behavior for slugify, I'm happy to send a pull request. Would probably have to be a breaking change though, just to be super cautious about not breaking anyone's code unintentionally.

simov commented 3 years ago

I think this should be the default behavior and I'd imagine people having this issue already trimming the slug themselves. I'd like to have that behavior by default here, but I don't see it as a breaking change necessarily.

vijayhardaha commented 3 years ago

Faced this issue today. Any update on when a new release coming with this fix?

Trott commented 3 years ago

Faced this issue today. Any update on when a new release coming with this fix?

I just published slugify@1.5.2 with trimming of leading and trailing replacement characters as the default behavior.

popod commented 3 years ago

@Trott Hi and thank you for your replies and work !

What is the difference between slugify and your slug module (features, compatibility, performances, ...) ?

Trott commented 3 years ago

@Trott Hi and thank you for your replies and work !

What is the difference between slugify and your slug module (features, compatibility, performances, ...) ?

@popod They are very similar, having nearly identical APIs and having both evolved from the same code base. Both work in the browser and in Node.js with no dependencies.

I haven't tested/benchmarked, but I suspect slugify would have a smaller code size (more important for the browser than for the server).

If legacy browser support is important to you, note that slug works in the IE11 browser and I think slugify does not.

These two things are related. The code size for slug is a bit larger because it has polyfills for IE11.

The defaults for slug are to return lowercase slugs that consist only of the 26 letters in the English alphabet [a-z]. So it's more aggressive about stripping out punctuation and special characters.

slugify has replacements for unusual characters like . By default, slug strips that sort of thing out and replaces only alphabetic characters. (As with slugify, you can add custom characters yourself. They just don't come out of the box with slug.)

slugify('I ♥ Unicode!'); // 'I-love-Unicode!'
slug('I ♥ Unicode!'); // 'i-unicode'
slug('I ♥ Unicode!', { charmap: { '♥': 'heart' }});  // 'i-heart-unicode'

One edge case difference is that if slug can't replace any of the characters in the string, it returns a slug that is the Base 64 encoding of the string. slugify instead returns an empty string in that case.

slugify('☢ ☢ ☢'); // ''
slug('☢ ☢ ☢'); // '4piiiokyoidimki'

This feature is useful if you are writing something that might need to generate slugs in words for alphabets you may not predict in advance. For example, let's take a Kanji character, :

slugify('日'); // ''
slug('日'); // '5pel'

It's possible that there are a few alphabetic characters not supported in one or the other of the modules, but the charmaps and locales should be largely equivalent (with the exception again of non-alphabetic characters like ♥ which slug intentionally omits). They also have mostly or entirely the same locale options.

As for performance, I don't actually know. I'm going to put together a benchmark right now and see!

Trott commented 3 years ago

I wrote a quick benchmark, and it indicates that slugify is significantly faster than slug. I'm going to dig in and try to find out why. Maybe my benchmark sucks. Maybe slug has some areas where it can be optimized. Maybe slug sacrifices some perf enhancements in the name of IE 11 compatibility. Maybe slugify gains performance by not worrying about edge cases that perhaps never actually come up anyway. And a million other possibilites.)

Most likely explanation is slug can use some perf work.

Second most likely explanation is my benchmark sucks.

Here's my benchmark. Probably a good idea to write one that uses real world data from your own app and see if you get similar results or not.

const slug = require('slug')
const slugify = require('slugify')

let slugLib
let label
if (process.argv[2] === 'slug') {
  slugLib = slug
  label = 'slug'
} else {
  slugLib = slugify
  label = 'slugify'
}

const BENCHMARK_RUNS = 2048
const MAX_WORD_LENGTH = 16
const MAX_WORD_COUNT = 4

function random (max) {
  return Math.floor(Math.random() * max + 1)
}

const chars = Object.keys(slug.charmap)

function getString () {
  const wordCount = random(MAX_WORD_COUNT)
  const wordLengths = Array.from({ length: wordCount }, () => random(MAX_WORD_LENGTH))
  const words = wordLengths.map((wordLength) => Array.from({ length: wordLength }, () => chars[random(chars.length) - 1]).join(''))
  return words.join(' ')
}

const strings = []
for (let i = 0; i < BENCHMARK_RUNS; i++) {
  strings.push(getString())
}

console.time(label)
for (let i = 0; i < strings.length; i++) {
  slugLib(strings[i])
}
console.timeEnd(label)
Trott commented 3 years ago

And, for the sake of science, here are some sample runs on my laptop:

$ node benchmark.js slug
slug: 65.195ms
$ node benchmark.js slug
slug: 66.769ms
$ node benchmark.js slug
slug: 67.963ms
$ node benchmark.js slugify
slugify: 25.289ms
$ node benchmark.js slugify
slugify: 24.638ms
$ node benchmark.js slugify
slugify: 25.17ms
$ 
Trott commented 3 years ago

Those benchmark results are with previous versions of slug (4.04) and slugify (1.5.0). Here are the results with current slug (5.0.0) and current slugify (1.5.2). Nothing has really changed:

$ node benchmark.js slug   
slug: 68.873ms
$ node benchmark.js slug
slug: 64.661ms
$ node benchmark.js slug
slug: 68.128ms
$ node benchmark.js slugify
slugify: 22.94ms
$ node benchmark.js slugify
slugify: 24.832ms
$ node benchmark.js slugify
slugify: 24.323ms
$ 
Trott commented 3 years ago

Another difference that I am only now noticing: slug supports several alphabets (including a few RTL ones) that slugify does not (Arabic, Hebrew, Farsi, Hindi, possibly others). I'll probably send pull requests to add those alphabets to slugify at some point, but first to find the source of the performance difference. (Hope it's not having those additional alphabets!)

Trott commented 3 years ago

The performance difference comes down to support for characters requiring more than one code point to represent. This is necessary for Hindi (and other languages like Sanskrit that use the Devanagari alphabet) and Hebrew. It's also necessary for emoji.

> slug.extend({'👍': 'thumbs up!' })
undefined
> slug('Hi! 👍')
'hi-thumbs-up!'
> slugify.extend({'👍': 'thumbs up!' })
undefined
> slugify('Hi! 👍')
'Hi!'
> 

My thinking right now is to introduce either an option or some kind of detection in slug() so it can bypass the slow multi-code-point processing for users that value performance over completeness. If that works well and if there's interest in Devanagari/Hebrew/emoji support in slugify, I can look at porting that over.

simov commented 3 years ago

I think the best approach to have this is by introducing a new option, at least here in slugify.

vijayhardaha commented 3 years ago

leading and trailing replacement only works if you're passing the replacement at start or end. but when you get the final result value and get a replacement at the start/end, then it doesn't remove those leading and trailing replacements.

Take this string as an example: " Showcase Your Figma Designs on WordPress P2 "

Expected: Showcase-Your-Figma-Designs-on-WordPress-P2 Output: -Showcase-Your-Figma-Designs-on-WordPress-P2-

image

In my suggest we must use another regex replace to remove leading and trailing replacements from final results.

/^-+|-+$/ /^' + replacement + '+|' + replacement + '+$/ Something like this. What do you think? @Trott image

Trott commented 3 years ago

leading and trailing replacement only works if you're passing the replacement at start or end. but when you get the final result value and get a replacement at the start/end, then it doesn't remove those leading and trailing replacements.

Take this string as an example: " Showcase Your Figma Designs on WordPress P2 "

Expected: Showcase-Your-Figma-Designs-on-WordPress-P2 Output: -Showcase-Your-Figma-Designs-on-WordPress-P2-

@vijayhardaha What version of slugify are you using? The relevant fix was introduced in 1.5.2, and I'm not seeing the behavior you are describing.

$ npm ls slugify
temp@1.0.0 /Users/trott/temp
└── slugify@1.5.2

$ node
Welcome to Node.js v16.1.0.
Type ".help" for more information.
> require('slugify')(" Showcase Your Figma Designs on WordPress P2 ")
'Showcase-Your-Figma-Designs-on-WordPress-P2'
> 
popod commented 3 years ago

@Trott when the strict option is enabled, trailing dash are not correctly removed.

slugify('Hi !', { strict: true })
// -> "Hi-"
vijayhardaha commented 3 years ago

@Trott hey I am using the latest version. To test my case you need to start and end the string with special characters. so replace this require('slugify')(" Showcase Your Figma Designs on WordPress P2 ") with this require('slugify')('" Showcase Your Figma Designs on WordPress P2 "') use double quotes within the text and enable strict option;

This is happening when we use special char at the start/end with whitespace

"-Showcase-Your-Figma-Designs-on-WordPress-P2-"

-Showcase-Your-Figma-Designs-on-WordPress-P2-

Trott commented 3 years ago

@vijayhardaha slugify('" Showcase Your Figma Designs on WordPress P2 "') results in '"-Showcase-Your-Figma-Designs-on-WordPress-P2-"'. That seems correct to me. What value are you expecting instead? What command/settings are you using to get those results?

Trott commented 3 years ago

@vijayhardaha Ah! I see! You say you are running with the strict option. Now I understand. Fix should be coming shortly.

Trott commented 3 years ago

Proposed fix is in https://github.com/simov/slugify/pull/116 if anyone wants to test it and confirm it fixes the issue for them.

@vijayhardaha Your suggestion indeed would fix the bug, but it also involves a dynamically generated regular expression using replacement. I know two of those already exist in the code base, but it had been my intention to get rid of them because they do not work as expected in edge case situations like replacement having a value of '.'. In the fix in the pull request linked above, the existing dynamically created regular expressions are removed.

vijayhardaha commented 3 years ago

Hey @Trott I ran several tests, Proposed fix in #116 solves the problem.

Trott commented 3 years ago

slugify 1.5.3 published with the fix. Thanks for reporting the issue with strict mode!

davidlukerice commented 3 years ago

Can removing trailing replacement characters be optional? This was a breaking change for us as we use slugify for automatically validating and cleaning up urls being typed into a text field. Previously, a user could type in a-url-with-replacement-characters one character at a time in a text field and get back a-url-with-replacement-characters. However, since the "-" characters will automatically get cleared when typing now (aka, when it's trailing like a-url-), it comes out to aurlwithspaces.

Trott commented 3 years ago

Can removing trailing replacement characters be optional? This was a breaking change for us as we use slugify for automatically validating and cleaning up urls being typed into a text field. Previously, a user could type in a-url-with-replacement-characters one character at a time in a text field and get back a-url-with-replacement-characters. However, since the "-" characters will automatically get cleared when typing now (aka, when it's trailing like a-url-), it comes out to aurlwithspaces.

As a temporary (or maybe even permanent) workaround, you could add a letter to the start and end of the string and then strip it off the result:

slugify('a' + str + 'a').replace(/^a|a$/g, '')

Not the most elegant solution, but it will work.

Trott commented 3 years ago

@davidlukerice If all goes well, I'm about to add a "trim" option to the slug module so you could use slugify(str, { trim: false }). See https://github.com/Trott/slug/pull/230. If that looks good to you, I'll do the same thing in slugify (barring concerns from @simov).

Trott commented 3 years ago

Configurable "trim" option (but still on by default) in slug@5.1.0. Will open a PR for slugify later (unless someone beats me to it, which is fine).

sebamarynissen commented 2 years ago

Just wanted to mention that it would've been nice to have this as a semver major. I recently discovered that it broke my code without me knowing. My situation is that I slugify my usernames so that user name is considered the same name as user-name for example. However, some users have names like -username- (even though I don't allow it anymore now) and now they can no longer login because the slugged usernames in the database don't match.

I can fix it for some of them by simply changing the slug in my database manually, but this doesn't work when I have both username and -username- as separate users. The problem is limited to about 20 users on a total of 30k, so nothing too major, but I just wanted to share my experience.

Trott commented 2 years ago

Just wanted to mention that it would've been nice to have this as a semver major. I recently discovered that it broke my code without me knowing. My situation is that I slugify my usernames so that user name is considered the same name as user-name for example. However, some users have names like -username- (even though I don't allow it anymore now) and now they can no longer login because the slugged usernames in the database don't match.

I can fix it for some of them by simply changing the slug in my database manually, but this doesn't work when I have both username and -username- as separate users. The problem is limited to about 20 users on a total of 30k, so nothing too major, but I just wanted to share my experience.

It doesn't take away from the breaking-change aspect, but if it helps, you can also use slugify(str, {trim: false}) to get the old behavior and not change anything in your database.

sebamarynissen commented 2 years ago

Thanks, that's what I used to solve the problem: I've put a guard in front which checks if a username starts or ends with a -. If so I use the legacy slugify(str, {trim:false}) and check if the outcome matches the list of "affected users". If not, I run them through the slugify(str) method as normal. Kind of hacky, but fixed it. Hopefully it can be helpful for others who had a similar issue.