simov / slugify

Slugifies a string
MIT License
1.55k stars 131 forks source link

Forward slash is valid url character but removed #98

Closed kmturley closed 3 years ago

kmturley commented 3 years ago

Slugify is designed to support valid url characters. However if I pass in a forward slash:

slugify(`Something/Another thing`, { lower: true })

The result is:

somethinganother-thing

Where I would expect:

something/another-thing

This can of course be mitigated by changing the remove RegEx adding \/:

console.log(slugify(`Something/Another thing`, { lower: true, remove: /[^\w\s$*_+~.()'"!\-:@\/]+/g }));

But this is not a very elegant solution. Ideally there would be an ignore/exclude character feature?

console.log(slugify(`Something/Another thing`, { lower: true, exclude: '/' }));
JuanFML commented 3 years ago

Hi! I am new here, I was wondering if I could try to fix this issue?

simov commented 3 years ago

Generally speaking I don't think adding additional exclude option is the way to go. It simply doesn't make sense because you have a remove regex by default, that excludes characters from being available in the result slug, and then you have an exclude option that excludes from the excluded characters. You see what I'm saying? So there might be an elegant way to solve this but I'm not sure what it is.

Trott commented 3 years ago

This is probably an over-engineered and inelegant solution, but I can't tell so here it is anyway:

It would be a breaking change, but perhaps the strict option can be deprecated in favor of a mode setting that affects the defaults of things like the remove regex. slugify can ship with a strict mode and loose (or sloppy or whatever you want to call it) mode, but people could add their own and perhaps even ship them as separate packages, effectively being a plugin:

const slugify = require('slugify')
const slashesMode = require('slugify-plugin-slashes-mode')

slugify.addMode(slashesMode) 
slugify.mode = 'slashes'

// Or maybe `.use()` or `.useMode()` to do both of the above two lines at once?
simov commented 3 years ago

Thanks @Trott, so now we are building a wrapper for regular expressions, although I really do see your point. I like the idea of having the strict mode by default in a next major. I think the main confusion stems from the fact that you have to deal with the existing remove regex by default. I imagine people feeling much more comfortable about crafting their own remove regex from scratch but I might be wrong.

kmturley commented 3 years ago

I was thinking of two potential solutions:

  1. Input of multiple characters (include/exclude lists), converted to a single RegEx slugify(Something/Another thing, { lower: true, exclude: ['/'] })

  2. Input of multiple RegExes, combined to a single RegEx slugify(Something/Another thing, { lower: true, exclude: ['\/'] })

There are some attempts to combine or simplify RegExs: https://stackoverflow.com/questions/869809/combine-regexp https://github.com/spadgos/regex-combiner https://github.com/gogeorge/EasyRegex https://github.com/aaditmshah/regex

Not sure if that is a reliable solution

Trott commented 3 years ago

Thanks @Trott, so now we are building a wrapper for regular expressions, although I really do see your point.

It would also allow one to alter other settings so the possibilities become pretty substantial. A mode plugin could alter the charmap, so there could be an easy way to add Pinyin for users who need that, but it can be omitted by default for users who don't need thousands of characters increasing the size of their library.

simov commented 3 years ago

@Trott that makes sense, but it will also require complete overhaul of how the module works.

@kmturley thanks for your suggestions.

Trott commented 3 years ago

This is perhaps more maintainable and understandable than passing in a remove regular expression.

function slugifyWithSlashes(myString) {
 return myString.split('/').map((val) => slugify(val)).join('/')
}
Trott commented 3 years ago

Repeating what I wrote in https://github.com/simov/slugify/pull/131#issuecomment-907646539 here:

I always considered a slug to be a URL element, not a path, so keeping slashes is very much unexpected to me.

To me, the canonical use case is "I have a title of an article and I want to convert that article title to a component in a URL."

So I might have a blog at www.example.com/my-awesome-blog. And if I post something today, it might end up in www.example.com/my-awesome-blog/2021/08/28/some-slug-value-here.

So if I have an article titled "Better A/B Testing", I want a slug like better-ab-testing or Better-AB-Testing or better-a-b-testing but most certainly not better-a/b-testing.

The use case where keeping slashes is expected seems unusual to me. Can you explain how this came up for you?

kmturley commented 3 years ago

The use case is when constructing a url with multiple slug parts:

county: Los Angeles city: Culver City

With the current functionality you could do:

`${slugify(location.county)}/${slugify(location.city)}`

But with the suggested feature it would be:

slugify(`${location.county)}/${location.city}`)

This is an example for just two url parts, but there could be n number

Trott commented 3 years ago

This is an example for just two url parts, but there could be n number

Interesting. For what it's worth, I'd be inclined to put all the URL parts into an array and use map() and join():

urlParts = ['Los Angeles',  'Culver City', 'California', 'United States of America', '99122', 'Earth']

urlParts.map(slugify).join('/')

Being able to run slugify() once may be more performant, but I don't think there's too situations out there where the bottleneck is slugify().

kmturley commented 3 years ago

Sure, the performance benefit is negligible. But one RegEx is more performant than a loop. For me it's the convenience and to make code more DRY.

Forward slashes are very closely related to slugs so either way, some folks will need the workaround! I will understand if you don't feel it's a feature slugify should support.

Trott commented 3 years ago

By the way, the use case suggested here would result in a bug in user code if a county or city (or any other string supplied) contained a slash in its name, whereas using an array with map() and join() or any of the other alternatives avoids that footgun entirely. I understand the motivation, but the more I think about this, the more I think this is a feature we probably don't want.

kmturley commented 3 years ago

Fair enough, closing!