sindresorhus / humanize-url

Humanize a URL: https://sindresorhus.com → sindresorhus.com
MIT License

Also do truncation #1

Open forivall opened 9 years ago

forivall commented 9 years ago

It would be nice if this also did truncation: pass another argument, something like {truncate: 40}, and the URL would be nicely truncated, with '…' (i.e. '\u2026') replacing the truncated text.

Examples:

humanize('http://example.com/a/cool/page/that-is-really-deeply/nested/', {truncate: 20}) -> 'example.com/…/nested'

humanize('http://example.com/a/cool/page/that-is-really-deeply/nested/', {truncate: 24}) -> 'example.com/a/…/nested'

humanize('http://example.com/a/cool/page/that-is-really-deeply/nested/', {truncate: 30}) -> 'example.com/a/cool/…/nested'
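A minimal sketch of how such an option could behave, assuming it runs on the already-humanized string (protocol, leading "www." and trailing slash removed) and then keeps the domain, as many leading path segments as fit, and always the last segment. `humanizeBasic` and `humanizeWithTruncate` are hypothetical names, not humanize-url's API; the sketch is only meant to reproduce the three examples above.

```javascript
// Hypothetical sketch, not humanize-url's actual API.
const ELLIPSIS = "\u2026"; // '…'

// Approximates plain humanization: drop the protocol, a leading "www."
// and a trailing slash.
function humanizeBasic(url) {
  return url
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "")
    .replace(/^www\./i, "")
    .replace(/\/$/, "");
}

function humanizeWithTruncate(url, { truncate } = {}) {
  const text = humanizeBasic(url);
  if (truncate === undefined || text.length <= truncate) return text;

  const [host, ...segments] = text.split("/");
  if (segments.length < 2) return text; // no middle segments to elide

  const last = segments[segments.length - 1];
  const kept = [];
  // Keep leading segments, nearest the domain first, while the result fits.
  for (const segment of segments.slice(0, -1)) {
    const candidate = [host, ...kept, segment, ELLIPSIS, last].join("/");
    if (candidate.length > truncate) break;
    kept.push(segment);
  }
  // The domain is never shortened, even if "host/…/last" exceeds the limit.
  return [host, ...kept, ELLIPSIS, last].join("/");
}
```

With the URL from the examples, `{truncate: 20}`, `24` and `30` yield 'example.com/…/nested', 'example.com/a/…/nested' and 'example.com/a/cool/…/nested' respectively.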

Or this could also be a separate module. module-requests ahoy?

sindresorhus commented 9 years ago

I made a module for truncating a URL to a given length: https://github.com/sindresorhus/truncate-url

Will integrate it as an option here, but first I want to make sure it covers all cases.

Would you mind submitting some more tests on how the behaviour should be handled? Or just some examples here, if that's easier. For example, what should happen if the wanted length is shorter than the URL with all path segments truncated? Should it then start truncating the domain name? Then what? It would be nice to have the steps outlined.

forivall commented 9 years ago

Hmm. My interest is mainly in truncation for humanizing, so if the domain is longer than the truncation length, I would just leave the domain as is:

truncateUrl('http://example.com/some/path', 10) -> 'example.com/…'

truncateUrl('http://example.com/', 10) -> 'http://example.com/'
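A rough sketch of the "never cut the domain" fallback these two examples describe, assuming Node's global WHATWG `URL`. `truncateKeepingDomain` is a hypothetical helper, not truncate-url's API; it only shows the floor behaviour (keep the last segment if that fits, otherwise elide the whole path) and leaves the URL untouched when there is no path, as in the second example.

```javascript
// Hypothetical helper illustrating the examples above; not truncate-url's API.
const ELLIPSIS = "\u2026";

function truncateKeepingDomain(input, max) {
  const url = new URL(input);
  const host = url.hostname.replace(/^www\./, "");
  const segments = url.pathname.split("/").filter(Boolean);

  // No path to shorten: return the URL exactly as given.
  if (segments.length === 0) return input;

  const humanized = [host, ...segments].join("/");
  if (humanized.length <= max) return humanized;

  // Prefer keeping the last path segment when that fits.
  if (segments.length > 1) {
    const withLast = [host, ELLIPSIS, segments[segments.length - 1]].join("/");
    if (withLast.length <= max) return withLast;
  }
  // Otherwise elide the whole path; the domain itself is never cut,
  // even when it alone is longer than `max`.
  return `${host}/${ELLIPSIS}`;
}
```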

In particular, for the sake of security and because of the possibility of phishing, I don't think the domain should ever be truncated. However, subdomains in front of the registrable domain (the public suffix from https://publicsuffix.org/ plus one more label) can safely be truncated. For example:

truncateUrl('http://subdomain.example.com', 10) -> 'http://….example.com'

truncateUrl('http://subdomain.example.co.uk', 10) -> 'http://….example.co.uk'
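A sketch of the subdomain rule, working on the hostname only. It assumes a tiny hard-coded stand-in for the Public Suffix List (`PUBLIC_SUFFIXES` is illustrative; a real implementation would load the full list), and `registrableDomain` / `truncateSubdomains` are hypothetical names. The registrable domain is taken as the longest matching suffix plus one label.

```javascript
// Hypothetical sketch. PUBLIC_SUFFIXES is a toy stand-in for the real PSL.
const PUBLIC_SUFFIXES = new Set(["com", "org", "co.uk"]);
const ELLIPSIS = "\u2026";

// Longest matching public suffix plus one more label, e.g. "example.co.uk".
function registrableDomain(hostname) {
  const labels = hostname.split(".");
  // Scanning from the left finds the longest suffix first.
  for (let i = 1; i < labels.length; i++) {
    if (PUBLIC_SUFFIXES.has(labels.slice(i).join("."))) {
      return labels.slice(i - 1).join(".");
    }
  }
  return hostname; // unknown suffix: leave the host alone
}

function truncateSubdomains(hostname, max) {
  if (hostname.length <= max) return hostname;
  const domain = registrableDomain(hostname);
  // Only the subdomain labels are elided; the registrable domain stays whole.
  return domain === hostname ? hostname : `${ELLIPSIS}.${domain}`;
}
```

Prefixing the result with the original protocol gives the forms above, e.g. `"http://" + truncateSubdomains("subdomain.example.co.uk", 10)`.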

For the sake of simplicity, I would make truncating the domain portion of the URL optional, since querying the PSL (Public Suffix List) is overkill for most use cases, especially client-side, which is where I'm using this. Removing "www." should be enough, and that's already being done :hand::five:.

I'll submit some more tests to truncate-url when I have some extra free time.

Here are the steps I'd take for truncating specific parts of the URL, including humanization (terminology from https://nodejs.org/api/url.html):

  1. remove protocol, protocol slashes, auth, "www"
  2. remove hash (leave in if short enough)
  3. remove search (leave in if short enough)
  4. truncate pathname; see below logic
  5. truncate non-public subdomains (optional; requires PSL)
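Steps 1 to 4 can be sketched with Node's global WHATWG `URL`, assuming "leave in if short enough" means the hash and search are kept whenever the whole string still fits. For step 4 it uses one literal reading of the path rule that follows (drop segments nearest the domain first, always keep the last one). `humanizeAndTruncate` is a hypothetical name; step 5 is omitted, and trailing slashes are dropped.

```javascript
// Hypothetical sketch of steps 1-4; step 5 (PSL subdomains) is omitted.
const ELLIPSIS = "\u2026";

function humanizeAndTruncate(input, max) {
  const url = new URL(input);

  // Step 1: protocol, slashes and auth are dropped by simply not emitting
  // them; a leading "www." is stripped. url.host keeps any port.
  const host = url.host.replace(/^www\./, "");
  const segments = url.pathname.split("/").filter(Boolean);
  const path = segments.map((s) => "/" + s).join("");

  // Steps 2 and 3: keep hash, then search, only while everything fits.
  const candidates = [
    host + path + url.search + url.hash,
    host + path + url.search,
    host + path,
  ];
  for (const text of candidates) {
    if (text.length <= max) return text;
  }

  // Step 4: elide path segments nearest the domain first, keeping the last.
  for (let dropped = 1; dropped < segments.length; dropped++) {
    const text = [host, ELLIPSIS, ...segments.slice(dropped)].join("/");
    if (text.length <= max) return text;
  }
  // Nothing fits: return the shortest form; the domain is never cut.
  return segments.length > 1
    ? [host, ELLIPSIS, segments[segments.length - 1]].join("/")
    : host + path;
}
```

Read this literally, step 4 gives 'example.com/…/nested' for the long URL in the opening comment at 30 characters, whereas those examples keep leading segments ('example.com/a/cool/…/nested'), so the two comments suggest slightly different orderings.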

For the path, the parts closest to the domain should be truncated first. An alternative is to start by truncating the longest path part, then, starting from that point, iteratively truncate the longest adjacent path part. Hopefully that makes sense; some test cases should make it clearer.
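The alternative "longest part first" idea could be sketched as a contiguous elided run that starts at the longest segment and grows toward whichever neighbour is longer until the result fits. The tie-break (prefer the left neighbour) and the name `truncateLongestFirst` are assumptions for illustration, not something the comment specifies.

```javascript
// Hypothetical sketch of the "longest part first" path truncation idea.
const ELLIPSIS = "\u2026";

function truncateLongestFirst(host, segments, max) {
  // Render with segments[lo..hi] (inclusive) replaced by a single "…".
  const render = (lo, hi) =>
    [host, ...segments.slice(0, lo), ELLIPSIS, ...segments.slice(hi + 1)].join("/");

  const full = [host, ...segments].join("/");
  if (full.length <= max || segments.length === 0) return full;

  // Seed the elided run at the longest segment (first one on ties).
  let lo = 0;
  for (let i = 1; i < segments.length; i++) {
    if (segments[i].length > segments[lo].length) lo = i;
  }
  let hi = lo;

  // Grow the run toward the longer adjacent segment (left wins ties).
  while (render(lo, hi).length > max && (lo > 0 || hi < segments.length - 1)) {
    const left = lo > 0 ? segments[lo - 1].length : -1;
    const right = hi < segments.length - 1 ? segments[hi + 1].length : -1;
    if (left >= right) lo--;
    else hi++;
  }
  return render(lo, hi);
}
```

For host 'example.com' and segments ['a', 'cool', 'page', 'that-is-really-deeply', 'nested'], this elides 'that-is-really-deeply' first and then 'nested', giving 'example.com/a/cool/page/…' at 30 characters; unlike the opening examples it does not protect the last segment.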

Finally, path delimiters other than '/' could also be considered. Probably also overkill though.

puzrin commented 9 years ago

> Will integrate it as an option here, but first I want to make sure it covers all cases.

https://github.com/nodeca/nodeca.core/blob/master/lib/parser/beautify_url.js#L25-L103

See the logic we extracted from Chromium. Feel free to use it as you wish, no attribution needed. I'll be happy to switch to your package.

sindresorhus commented 9 years ago

Thanks @forivall and @puzrin. I'll look into improving it soon :)

CanRau commented 4 years ago

> Particularly, for the sake of security and the possibility phishing

I like that idea of humanization while (especially) keeping security in mind 🎉

Any news on this? Titus mentioned it in https://github.com/remarkjs/remark/pull/462#issuecomment-571052040. I would love to use it in remark-truncate-links, which currently uses the truncators from Autolinker.js, because I believe I like the proposed algorithm better 🙌