robinst / linkify

Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly
https://robinst.github.io/linkify/
Apache License 2.0

Add basic linkify binary #36

Closed by egrieco 2 years ago

egrieco commented 2 years ago

Linkify is excellent at pulling links from all kinds of text. I have wanted to use it on the command line for a while, so I built a basic app around it that reads input from STDIN and writes the links it finds to STDOUT.

This is very basic, and could/should be expanded in a number of ways. Happy to take feedback on what is considered missing if you don't feel this is ready for merging.

robinst commented 2 years ago

I like the idea, but maybe we should just link to the excellent lychee by @mre instead (in the README). It already has an option to just print links:

echo 'Try this: https://example.org and https://example.com' | lychee --dump -

What do you think?

egrieco commented 2 years ago

That is slightly more cumbersome than echo 'Try this: https://example.org and https://example.com' | linkify, but lychee is definitely more full-featured.

I'm ok with discarding this pull request as long as we give people a hint about using lychee for this use case in the README.

robinst commented 2 years ago

Cool, done: https://github.com/robinst/linkify/commit/d00dd7468968a13e90496dbaae56b145a1868284

egrieco commented 1 year ago

I'd like to re-open this pull request as I've found a case where lychee does not work.

lychee --dump file_with_links.md | huniq -cS

This gives a count of 1 for every link, since lychee de-duplicates links. That makes sense for lychee, as there is no reason to check a link more than once during a run.

However, if we don't intend to check links and only want to detect duplicates, lychee's de-duplication makes its --dump output useless for that purpose. As far as I can tell, there is no way to disable link de-duplication in lychee.

bat file_with_links.md | linkify | huniq -cS

This gives the intended behavior of a count of every link present in the file.
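To make the difference concrete, here is a rough sketch of the counting step using only the standard library. It is not huniq's implementation. The function name count_occurrences is hypothetical, and sorting by descending count is an assumption about what the pipeline above is meant to produce.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Hypothetical helper: tally how often each line occurs, keeping
/// first-seen order among lines with equal counts, then sort by
/// descending count.
fn count_occurrences<'a, I>(lines: I) -> Vec<(String, usize)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut order: Vec<&str> = Vec::new();
    for line in lines {
        let entry = counts.entry(line).or_insert(0);
        if *entry == 0 {
            // First time this line is seen: remember its position.
            order.push(line);
        }
        *entry += 1;
    }
    let mut result: Vec<(String, usize)> = order
        .into_iter()
        .map(|line| (line.to_string(), counts[line]))
        .collect();
    // Stable sort, so ties stay in first-seen order.
    result.sort_by(|a, b| b.1.cmp(&a.1));
    result
}

fn main() -> io::Result<()> {
    // Read one link per line from STDIN, as produced by an extractor.
    let lines: Vec<String> = io::stdin().lock().lines().collect::<Result<_, _>>()?;
    for (link, count) in count_occurrences(lines.iter().map(String::as_str)) {
        println!("{} {}", count, link);
    }
    Ok(())
}
```

If the input has already been de-duplicated, every count comes out as 1, which is why the extractor has to emit each occurrence for this use case.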

I'm happy to clean up, update and re-submit this pull request if you're willing to merge it.