JimmyGalar closed 9 months ago
You can fairly easily filter out emails from the result. For example, if a result contains @ but does not contain ://, it's an email.
If we just returned URLs, the quality of the results would be worse. The input john.smith@test.com would give you test.com, for example.
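The filtering rule above could be sketched roughly as follows. This is only an illustration: looksLikeEmail and dropEmails are hypothetical helper names, not part of xurls, and the matches slice stands in for whatever strings the relaxed matcher returns for your input.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeEmail applies the heuristic from this thread: a match that
// contains "@" but no "://" is treated as an email address.
// Hypothetical helper, not part of xurls.
func looksLikeEmail(match string) bool {
	return strings.Contains(match, "@") && !strings.Contains(match, "://")
}

// dropEmails returns the matches that do not look like email addresses,
// preserving their original order.
func dropEmails(matches []string) []string {
	var urls []string
	for _, m := range matches {
		if !looksLikeEmail(m) {
			urls = append(urls, m)
		}
	}
	return urls
}

func main() {
	// Stand-in for the strings a relaxed match might return.
	matches := []string{
		"http://www.google.com",
		"www.test.com",
		"John.Smith@test.com",
		"http://user@example.com/path",
	}
	fmt.Println(dropEmails(matches))
}
```

Note that a URL carrying user info, such as http://user@example.com/path, is kept, because it contains :// as well as @.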
I did that as a workaround, but I was hoping there was a pattern or systematic way to exclude emails, rather than having to remove them from the string before it is interpreted.
I'm happy to discuss API ideas if you have any, but remember that it's unlikely we can "disable" matching emails in the relaxed regexp.
What would you think of adding a new top-level API like:
func IsEmail(string) bool
Then, you could iterate over your xurls.Relaxed results and use xurls.IsEmail to filter emails as needed. In the future we could write other similar helper funcs, like HasScheme.
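To make the proposal concrete, here is a rough sketch of how such helpers might be used by a caller. Nothing here is the library's API: isEmail and hasScheme are local, hypothetical stand-ins for the proposed IsEmail and HasScheme, their bodies are assumptions based on the "contains @ but not ://" rule mentioned earlier in this thread, and classify exists only for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
)

// isEmail is a hypothetical stand-in for the proposed IsEmail(string) bool.
// The body is an assumption: "@" present and "://" absent.
func isEmail(s string) bool {
	return strings.Contains(s, "@") && !strings.Contains(s, "://")
}

// hasScheme is a hypothetical stand-in for the suggested HasScheme helper.
// It only checks for the "://" separator.
func hasScheme(s string) bool {
	return strings.Contains(s, "://")
}

// classify labels a single match using the two predicates above.
func classify(s string) string {
	switch {
	case isEmail(s):
		return "email"
	case hasScheme(s):
		return "url with scheme"
	default:
		return "bare url"
	}
}

func main() {
	// Stand-in for results obtained from the relaxed matcher.
	results := []string{"http://www.google.com", "www.test.com", "Testing@test.com"}
	for _, r := range results {
		fmt.Printf("%s: %s\n", r, classify(r))
	}
}
```

Keeping the checks as small predicates leaves the decision with the caller: the matcher can continue to return emails, and each program decides which kinds of match to keep.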
Friendly ping @JimmyGalar :)
I thought about this briefly and pushed https://github.com/mvdan/xurls/commit/09d66fb475fb3e22da5d04135c6c168f1038d40b to master. What do you all think?
I will assume that the fix in master is enough. Feel free to leave a comment or file a new issue if you disagree.
I am using the xurls code to pull out possible urls from a message body string. The urls can be in either strict or relaxed format so I need to use the relaxed method of xurls to find the possible urls in the string. The issue is that email addresses can also be in the string and the relaxed method of xurls is pulling those out too.
For example my string might be: "Hello from http://www.google.com, please check the www.test.com webpage for further information. If you have any questions please email John.Smith@test.com or Testing@test.com"
What I would like xurls to do is just pull the http://www.google.com or www.test.com.
Instead it pulls the 2 urls, and John.Sm, test.com, test.com. Is there anything that can be done so that only urls are pulled?