gregjacobs / Autolinker.js

Utility to Automatically Link URLs, Email Addresses, Phone Numbers, Twitter handles, and Hashtags in a given block of text/HTML
MIT License

Please improve telephone number recognition #95

Closed ghost closed 7 years ago

ghost commented 9 years ago

International phone numbers starting with the + symbol do not seem to be recognized. Note that all the possible international prefixes are known (a list can be found at http://en.wikipedia.org/wiki/List_of_country_calling_codes ), so the regexp could be very strict. Something like: \+(code1|code2|...)[ \-\(\)0-9]{3,} (or possibly requiring an even longer sequence of digits, to avoid matching ordinary numbers written with an explicit leading positive sign).

Right now the regex only recognizes numbers roughly in the US format (XXX) XXX-XXXX.
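A minimal sketch of the strict pattern suggested above, assuming a small illustrative subset of calling codes rather than the full Wikipedia list. The names `callingCodes`, `intlPhoneRe` and `looksLikeIntlPhone` are hypothetical and not part of Autolinker, and the minimum length of 7 is an assumed value for the "longer sequence" variant.

```javascript
// Illustrative subset only; a real list would cover every calling code
// in the table linked above.
const callingCodes = ['1', '33', '39', '44', '49', '61'];

// Put longer codes first so the alternation tries the most specific prefix first.
const alternation = [...callingCodes]
  .sort((a, b) => b.length - a.length)
  .join('|');

// "+", a known calling code, then at least 7 characters drawn from digits,
// spaces, hyphens and parentheses. The original suggestion used {3,}; the
// higher minimum (an assumption) guards against signed numbers such as "+5".
const intlPhoneRe = new RegExp('\\+(?:' + alternation + ')[ \\-()0-9]{7,}');

function looksLikeIntlPhone(text) {
  return intlPhoneRe.test(text);
}

console.log(looksLikeIntlPhone('+1 202-371-2121')); // true
console.log(looksLikeIntlPhone('x = +5'));          // false
```

Building the alternation from an array keeps the code list easy to replace with the complete set.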

ghost commented 9 years ago

Oops, sorry, it looks like the regexp should already handle international numbers. However, we format them as +1 202-371-2121, and those do not seem to be recognized.

Also, please note that for international numbers starting with a +, the tel: URL should include the + character, e.g.:

+1 212 555 1212 -> tel:+12125551212 and NOT tel:12125551212, because without the + the calling software cannot tell whether it needs to add the international calling prefix (011 for the US, 00 for most of Europe) when placing the call.
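The rule described above could be sketched roughly as follows. `toTelHref` is a hypothetical helper written for illustration, not a function from Autolinker: it strips formatting characters and keeps a leading + if the matched text had one.

```javascript
// Hypothetical helper, not part of Autolinker's API.
function toTelHref(phoneText) {
  const trimmed = phoneText.trim();
  // Remember whether the number was written in international "+" form.
  const hasPlus = trimmed.startsWith('+');
  // Drop everything that is not a digit (spaces, hyphens, parentheses, the "+").
  const digits = trimmed.replace(/\D/g, '');
  // Re-attach the "+" so the dialer knows the number is already international.
  return 'tel:' + (hasPlus ? '+' : '') + digits;
}

console.log(toTelHref('+1 212 555 1212')); // tel:+12125551212
console.log(toTelHref('(212) 555-1212'));  // tel:2125551212
```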

gregjacobs commented 9 years ago

Hey, thanks for the info. Great to know. Will update for this.

ghost commented 9 years ago

English numbers, e.g. 020 1111 1111, are not being recognised either.

simison commented 9 years ago

Here's an example:

http://rubular.com/r/A296jC6gdp

(Regex from https://github.com/gregjacobs/Autolinker.js/blob/master/src/matchParser/MatchParser.js#L114 )

Seems like it doesn't stop on newlines either. (fixed at #129)

simison commented 9 years ago

@gdelprete @mfc-julius @gregjacobs would this work better:

http://rubular.com/r/KzyezB1d0I

/((\+|00)[1-9]{1}[0-9\040]{7,14})|((?:(\+)?\d{1,3}[-\040.])?\(?\d{3}\)?[-\040.]?\d{3}[-\040.]\d{4})/

It basically has one part for European numbers (the first alternative) and a second part for everything else.

Thoughts?
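One way to try the proposed pattern against the numbers mentioned in this thread is a small script like the one below. It is only a sketch: `findPhone` is a hypothetical helper, and `\040` (octal for a space) is written as `\x20` here, which is my substitution and should behave the same way.

```javascript
// The regex from the comment above, with \040 written as \x20.
const proposedRe = /((\+|00)[1-9]{1}[0-9\x20]{7,14})|((?:(\+)?\d{1,3}[-\x20.])?\(?\d{3}\)?[-\x20.]?\d{3}[-\x20.]\d{4})/;

// Hypothetical helper: returns the first matched substring, or null.
function findPhone(text) {
  const match = proposedRe.exec(text);
  return match ? match[0] : null;
}

const samples = [
  '+1 202-371-2121',  // format reported earlier in the thread
  '+1 212 555 1212',  // tel: example from the thread
  '(202) 371-2121',   // US format
  '+44 20 1111 1111', // UK number in international form
  '020 1111 1111',    // UK number in national form
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), '->', JSON.stringify(findPhone(sample)));
}
```

In this sketch the first four samples match in full, while the national-format `020 1111 1111` returns `null`, so that case would still need a separate alternative.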

gregjacobs commented 9 years ago

Nicely done. Let me give this a try and I'll let you know!

simison commented 9 years ago

FYI: I wanted to do more checking on the length of the European numbers (that {7,14} bit), but haven't yet found time to dig further.

simison commented 8 years ago

FYI, this looks interesting: https://github.com/chriso/validator.js/blob/master/validator.js#L83-L100

simison commented 8 years ago

Woah. So Google's phone number i18n recognition is massive

garychapman commented 8 years ago

@gregjacobs Would it be possible to make the various regex patterns externally configurable? We'd love it if Autolinker could recognise various Australian phone numbers, for example. Thanks!
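As a rough illustration of what externally configurable patterns might look like, the sketch below takes a caller-supplied list of regexes. Everything here is hypothetical: `findPhoneMatches` is not an Autolinker function, and the two Australian patterns are assumed, non-exhaustive examples.

```javascript
// Hypothetical API for illustration; Autolinker does not expose this function.
function findPhoneMatches(text, patterns) {
  const matches = [];
  for (const pattern of patterns) {
    // Copy each pattern with the global flag so matchAll accepts it.
    const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
    const re = new RegExp(pattern.source, flags);
    for (const m of text.matchAll(re)) {
      matches.push({ text: m[0], index: m.index });
    }
  }
  // Report matches in the order they appear in the text.
  return matches.sort((a, b) => a.index - b.index);
}

// Illustrative Australian formats only (assumed, not exhaustive).
const australianPatterns = [
  /\(0[2378]\) \d{4} \d{4}/, // landline, e.g. (02) 9876 5432
  /\b04\d{2} \d{3} \d{3}\b/, // mobile, e.g. 0412 345 678
];

const found = findPhoneMatches('Call (02) 9876 5432 or 0412 345 678.', australianPatterns);
console.log(found.map((m) => m.text)); // [ '(02) 9876 5432', '0412 345 678' ]
```

Passing the patterns in as data would let each deployment add its own national formats without changing the matching code.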

christiandawson commented 7 years ago

#196 is a PR that attempts to solve this issue; please review.

olafleur commented 7 years ago

Closing. Will follow the idea of having an externally configurable regex in #202.