insighty / openesignforms

Automatically exported from code.google.com/p/openesignforms

Phone number format for UK (and a general solution is best) #25

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
We have a US phone number field but none for Europe or the UK.

http://en.wikipedia.org/wiki/Telephone_numbers_in_the_United_Kingdom
In the right sidebar it says:

NSN length: 10 mostly, 9 for some areas
Typical formats:
  (01xxxx) xxxx[x]
  (01xxx) xxxxx[x]
  (01x1) xxx xxxx
  (011x) xxx xxxx
  (02x) xxxx xxxx
  03xx xxx xxxx
  07xxx xxxxxx
  08xx xxx xxxx
  09xx xxx xxxx

For European countries, phone numbers can have different numbers of digits.
For now I'm using a phone number field as a reminder, but I will change it to a
General field when the form goes into production if we don't have a UK phone
number field by then.

Original issue reported on code.google.com by lstro...@gmail.com on 11 Jun 2011 at 10:49

GoogleCodeExporter commented 8 years ago
Our US (NANP) phone field does not have such detailed validations, but we can
certainly do it like this as long as you are sure the rules are correct.

You mention the phone is 10 digits, but the forms you show often contain 11 
digits. I see none with 10 digits or 9 digits.

Might there be a simpler validation that would suffice? Even the US phone field
doesn't ensure it's a real phone number, just that it has 10 digits, and then we
assume several common display formats that would apply to any 10 digits. We need
some specs to help us build this correctly, ensuring reasonable values/formats
without worrying about detailed validations.

For example, you say these are valid formats:
07xxx xxxxxx and 08xx xxx xxxx

Does this mean 07123 123456 would be acceptable and 0812 123 1234 would be
okay, but 0789 123 1234 and 08123 123456 would not? Those poor Europeans with
their myriad formats! I noted Wikipedia shows that some countries use 7 digits,
others 13. I also saw a UK number written as 07700 954 321 rather than
07700 954321.

Since you need this, it seems like there should be a cleaner spec. For example, 
we used to allow 10 digits and assume the general format xxx.xxx.xxxx, but if
they gave us more than 10 digits, we just "left it as is" and assumed it was
okay since it at least had enough digits; that let through entries like
425.555.1212 ext. 1101, etc.
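A rough Python sketch of that older rule, for illustration only: the function name is hypothetical, and rejecting entries with fewer than 10 digits is an assumption. It shows how an entry carrying an extension slipped through unformatted.

```python
import re
from typing import Optional


def format_us_legacy(entry: str) -> Optional[str]:
    """Hypothetical sketch of the old US phone handling.

    - exactly 10 digits: reformat as xxx.xxx.xxxx
    - more than 10 digits: accept and keep the entry exactly as typed
    - fewer than 10 digits: reject (assumption; returns None)
    """
    digits = re.sub(r"\D", "", entry)  # keep only the digits
    if len(digits) == 10:
        return f"{digits[0:3]}.{digits[3:6]}.{digits[6:]}"
    if len(digits) > 10:
        return entry  # "left as is", so extensions and other text get through
    return None


print(format_us_legacy("(425) 555-1212"))          # 425.555.1212
print(format_us_legacy("425.555.1212 ext. 1101"))  # 425.555.1212 ext. 1101
```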

Anyway, if this is for a particular customer who wants the phone number, can
they provide you with a detailed spec of what they'd find useful? Are there
any existing examples of such validators we can review?

Original comment by yoz...@gmail.com on 11 Jun 2011 at 11:29

GoogleCodeExporter commented 8 years ago
After some investigation here, it seems that there's just no way to ensure a 
phone number entered is valid.  Even in the US, where the main number is 10 
digits, if users put in an extension, it will fail.

What about this idea?  We create a GeneralPhoneNumber validator that does this:

1) Allows only digits, parentheses, hyphen, dot, space, and one of the letter
combos "x", "ext" or "extension", used no more than once
2) The min-length and max-length fields control the number of digits only, i.e.
a minimum and maximum digit count, regardless of any other characters entered.
3) What was entered is not reformatted.
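A minimal Python sketch of the proposed rule. `is_general_phone`, `min_digits` and `max_digits` are hypothetical stand-ins for the validator and its min-length/max-length settings; the entry is only checked, never rewritten.

```python
import re

# Characters allowed by the proposal: digits, parentheses, hyphen, dot, space.
_ALLOWED = re.compile(r"[0-9().\- ]*")
# Extension markers; "extension" is listed first so it is not read as "ext" + "ension".
_EXT_MARKER = re.compile(r"extension|ext|x", re.IGNORECASE)


def is_general_phone(entry: str, min_digits: int, max_digits: int) -> bool:
    """Sketch of the proposed GeneralPhoneNumber check (hypothetical name)."""
    markers = _EXT_MARKER.findall(entry)
    if len(markers) > 1:
        return False  # at most one extension marker
    rest = _EXT_MARKER.sub("", entry)
    if not _ALLOWED.fullmatch(rest):
        return False  # some other letter or symbol was typed
    digit_count = sum(ch.isdigit() for ch in entry)  # length rule counts digits only
    return min_digits <= digit_count <= max_digits
```

As proposed, the allowed characters do not include a leading "+", so an entry like +44 20 3000 5555 would be rejected; adding "+" to the character class is a one-character change if international input is wanted.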

In general, since a phone number cannot be validated in any meaningful way, 
this would allow for any sort of international numbers.  When I looked at other 
countries, they all seem to format differently.  I mean, if you enter 
1234567890 in the US Phone field it will be "valid" though not really.

The general consensus I've read seems to indicate that people tend to know
their phone numbers and enter them correctly, and you don't want to block input
on a form just because we've failed to anticipate one particular format,
especially as formats keep changing with the spread of cell phones and the like.

Of course, if we can get a very precise definition of phone numbers for the UK,
and UK customers become big users, I'm sure it will be added then. Also, in the
future, we'll have custom field validators that will allow for more complex
validation of a General field than regular expressions perhaps allow now.

That said, this regular expression might work for you:

(\(01\d\d\d\d\) \d\d\d\d\d?)|(\(01\d\d\d\) \d\d\d\d\d\d?)|(\(01\d1\) \d\d\d \d\d\d\d)...

You should get the idea: each valid option goes in parentheses, separated by |
(for OR logic); \( and \) mean the user actually typed the parentheses, \d is a
digit, and \d? is an optional digit. Continue the pattern to match any of these:
(01xxxx) xxxx[x]
(01xxx) xxxxx[x]
(01x1) xxx xxxx
(011x) xxx xxxx
(02x) xxxx xxxx
03xx xxx xxxx
07xxx xxxxxx
08xx xxx xxxx
09xx xxx xxxx
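Writing out one option per format in the style just described gives something like this Python sketch. The list is our own illustration, not a tested UK validator; `\d{4}` is shorthand for four `\d` in a row, and the names are hypothetical.

```python
import re

# One alternative per listed format; \( and \) are literal parentheses.
UK_FORMATS = [
    r"\(01\d{4}\) \d{4}\d?",   # (01xxxx) xxxx[x]
    r"\(01\d{3}\) \d{5}\d?",   # (01xxx) xxxxx[x]
    r"\(01\d1\) \d{3} \d{4}",  # (01x1) xxx xxxx
    r"\(011\d\) \d{3} \d{4}",  # (011x) xxx xxxx
    r"\(02\d\) \d{4} \d{4}",   # (02x) xxxx xxxx
    r"03\d{2} \d{3} \d{4}",    # 03xx xxx xxxx
    r"07\d{3} \d{6}",          # 07xxx xxxxxx
    r"08\d{2} \d{3} \d{4}",    # 08xx xxx xxxx
    r"09\d{2} \d{3} \d{4}",    # 09xx xxx xxxx
]
UK_PATTERN = re.compile("|".join(f"(?:{p})" for p in UK_FORMATS))


def matches_uk_format(entry: str) -> bool:
    # fullmatch: the whole entry, not just a prefix, must fit one of the formats
    return UK_PATTERN.fullmatch(entry) is not None
```

Note how rigid this is: 07700 954 321 (a spacing seen in practice) and 020 3000 5555 without parentheses both fail.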

Original comment by yoz...@gmail.com on 12 Jun 2011 at 3:55

GoogleCodeExporter commented 8 years ago

Original comment by yoz...@gmail.com on 12 Jun 2011 at 7:20

GoogleCodeExporter commented 8 years ago
There are several ways to ensure that a telephone number is potentially valid. 
Very detailed data is available for the UK and for many other countries.

However, before you start, there are four concepts that need to be separated:
"input format", "valid number range and valid number length for this range",
"storage format" and "display format".

Many systems try to constrain the user to typing numbers in a particular 
format, and this is usually a very bad idea. The London number 020 3000 5555
can be written as (020) 3000 5555 or as +44 20 3000 5555, but you'll equally see
people writing 0203 000 5555, 02030 005 555, +44 (0) 20 3000 5555, +44(0)203
000 5555, (44) 20 3000 5555, (44) 203 000 5555, (44) 2030 005 555, (+44 203)
000 5555, (+44) 203 000 5555, and many others, and the same again each with
hyphens in various positions.

The user should be allowed to do that. Most users do not properly understand
how telephone numbers work or the significance of the spaces between country
code, area code and local number.

Once entered, the number should then have the +44 and/or 0 prefix stripped, the 
punctuation and spaces removed, and only now should the remainder (the 'NSN') 
be checked for length and validity. This is very easy to do using RegEx 
patterns such as those at:
http://www.aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers
Set aside any extension number data for later re-use at this point.
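A minimal Python sketch of that order of operations, with hypothetical helper names. It assumes extensions are typed at the end after #, x, ext or extension, and its final test is only the "10 mostly, 9 for some areas" length rule from the Wikipedia summary, not the per-range patterns at the link above.

```python
import re
from typing import Optional, Tuple

# Assumed extension markers, typed at the end: "#1234", "x1234", "ext 1234", "extension 1234"
_EXT = re.compile(r"\s*(?:#|extension|ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def split_extension(entry: str) -> Tuple[str, Optional[str]]:
    """Set aside a trailing extension so it does not end up in the NSN."""
    m = _EXT.search(entry)
    if m is None:
        return entry, None
    return entry[:m.start()], m.group(1)


def uk_nsn(entry: str) -> Optional[str]:
    """Reduce a free-form UK entry to its NSN, or None if the length is implausible."""
    number, _extension = split_extension(entry)
    digits = re.sub(r"\D", "", number)  # drop +, (), spaces, hyphens, dots
    if digits.startswith("44"):
        digits = digits[2:]  # country code (assumption: no UK NSN starts with 4)
    if digits.startswith("0"):
        digits = digits[1:]  # trunk prefix, including the (0) in "+44 (0) 20 ..."
    # Simplified validity test: 10 digits mostly, 9 for some areas.
    return digits if len(digits) in (9, 10) else None
```

Variants such as (020) 3000 5555, +44 (0) 20 3000 5555 and (+44 203) 000 5555 all reduce to the same NSN, 2030005555.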

The number should be stored (in a database or wherever) in E.164 format, with + 
sign, country code and NSN (i.e. area code if applicable, and local number); 
e.g. +442030005555.
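A small sketch of that storage step (the function name is hypothetical). Beyond "plain digits only", the one rule it enforces is E.164's 15-digit maximum for country code plus NSN.

```python
import re


def to_e164(country_code: str, nsn: str) -> str:
    """Stored form: '+', country code, NSN; no spaces and no trunk prefix."""
    if not re.fullmatch(r"[0-9]+", country_code) or not re.fullmatch(r"[0-9]+", nsn):
        raise ValueError("country code and NSN must be plain digits")
    if len(country_code) + len(nsn) > 15:
        raise ValueError("E.164 numbers have at most 15 digits")
    return f"+{country_code}{nsn}"


print(to_e164("44", "2030005555"))  # +442030005555
```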

For display, there's a quite complex set of rules for the UK because different 
number ranges have a different total number of digits and there are a variety 
of different area code lengths. These are however quite easy to understand. 
There's a detailed list at: 
http://www.aa-asterisk.org.uk/index.php/Number_format and other places, and the 
applicable regular expressions can be found at:
http://aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers#Formatting_UK_telephone_numbers
Use the E.123 format. You can choose to display the number in national format, 
e.g. 020 3000 5555 or (020) 3000 5555 or in full international format, e.g. +44 
20 3000 5555.
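A partial Python sketch of the display step for the ranges in the Wikipedia summary at the top of this issue, starting from the stored NSN. Function names are hypothetical; ranges it does not cover return None, and it assumes every remaining 01 number has a four-digit area code, which the (01xxxx) line in that summary shows is not always true.

```python
from typing import List, Optional


def uk_groups(nsn: str) -> Optional[List[str]]:
    """Split a UK NSN into display groups; None for ranges not covered here."""
    if len(nsn) == 10 and nsn[0] == "2":
        sizes = [2, 4, 4]  # (02x) xxxx xxxx
    elif len(nsn) == 10 and nsn[0] in "389":
        sizes = [3, 3, 4]  # 03xx / 08xx / 09xx xxx xxxx
    elif len(nsn) == 10 and nsn[0] == "7":
        sizes = [4, 6]  # 07xxx xxxxxx
    elif len(nsn) == 10 and nsn[0] == "1" and "1" in (nsn[1], nsn[2]):
        sizes = [3, 3, 4]  # (011x) / (01x1) xxx xxxx
    elif len(nsn) in (9, 10) and nsn[0] == "1":
        # Assumption: all other 01 numbers are (01xxx) xxxxx[x]; the few
        # (01xxxx) areas would need a lookup table.
        sizes = [4, len(nsn) - 4]
    else:
        return None
    groups, pos = [], 0
    for size in sizes:
        groups.append(nsn[pos:pos + size])
        pos += size
    return groups


def format_national(nsn: str) -> Optional[str]:
    groups = uk_groups(nsn)
    return None if groups is None else "0" + " ".join(groups)


def format_international(nsn: str) -> Optional[str]:
    groups = uk_groups(nsn)
    return None if groups is None else "+44 " + " ".join(groups)
```

For example, the NSN 2030005555 displays as 020 3000 5555 nationally and +44 20 3000 5555 internationally.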

Trying to do validation and formatting in a single regular expression is 
difficult and error prone. By splitting the validation process from the 
formatting process the whole system can be much simplified.

I'm the UK metadata editor for the libphonenumber project and have a huge 
amount of UK-specific data on file and to hand. 

Original comment by g1smd.em...@gmail.com on 29 Jul 2012 at 8:59

GoogleCodeExporter commented 8 years ago
Thanks. There's a lot to digest there. I'll take a look at your libphonenumber 
since that could be just the ticket to making these phone field types more 
useful. 

Original comment by yoz...@gmail.com on 30 Jul 2012 at 12:44

GoogleCodeExporter commented 8 years ago
You can either integrate the whole library (pick from java, js, c++, etc)
http://libphonenumber.googlecode.com/svn/trunk/
or you can just use the metadata file and write code to parse it
http://libphonenumber.googlecode.com/svn/trunk/resources/PhoneNumberMetaData.xml

Your choice. :)

Original comment by g1smd.em...@gmail.com on 30 Jul 2012 at 12:55

GoogleCodeExporter commented 8 years ago
We'll add a general phone type that can use this scheme as it seems nice:

http://code.google.com/p/libphonenumber/ 

Need to understand how best to store such phone numbers in the DB (E.164 is 
good but seems to lack an optional extension which is common in business 
environments).

Original comment by yoz...@gmail.com on 5 Aug 2012 at 8:06

GoogleCodeExporter commented 8 years ago
There are many ways to cater for this.

Extensions can be added to the end with #1234 or x1234 or similar.

It's pretty easy to add a RegEx to strip this off before other processing is 
done.

The important point is to store the data in a simple way that includes the 
country code so you can manipulate it later.
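One way to follow this advice, sketched in Python under our own assumptions: the extension is stripped with a trailing-marker RegEx, and the stored value is the E.164 number with an optional RFC 3966-style ";ext=" suffix, so a single column keeps both. Two separate columns would work just as well; all names here are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: an extension is typed last, as "#1234", "x1234", "ext 1234" or "extension 1234".
_TRAILING_EXT = re.compile(r"\s*(?:#|extension|ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def split_extension(entry: str) -> Tuple[str, Optional[str]]:
    """Strip a trailing extension before any other processing."""
    m = _TRAILING_EXT.search(entry)
    if m is None:
        return entry.strip(), None
    return entry[:m.start()].strip(), m.group(1)


def to_stored(e164: str, extension: Optional[str]) -> str:
    """One text value: the E.164 number plus ';ext=' (RFC 3966 style) when present."""
    return e164 if extension is None else f"{e164};ext={extension}"


def from_stored(value: str) -> Tuple[str, Optional[str]]:
    """Inverse of to_stored: split the value back into number and extension."""
    e164, sep, extension = value.partition(";ext=")
    return e164, (extension if sep else None)
```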

Original comment by g1smd.em...@gmail.com on 5 Aug 2012 at 10:41

GoogleCodeExporter commented 8 years ago
Integrated libphonenumber into the 12.8.25 release. When specifying a phone 
field now, you need to also specify the default country to assume if the number 
isn't entered with the country code.

Also implemented that library's standard formats, NATIONAL, INTERNATIONAL,
E164 and RFC3966, along with our own "digits only" (the digits from the NATIONAL
format) and "as is", where we don't do any special formatting.

This removed the DOT, DASH and SPACE formats, which we did not see in use in any
of our customer deployments, though it's possible they're used by those who have
open source deployments. Sadly, if that's you, you'll want to revisit those
fields to set them up as you prefer, since any unknown format spec will now
result in the NATIONAL format.
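The output shapes of these formats, for the London number used earlier in this thread, can be illustrated with a small Python sketch. This is not the release's code or libphonenumber's API: `render` and the DIGITS/AS_IS names are hypothetical, the number is assumed to be already split into display groups, and the "0 plus groups" national form fits the UK but not, for example, NANP.

```python
from enum import Enum
from typing import List


class PhoneFormat(Enum):
    NATIONAL = "NATIONAL"
    INTERNATIONAL = "INTERNATIONAL"
    E164 = "E164"
    RFC3966 = "RFC3966"
    DIGITS = "DIGITS"  # hypothetical name for "digits only" (digits of NATIONAL)
    AS_IS = "AS_IS"    # hypothetical name for "as is" (no formatting)


def render(entered: str, country_code: str, groups: List[str], spec: str) -> str:
    """Render a parsed number; groups is the NSN split for display,
    e.g. ["20", "3000", "5555"]. Assumes a "0" trunk prefix, as in the UK."""
    try:
        fmt = PhoneFormat(spec)
    except ValueError:
        fmt = PhoneFormat.NATIONAL  # unknown specs (e.g. the old DOT) fall back
    national = "0" + " ".join(groups)
    if fmt is PhoneFormat.NATIONAL:
        return national
    if fmt is PhoneFormat.INTERNATIONAL:
        return f"+{country_code} " + " ".join(groups)
    if fmt is PhoneFormat.E164:
        return f"+{country_code}" + "".join(groups)
    if fmt is PhoneFormat.RFC3966:
        return f"tel:+{country_code}-" + "-".join(groups)
    if fmt is PhoneFormat.DIGITS:
        return "".join(ch for ch in national if ch.isdigit())
    return entered  # AS_IS: keep the user's text untouched


for spec in ("NATIONAL", "INTERNATIONAL", "E164", "RFC3966", "DIGITS", "AS_IS", "DOT"):
    print(spec, render("020-3000-5555", "44", ["20", "3000", "5555"], spec))
```

Among other lines, this prints tel:+44-20-3000-5555 for RFC3966 and 020 3000 5555 for the unknown DOT spec.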

Original comment by yoz...@gmail.com on 15 Aug 2012 at 2:01