Our US (NANP) phone validation is not that detailed, but we can certainly do
it like this as long as you are sure it's correct.
You mention the phone is 10 digits, but the forms you show often contain 11
digits. I see none with 10 digits or 9 digits.
Might there be a simpler validation that would suffice? Even the US phone
validation doesn't ensure it's a real phone number, just that it has 10 digits;
we then assume several common display formats that would apply to any 10
digits. We need some specs to help us build this correctly, ensuring reasonable
values and formats without worrying about detailed validation.
For example, you say these are valid formats:
07xxx xxxxxx and 08xx xxx xxxx
Does this mean 07123 123456 would be acceptable, and 0812 123 1234 would be
okay, but
0789 123 1234 would not and 08123 123456 would not? Those poor Europeans with
their myriad formats! I noted Wikipedia shows that some countries are 7 digits,
some 13 digits. I saw a UK number as 07700 954 321 rather than 07700 954321.
Since you need this, it seems like there should be a cleaner spec. For example,
we used to allow 10 digits and assume the general format xxx.xxx.xxxx, but if
they gave us more than 10 digits, we just "left it as is" and assumed it was
okay since it at least had enough digits; that allowed for entries like
425.555.1212 ext. 1101, etc.
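A minimal Python sketch of that old rule, for illustration only. The function name `legacy_us_format` is hypothetical, and rejecting fewer than 10 digits is an assumption the comment above doesn't state. It shows how the "more than 10 digits" escape hatch lets extension text through untouched.

```python
import re
from typing import Optional

def legacy_us_format(raw: str) -> Optional[str]:
    """Hypothetical sketch of the old US rule: 10 digits get reformatted, more are kept as typed."""
    digits = re.sub(r"[^0-9]", "", raw)  # count digits only, ignoring punctuation and text
    if len(digits) < 10:
        return None                       # assumption: fewer than 10 digits is rejected
    if len(digits) == 10:
        return f"{digits[:3]}.{digits[3:6]}.{digits[6:]}"  # xxx.xxx.xxxx
    return raw                            # more than 10 digits: "left as is"

print(legacy_us_format("(425) 555-1212"))          # 425.555.1212
print(legacy_us_format("425.555.1212 ext. 1101"))  # 425.555.1212 ext. 1101 (unchanged)
```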
Anyway, if this is for a particular customer who wants the phone number, can
they provide you with a detailed spec of what they'd find useful? Are there
any existing examples of such validators we can review?
Original comment by yoz...@gmail.com
on 11 Jun 2011 at 11:29
After some investigation here, it seems that there's just no way to ensure a
phone number entered is valid. Even in the US, where the main number is 10
digits, if users put in an extension, it will fail.
What about this idea? We create a GeneralPhoneNumber validator that does this:
1) Allows only digits, parentheses, hyphen, dot, space, and the letter combos
"x" or "ext" or "extension" no more than once
2) You can use the min-length and max-length fields to constrain the number of
digits only (a minimum and a maximum digit count), ignoring any other
characters entered.
3) There's no reformatting of what was entered.
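A Python sketch of the proposed GeneralPhoneNumber rules, read literally. Note that the list above doesn't include "+", so an entry like +44 20 3000 5555 would be rejected unless it is added. The function name `is_valid_general_phone`, the case-insensitive matching of the extension marker, and counting extension digits toward the length limits are all assumptions.

```python
import re

# Characters the proposal allows once the extension marker is removed.
_ALLOWED = re.compile(r"[0-9().\- ]*")
# Extension markers; the longest alternatives are tried before the bare "x".
_EXT_MARKER = re.compile(r"extension|ext|x", re.IGNORECASE)

def is_valid_general_phone(value: str, min_digits: int = 0, max_digits: int = 99) -> bool:
    """Hypothetical validator: checks characters and digit count, never reformats."""
    if len(_EXT_MARKER.findall(value)) > 1:
        return False                      # rule 1: at most one extension marker
    rest = _EXT_MARKER.sub("", value)
    if not _ALLOWED.fullmatch(rest):
        return False                      # rule 1: only digits and ( ) - . space
    digit_count = len(re.findall(r"[0-9]", value))
    return min_digits <= digit_count <= max_digits  # rule 2: digits only are counted
```

Because the function only returns a boolean, rule 3 (no reformatting) holds automatically: whatever passed validation is stored exactly as entered.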
In general, since a phone number cannot be validated in any meaningful way,
this would allow for any sort of international numbers. When I looked at other
countries, they all seem to format differently. Even 1234567890 entered in the
US Phone field is "valid", though it isn't a real number.
The general consensus I've read seems to be that people tend to know their
phone numbers and enter them correctly, and you don't want to block input on a
form just because we've failed to capture one particular format, especially
since formats keep changing with the expansion of cell phones and the like.
Of course, if we can get a very precise definition of phone numbers for the UK,
and UK customers become big users, I'm sure it will be added then. Also, in the
future, we'll have custom field validators that will allow more complex
validation of a General field than regular expressions allow now.
That said, a regular expression like this might work for you:
(\(01\d\d\d\d\) \d\d\d\d\d?)|(\(01\d\d\d\) \d\d\d\d\d\d?)|(\(01\d1\) \d\d\d \d\d\d\d)...
You should get the idea: each valid option goes in parentheses, the options are
separated by | (for OR logic), \( and \) mean the user actually entered the
parentheses, \d is a digit, and \d? is an optional digit. Extend it the same
way to match any of these:
(01xxxx) xxxx[x]
(01xxx) xxxxx[x]
(01x1) xxx xxxx
(011x) xxx xxxx
(02x) xxxx xxxx
03xx xxx xxxx
07xxx xxxxxx
08xx xxx xxxx
09xx xxx xxxx
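A Python sketch of the full alternation for the nine formats listed above, assuming each must be typed with exactly the spacing and parentheses shown. The names `UK_FORMATS` and `matches_listed_uk_format` are hypothetical, and the list above is the only source for the patterns; it is not a complete UK numbering plan. Using `re.fullmatch` avoids putting `^` and `$` anchors around every alternative.

```python
import re

UK_FORMATS = re.compile(
    r"\(01\d{4}\) \d{4}\d?"      # (01xxxx) xxxx[x]
    r"|\(01\d{3}\) \d{5}\d?"     # (01xxx) xxxxx[x]
    r"|\(01\d1\) \d{3} \d{4}"    # (01x1) xxx xxxx
    r"|\(011\d\) \d{3} \d{4}"    # (011x) xxx xxxx
    r"|\(02\d\) \d{4} \d{4}"     # (02x) xxxx xxxx
    r"|03\d\d \d{3} \d{4}"       # 03xx xxx xxxx
    r"|07\d{3} \d{6}"            # 07xxx xxxxxx
    r"|08\d\d \d{3} \d{4}"       # 08xx xxx xxxx
    r"|09\d\d \d{3} \d{4}"       # 09xx xxx xxxx
)

def matches_listed_uk_format(value: str) -> bool:
    """True if the whole entry matches one of the listed layouts exactly."""
    return UK_FORMATS.fullmatch(value.strip()) is not None
```

Note how strict this is: 07700 954321 passes, but the 07700 954 321 grouping mentioned earlier fails, and so do 0789 123 1234 and 08123 123456.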
Original comment by yoz...@gmail.com
on 12 Jun 2011 at 3:55
Original comment by yoz...@gmail.com
on 12 Jun 2011 at 7:20
There are several ways to ensure that a telephone number is potentially valid.
Very detailed data is available for the UK and for many other countries.
However, before you start, there are four concepts that need to be separated.
These are "input format", "valid number range and valid number length for this
range", "storage format" and "display format".
Many systems try to constrain the user to typing numbers in a particular
format, and this is usually a very bad idea. The London number 020 3000 5555
can be written as (020) 3000 5555 or as +44 20 3000 5555, but you'll equally see
people writing 0203 000 5555, 02030 005 555, +44 (0) 20 3000 5555, +44(0)203
000 5555, (44) 20 3000 5555, (44) 203 000 5555, (44) 2030 005 555, (+44 203)
000 5555, (+44) 203 000 5555, and many others, and the same again each with
hyphens in various positions.
The user should be allowed to do that. Most users do not properly understand
how telephone numbers work or the significance of the spaces between country
code, area code and local number.
Once entered, the number should then have the +44 and/or 0 prefix stripped, the
punctuation and spaces removed, and only now should the remainder (the 'NSN')
be checked for length and validity. This is very easy to do using RegEx
patterns such as those at:
http://www.aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers
Set aside any extension number data for later re-use at this point.
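A minimal Python sketch of the normalization step for UK numbers only, assuming any extension has already been set aside. The function name `uk_nsn` is hypothetical, and the 9-or-10-digit length check is a simplification (most UK NSNs have 10 digits, some have 9); the linked RegEx patterns check the valid number ranges properly.

```python
import re
from typing import Optional

def uk_nsn(raw: str) -> Optional[str]:
    """Reduce a UK number typed in any common layout to its NSN (hypothetical helper)."""
    digits = re.sub(r"[^0-9]", "", raw)   # drop spaces, hyphens, dots, parentheses and '+'
    if digits.startswith("0044"):          # international dialling prefix 00 + country code
        digits = digits[4:]
    elif digits.startswith("44"):          # +44 or (44) without the 00
        digits = digits[2:]
    if digits.startswith("0"):             # national trunk prefix, or the "(0)" in +44 (0) 20 ...
        digits = digits[1:]
    # Simplified length check (an assumption): 9 or 10 digits, not starting with 0.
    if len(digits) not in (9, 10) or digits.startswith("0"):
        return None
    return digits
```

Every one of the London variants quoted earlier reduces to the same NSN, 2030005555, which is what makes the later range and length checks simple.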
The number should be stored (in a database or wherever) in E.164 format, with +
sign, country code and NSN (i.e. area code if applicable, and local number);
e.g. +442030005555.
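Storing then amounts to prefixing "+" and the country code to the NSN. A sketch with a hypothetical `to_e164` helper; the limits it enforces (a 1 to 3 digit country code not starting with 0, and at most 15 digits in total) come from E.164 itself.

```python
import re

def to_e164(country_code: str, nsn: str) -> str:
    """Build the stored E.164 form, e.g. ("44", "2030005555") -> "+442030005555"."""
    if not re.fullmatch(r"[1-9][0-9]{0,2}", country_code):
        raise ValueError("country code must be 1 to 3 digits, not starting with 0")
    if not re.fullmatch(r"[0-9]+", nsn):
        raise ValueError("NSN must be digits only")
    if len(country_code) + len(nsn) > 15:
        raise ValueError("E.164 numbers have at most 15 digits")
    return f"+{country_code}{nsn}"
```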
For display, there's a quite complex set of rules for the UK because different
number ranges have a different total number of digits and there are a variety
of different area code lengths. These are however quite easy to understand.
There's a detailed list at:
http://www.aa-asterisk.org.uk/index.php/Number_format and other places, and the
applicable regular expressions can be found at:
http://aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers#Formatting_UK_telephone_numbers
Use the E.123 format. You can choose to display the number in national format,
e.g. 020 3000 5555 or (020) 3000 5555 or in full international format, e.g. +44
20 3000 5555.
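A simplified Python sketch of display formatting, covering only the groupings listed earlier in this thread (02x, 011x, 01x1, 01xxx, 03xx, 07xxx, 08xx, 09xx). The helpers `uk_groups` and `format_uk` are hypothetical. In particular, the handful of six-digit (01xxxx) area codes cannot be told apart from 01xxx without the range data, so they are treated as 01xxx here.

```python
import re
from typing import List, Optional

def uk_groups(nsn: str) -> Optional[List[str]]:
    """Split a UK NSN (no leading 0) into display groups; simplified rules."""
    if not re.fullmatch(r"[0-9]{9,10}", nsn):
        return None
    first = nsn[0]
    if first == "2" and len(nsn) == 10:
        return [nsn[:2], nsn[2:6], nsn[6:]]          # 02x xxxx xxxx
    if first == "1" and len(nsn) == 10 and "1" in (nsn[1], nsn[2]):
        return [nsn[:3], nsn[3:6], nsn[6:]]          # 011x / 01x1 xxx xxxx
    if first == "1":
        return [nsn[:4], nsn[4:]]                    # 01xxx xxxxx[x]
    if first in "389" and len(nsn) == 10:
        return [nsn[:3], nsn[3:6], nsn[6:]]          # 03xx / 08xx / 09xx xxx xxxx
    if first == "7" and len(nsn) == 10:
        return [nsn[:4], nsn[4:]]                    # 07xxx xxxxxx
    return None

def format_uk(nsn: str, international: bool = False) -> Optional[str]:
    """National (020 3000 5555) or E.123 international (+44 20 3000 5555) display."""
    groups = uk_groups(nsn)
    if groups is None:
        return None
    body = " ".join(groups)
    return f"+44 {body}" if international else f"0{body}"
```

Because the input is already a clean NSN, formatting never has to cope with user punctuation, which is the point of keeping validation and formatting separate.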
Trying to do validation and formatting in a single regular expression is
difficult and error prone. By splitting the validation process from the
formatting process the whole system can be much simplified.
I'm the UK metadata editor for the libphonenumber project and have a huge
amount of UK-specific data on file and to hand.
Original comment by g1smd.em...@gmail.com
on 29 Jul 2012 at 8:59
Thanks. There's a lot to digest there. I'll take a look at your libphonenumber
since that could be just the ticket to making these phone field types more
useful.
Original comment by yoz...@gmail.com
on 30 Jul 2012 at 12:44
You can either integrate the whole library (pick from java, js, c++, etc)
http://libphonenumber.googlecode.com/svn/trunk/
or you can just use the metadata file and write code to parse it
http://libphonenumber.googlecode.com/svn/trunk/resources/PhoneNumberMetaData.xml
Your choice. :)
Original comment by g1smd.em...@gmail.com
on 30 Jul 2012 at 12:55
We'll add a general phone type that can use this scheme as it seems nice:
http://code.google.com/p/libphonenumber/
Need to understand how best to store such phone numbers in the DB (E.164 is
good but seems to lack an optional extension which is common in business
environments).
Original comment by yoz...@gmail.com
on 5 Aug 2012 at 8:06
There are many ways to cater for this.
Extensions can be added to the end with #1234 or x1234 or similar.
It's pretty easy to add a RegEx to strip this off before other processing is
done.
The important point is to store the data in a simple way that includes the
country code so you can manipulate it later.
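A sketch of stripping a trailing extension before the main number is processed. The accepted markers (#, x, ext, ext., extension) and the `;ext=` storage convention, borrowed from RFC 3966 tel URIs, are assumptions; `split_extension` and `storage_value` are hypothetical names. Keeping the extension in a separate database column works just as well.

```python
import re
from typing import Optional, Tuple

# Assumed markers: "#", "x", "ext", "ext." or "extension", followed by digits at the very end.
_EXT = re.compile(r"\s*(?:#|x|ext\.?|extension)\s*(\d+)\s*$", re.IGNORECASE)

def split_extension(raw: str) -> Tuple[str, Optional[str]]:
    """Return (main number as typed, extension digits or None)."""
    m = _EXT.search(raw)
    if m is None:
        return raw.strip(), None
    return raw[:m.start()].strip(), m.group(1)

def storage_value(e164: str, ext: Optional[str]) -> str:
    """Append the extension RFC 3966 style, e.g. +442030005555;ext=1234."""
    return f"{e164};ext={ext}" if ext else e164
```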
Original comment by g1smd.em...@gmail.com
on 5 Aug 2012 at 10:41
Integrated libphonenumber into the 12.8.25 release. When specifying a phone
field now, you need to also specify the default country to assume if the number
isn't entered with the country code.
Also implemented that library's standard formats: NATIONAL, INTERNATIONAL,
E164 and RFC3966, along with our own digits-only format (the digits from the
NATIONAL format) and "as is", where we don't do any special formatting.
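A sketch of how these format options might map onto one parsed number, using 020 3000 5555 as the example. libphonenumber's own PhoneNumberFormat values are NATIONAL, INTERNATIONAL, E164 and RFC3966; the `PhoneFormat` enum, the DIGITS and AS_IS names, and the `render` function here are hypothetical, and the "0" trunk prefix is a UK assumption (the library handles each country's national prefix itself).

```python
from enum import Enum
from typing import List

class PhoneFormat(Enum):
    NATIONAL = "NATIONAL"
    INTERNATIONAL = "INTERNATIONAL"
    E164 = "E164"
    RFC3966 = "RFC3966"
    DIGITS = "DIGITS"   # hypothetical name for the digits-only format
    AS_IS = "AS_IS"     # hypothetical name for "as is"

def render(raw: str, country_code: str, groups: List[str], fmt_name: str) -> str:
    """Render an already-parsed number; groups is the NSN split for display."""
    try:
        fmt = PhoneFormat(fmt_name)
    except ValueError:
        fmt = PhoneFormat.NATIONAL          # unknown spec (e.g. old DOT) falls back to NATIONAL
    national = "0" + " ".join(groups)       # assumes a UK-style "0" trunk prefix
    if fmt is PhoneFormat.NATIONAL:
        return national
    if fmt is PhoneFormat.INTERNATIONAL:
        return f"+{country_code} " + " ".join(groups)
    if fmt is PhoneFormat.E164:
        return f"+{country_code}" + "".join(groups)
    if fmt is PhoneFormat.RFC3966:
        return f"tel:+{country_code}-" + "-".join(groups)
    if fmt is PhoneFormat.DIGITS:
        return "".join(ch for ch in national if ch.isdigit())
    return raw                              # AS_IS: exactly what the user typed
```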
This removed the DOT, DASH and SPACE formats, which we did not see in use in any
of our customer deployments, though they may be used by those who have open
source deployments. Sadly, if that's you, you'll want to revisit those fields
and set them up as you prefer, since any unknown format spec will now result in
the NATIONAL format.
Original comment by yoz...@gmail.com
on 15 Aug 2012 at 2:01
Original issue reported on code.google.com by
lstro...@gmail.com
on 11 Jun 2011 at 10:49