whisperfish / rust-phonenumber

Library for parsing, formatting and validating international phone numbers.
Apache License 2.0
156 stars 55 forks source link

Numbers starting with the same sequence as country prefix are parsed incorrectly #68

Open forgerpl opened 6 months ago

forgerpl commented 6 months ago

Test case:

3912312312 is parsed with the lastest main as 12312312

PhoneNumber {
        code: Code {
            value: 39,
            source: Number,
        },
        national: NationalNumber {
            value: 12312312,
        },
        extension: None,
        carrier: None,
    }

libphone parses this number correctly:

****Parsing Result:****
{"country_code":39,"national_number":3912312312,"raw_input":"3912312312","country_code_source":20}

****Validation Results:****
Result from isPossibleNumber(): true
Result from isValidNumber(): true
Result from isValidNumberForRegion(): true
Phone Number region: IT
Result from getNumberType(): MOBILE

****Formatting Results:**** 
E164 format: +393912312312
Original format: 391 231 2312
National format: 391 231 2312
International format: +39 391 231 2312
Out-of-country format from US: 011 39 391 231 2312
Out-of-country format from Switzerland: 00 39 391 231 2312
Format for mobile dialing (calling from US): +39 391 231 2312
Format for national dialing with preferred carrier code and empty fallback carrier code: 391 231 2312
rubdos commented 6 months ago

How are you calling both libraries? Because if you just phonenumber::parse("3912312312", None), without any country for context, the 39 prefix will be taken for the Italian prefix, and I would think that is expected behaviour.

forgerpl commented 6 months ago

How are you calling both libraries? Because if you just phonenumber::parse("3912312312", None), without any country for context, the 39 prefix will be taken for the Italian prefix, and I would think that is expected behaviour.

In both cases I'm setting the country to IT. My expectation here would be to do the same thing as libphonenumber does - treat the whole thing as a national number. I think that it boils down to libphonenumber using FROM_DEFAULT_COUNTRY and not Number as the code source.

I'm using this as libphonenumber reference: http://libphonenumber.appspot.com/phonenumberparser?number=3912312312&country=IT

rubdos commented 6 months ago

Can reproduce with this test case:

diff --git a/src/phone_number.rs b/src/phone_number.rs
index 560bbbd..53ed717 100644
--- a/src/phone_number.rs
+++ b/src/phone_number.rs
@@ -265,6 +265,12 @@ mod test {
             .unwrap()
     }

+    fn parsed_local(country: country::Id, number: &str) -> PhoneNumber {
+        parser::parse(Some(country), number)
+            .with_context(|| format!("parsing {number}"))
+            .unwrap()
+    }
+
     #[template]
     #[rstest]
     #[case(parsed("+80012340000"), None, Type::TollFree)]
@@ -282,6 +288,7 @@ mod test {
     // https://github.com/whisperfish/rust-phonenumber/issues/46 and
     // https://github.com/whisperfish/rust-phonenumber/issues/47
     // #[case(parsed("+1 520-878-2491"), US)]
+    #[case(parsed_local(IT, "3912312312"), Some(IT), Type::Mobile)]
     fn phone_numbers(
         #[case] number: PhoneNumber,
         #[case] country: Option<country::Id>,
@@ -295,7 +302,7 @@ mod test {
         #[case] country: Option<country::Id>,
         #[case] _type: Type,
     ) -> anyhow::Result<()> {
-        assert_eq!(country, number.country().id());
+        assert_eq!(country, number.country().id(), "parsed as {number:?}");

         Ok(())
     }

It's funny indeed, if you change 39 to 38, it just works fine.

rubdos commented 6 months ago

Weird, can't repro with the Belgian number #[case(parsed_local(BE, "32932620"), Some(BE), Type::Unknown)]. That test just passes. It wouldn't be a valid Belgian number though, because we'd have it start with 03 instead of the plain 3.

rubdos commented 6 months ago

As far as I can tell, the decision is made to get into this branch:

if number.national.starts_with(&code)
    && (!meta.descriptors().general().is_match(&number.national)
        || !validator::length(meta, &number, Type::Unknown).is_possible())
{
    number.country = country::Source::Number;
    number.national = trim(number.national, code.len());
}

in helper::country_code. This means that the given number is not considered a possible number according to the current database, hence it is stripped from the 39 prefix. I'm not sure whether that logic is sound per se, but that's what's happening currently.

We'd need to find the equivalent logic in the Google library to see what's happening exactly, and what should be happening instead.

This also explains why my Belgian example doesn't work as expected.

rubdos commented 6 months ago

Still the same case with the database present in #67, so that does not resolve this.

direc85 commented 6 months ago

How does the original library handle that case?

direc85 commented 2 months ago

libphonenumber handles just the 39... number without country context as invalid:

https://libphonenumber.appspot.com/phonenumberparser?number=3912312312

An error occurred while parsing the input: Error type: INVALID_COUNTRY_CODE. Missing or invalid default region.

With the country given, or with the plus sign, it's parsed correctly:

https://libphonenumber.appspot.com/phonenumberparser?number=3912312312&country=IT

Phone Number entered: 3912312312 E164 format: +393912312312 Original format: 391 231 2312

Without country but just a plus added results in an valid-looking-but-still-invalid number:

https://libphonenumber.appspot.com/phonenumberparser?number=%2B3912312312

Phone Number entered: +3912312312 E164 format: invalid Original format: +3912312312

So I believe this is the behavior we should replicate.