rushmorem / publicsuffix

An implementation of Mozilla's Public Suffix List in Rust
MIT License
97 stars 17 forks source link

Trivial case does not parse correctly #24

Closed dead10ck closed 5 years ago

dead10ck commented 5 years ago

I hit a case where cdn.fbsbx.com does not parse correctly:

let domain = dbg!(list.parse_domain("cdn.fbsbx.com")).unwrap();
assert_eq!(Some("com"), domain.suffix());

This outputs:

[src/main.rs:46] list.parse_domain("cdn.fbsbx.com") = Ok(
    Domain {
        full: "cdn.fbsbx.com",
        typ: None,
        suffix: None,
        registrable: None,
    },
)
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some("com")`,
 right: `None`', src/main.rs:47:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

The same thing happens with fbsbx.com, foo.cdn.fbsbx.com, foo.fbsbx.com, etc. The closest thing in the public suffix list I can find is apps.fbsbx.com, so I'm not sure what's happening.

dead10ck commented 5 years ago

I believe I see where the issue is:

In Domain::find_match, when looking for the suffix of the input domain, it will keep traversing the tree until it encounters a label that is not in the chain, and then check if the node it stopped at is a "leaf." If not, it assumes that there was no suffix in the input. This will lead to incorrect results in the case of any subdomain of fbsbx.com. For example, the chain for apps.fbsbx.com looks like:

com → fbsbx → apps

so when it receives cdn.fbsbx.com, it will first find com:

com   → fbsbx → apps
(com)

then fbsbx

com   → fbsbx       → apps
        (fbsbx.com)

Then it will stop at the fbsbx node, because the next label was not apps. Since fbsbx.com is not a suffix, it is not a "leaf."

I believe in order to be correct, you must track the longest suffix you've seen as you traverse the tree. In this case, after the first iteration, it should see that com is a suffix and remember that we encountered a valid suffix. When it stops traversing, instead of checking the node where we stopped in the traversal, we need to check if we encountered any valid suffix along the way.