Closed by dead10ck 5 years ago
I believe I see where the issue is:
In Domain::find_match, when looking for the suffix of the input domain, it keeps traversing the tree until it encounters a label that is not in the chain, and then checks whether the node it stopped at is a "leaf." If not, it assumes that there was no suffix in the input. This leads to incorrect results for subdomains of fbsbx.com such as cdn.fbsbx.com. For example, the chain for apps.fbsbx.com looks like:
com → fbsbx → apps
so when it receives cdn.fbsbx.com, it will first find com:
com → fbsbx → apps
(com)
then fbsbx:
com → fbsbx → apps
(fbsbx.com)
Then it will stop at the fbsbx node, because the next label was not apps. Since fbsbx.com is not a suffix, that node is not a "leaf," so the lookup concludes that the input has no suffix, even though com is one.
I believe that, in order to be correct, you must track the longest suffix you've seen as you traverse the tree. In this case, after the first iteration, it should see that com is a suffix and remember that it encountered a valid suffix. When it stops traversing, instead of checking the node where the traversal stopped, it needs to check whether it encountered any valid suffix along the way.
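The idea can be sketched roughly as follows. This is only an illustration, not the crate's actual code: the Node type, its is_suffix flag, and the insert and longest_suffix methods are hypothetical names, and the sketch ignores the wildcard and exception rules that the real public suffix list also contains.

```rust
use std::collections::HashMap;

/// Hypothetical trie node (not the crate's real type). Each node is one
/// label; `is_suffix` marks that the path from the root to here is a rule.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    is_suffix: bool,
}

impl Node {
    /// Insert a rule such as "apps.fbsbx.com", walking labels right to left.
    fn insert(&mut self, rule: &str) {
        let mut node = self;
        for label in rule.rsplit('.') {
            node = node.children.entry(label.to_string()).or_default();
        }
        node.is_suffix = true;
    }

    /// Return the longest suffix of `domain` found in the trie, remembering
    /// the best match seen so far instead of inspecting only the last node.
    fn longest_suffix<'a>(&self, domain: &'a str) -> Option<&'a str> {
        let mut node = self;
        let mut best = None;
        // Byte offset where the labels matched so far begin.
        let mut start = domain.len();
        for label in domain.rsplit('.') {
            match node.children.get(label) {
                Some(child) => {
                    start -= label.len();
                    if child.is_suffix {
                        // Remember this suffix; a longer one may still follow.
                        best = Some(&domain[start..]);
                    }
                    // Step over the dot that precedes this label, if any.
                    start = start.saturating_sub(1);
                    node = child;
                }
                // Label not in the chain: stop, but keep what we found.
                None => break,
            }
        }
        best
    }
}
```

With rules for com and apps.fbsbx.com inserted, a lookup of cdn.fbsbx.com walks com and fbsbx, stops at cdn, and still reports com, because that match was recorded on the way down.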
I hit a case where cdn.fbsbx.com does not parse correctly:
This outputs:
The same thing happens with fbsbx.com, foo.cdn.fbsbx.com, foo.fbsbx.com, etc. The closest thing in the public suffix list I can find is apps.fbsbx.com, so I'm not sure what's happening.