Closed DylanRJohnston closed 3 months ago
Elastic correctly identifies the eTLD.
// POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "eTLD",
    "processors": [
      {
        "registered_domain": {
          "field": "message",
          "target_field": "url"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "test.servicebus.windows.net"
      }
    }
  ]
}
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "message": "test.servicebus.windows.net",
          "url": {
            "subdomain": "test.servicebus",
            "registered_domain": "windows.net",
            "top_level_domain": "net",
            "domain": "test.servicebus.windows.net"
          }
        },
        "_ingest": {
          "timestamp": "2024-07-03T02:33:31.737388501Z"
        }
      }
    }
  ]
}
Actually, it looks like servicebus.windows.net appears in the public suffix list (https://publicsuffix.org/list/public_suffix_list.dat), so perhaps the issue is on the Elastic side 🤔
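For anyone following along, here is a rough sketch of why a public suffix list entry changes the answer. It is not VRL's or Elastic's actual implementation. The `public_suffix` helper and the two-entry `RULES` set are made up for illustration, and the sketch ignores the wildcard and exception rules that the real list supports. It only shows the longest-matching-suffix idea.

```python
# Tiny hand-picked rule set for illustration only (an assumption, not the real
# list). The real list has thousands of entries plus wildcard ("*.") and
# exception ("!") rules, which this sketch does not handle.
RULES = {"net", "servicebus.windows.net"}


def public_suffix(domain, rules=RULES):
    """Return the longest suffix of `domain` that appears in `rules`."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest candidate first, dropping one leading label each time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    # Nothing matched: fall back to the last label (the implicit "*" rule).
    return labels[-1]


# With the servicebus entry present, the longer suffix wins.
print(public_suffix("test.servicebus.windows.net"))
# -> servicebus.windows.net

# With only "net" known, the suffix collapses to the plain TLD.
print(public_suffix("test.servicebus.windows.net", rules={"net"}))
# -> net
```

So whether the answer is `servicebus.windows.net` or `net` depends on which entries of the list an implementation takes into account.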
Actually, after looking into this more carefully, I think Elastic is the one giving the incorrect response here, if I understand the semantics of the registered_domain processor correctly.
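To make the comparison concrete, below is a minimal sketch of how the output fields would be split once a suffix has been chosen. The `split_domain` helper is hypothetical and only borrows the field names from the simulate response earlier in the thread. It is not how the registered_domain processor is actually implemented.

```python
def split_domain(domain, suffix):
    """Split `domain` into fields, treating `suffix` as its public suffix.

    Hypothetical helper: field names mirror the simulate response above.
    """
    labels = domain.lower().rstrip(".").split(".")
    suffix_labels = suffix.lower().split(".")
    n = len(suffix_labels)
    if labels[-n:] != suffix_labels:
        raise ValueError(f"{domain!r} does not end with {suffix!r}")
    if len(labels) == n:
        # The name is itself a public suffix, so nothing is registered under it.
        return {"subdomain": "", "registered_domain": "", "top_level_domain": suffix}
    return {
        # Everything to the left of the registered domain.
        "subdomain": ".".join(labels[: -(n + 1)]),
        # The suffix plus exactly one more label (eTLD+1).
        "registered_domain": ".".join(labels[-(n + 1):]),
        "top_level_domain": suffix,
    }


# Treating only "net" as the suffix reproduces the fields Elastic returned.
print(split_domain("test.servicebus.windows.net", "net"))

# Treating "servicebus.windows.net" as the suffix makes the whole name the
# registered domain and leaves no subdomain.
print(split_domain("test.servicebus.windows.net", "servicebus.windows.net"))
```

The first call yields `windows.net` as the registered domain with `test.servicebus` as the subdomain, which matches the response above. The second yields `test.servicebus.windows.net` as the registered domain.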
When invoking parse_etld for the domain test.servicebus.windows.net, VRL gives the following results (Playground Link):
When the correct response is:
It seems to incorrectly identify the eTLD as servicebus.windows.net instead of net.