Open gryphius opened 3 years ago
Hi, thanks for the feedback!
Hmm, would something like this help? Or what's your exact use case?
curl --location --request POST 'https://api.searchzone.ch/api/as/v1/engines/domains-prod/search' \
--header 'authorization: Bearer search-fwyyo4i26hj5nruvauu3d372' \
--header 'Content-Type: application/json' \
--data-raw '{
"search_fields": {
"a_record": {}
},
"result_fields": {
"domain": {
"raw": {}
}
},
"query": "151.101.1."
}'
for example, if I wanted to search for domains which resolve to 2a02:168:2132::*:
curl --location --request POST 'https://api.searchzone.ch/api/as/v1/engines/domains-prod/search' \
--header 'authorization: Bearer search-fwyyo4i26hj5nruvauu3d372' \
--header 'Content-Type: application/json' \
--data-raw '{
"search_fields": {
"aaaa_record": {}
},
"result_fields": {
"domain": {
"raw": {}
},
"aaaa_record": {
"raw": {}
}
},
"query": "2a02:168:2132:"
}'
however, this currently also returns "similar" records, such as:
[...]
{
"domain": {
"raw": "sayari.ch"
},
"aaaa_record": {
"raw": [
"2a02:168:be04::42"
]
},
"_meta": {
"id": "sayari.ch",
"engine": "domains-prod",
"score": 5.4933805
},
"id": {
"raw": "sayari.ch"
}
},
{
"domain": {
"raw": "alainwolf.ch"
},
"aaaa_record": {
"raw": [
"2a02:168:f405::42"
]
},
"_meta": {
"id": "alainwolf.ch",
"engine": "domains-prod",
"score": 5.4933805
},
"id": {
"raw": "alainwolf.ch"
}
}
i.e. the aaaa record does not contain 2a02:168:2132
similarly, if I search for "picantepizza", I get tons of results which contain the word "pizza" but not necessarily "picantepizza", such as:
ristorantepizzerialafortuna.ch
ns1.hostserv.eu. info.computrade.ch. 2020101002 7200 120 2419200 10800
185.178.193.95
ns2.hostserv.eu.
ns1.hostserv.eu.
ns3.hostserv.eu.
mail.ristorantepizzerialafortuna.ch.
so, what I was hoping for is an option in the GUI/API to only return results which contain the full search string, and not perform any similarity searches.
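Until such an option exists, one possible workaround is to drop the non-matching hits on the client side. This is only a minimal sketch: it assumes the App Search result shape shown above (each field wrapped in a "raw" key), the helper name `filter_by_substring` is made up for illustration, and the sample data is hand-written rather than a real API response.

```python
def filter_by_substring(results, field, needle):
    """Keep only results where at least one value of `field` contains `needle`.

    `results` is assumed to be a list of App Search style hits, i.e. dicts
    like {"aaaa_record": {"raw": [...]}}. A missing field never matches.
    """
    kept = []
    for result in results:
        raw = result.get(field, {}).get("raw", [])
        # "raw" may be a single value or a list of values.
        values = raw if isinstance(raw, list) else [raw]
        if any(needle in str(value) for value in values):
            kept.append(result)
    return kept


# Hand-written sample in the shape of the response above (illustrative only).
sample = [
    {"domain": {"raw": "sayari.ch"}, "aaaa_record": {"raw": ["2a02:168:be04::42"]}},
    {"domain": {"raw": "example.ch"}, "aaaa_record": {"raw": ["2a02:168:2132::2"]}},
]

matching = filter_by_substring(sample, "aaaa_record", "2a02:168:2132:")
print([hit["domain"]["raw"] for hit in matching])
```

This only hides the unwanted hits; it cannot recover exact matches that the similarity ranking pushed beyond the returned page.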
Alright, let me take a look at it on the weekend or in the evening. I guess it has to do with how Elasticsearch is indexing this field...
I've checked it, and it seems to be a problem with how the data gets indexed in Elasticsearch. I have asked the Elasticsearch team how to solve it with App Search, which I'm using under the hood. I will update here if I get a solution from their side...
Sorry for the long delay. I'm quite busy with school and work. Sadly there was no progress on Elastic's side: https://discuss.elastic.co/t/precise-regex-search/266141/4
I'll try to fix and reindex the data on the weekend...
no worries, thanks for the update!
Ok, it's a product limitation of App Search (may be added in a future version).
Anyway, I planned to create a REST API that queries the Elasticsearch backend directly. With that implemented, it will be possible.
For example:
{
"_source": [
"domain$string"
],
"query": {
"prefix": {
"aaaa_record$string": {
"value": "2a02:168:2132:"
}
}
}
}
or
{
"_source": [
"domain$string"
],
"query": {
"wildcard": {
"aaaa_record$string": "2a02:168:2132:*"
}
}
}
Both currently return 8 matches. Does that sound plausible? 🤔
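For anyone scripting this, the two request bodies above could be generated like so. This is just a sketch: the function names are invented for illustration, and the field names are the ones from the examples above. As far as the Elasticsearch query DSL goes, `prefix` matches indexed terms that begin with the given value, while `wildcard` accepts `*` and `?` anywhere in the pattern; both work on indexed terms, so they need a field that is not split up by an analyzer.

```python
import json


def prefix_query(field, value, source=None):
    """Build a search body matching terms of `field` that start with `value`."""
    body = {"query": {"prefix": {field: {"value": value}}}}
    if source:
        body["_source"] = list(source)
    return body


def wildcard_query(field, pattern, source=None):
    """Build a search body matching terms of `field` against a `*`/`?` pattern."""
    body = {"query": {"wildcard": {field: pattern}}}
    if source:
        body["_source"] = list(source)
    return body


# Field names taken from the examples above.
by_prefix = prefix_query("aaaa_record$string", "2a02:168:2132:", ["domain$string"])
by_wildcard = wildcard_query("aaaa_record$string", "2a02:168:2132:*", ["domain$string"])

print(json.dumps(by_prefix, indent=2))
print(json.dumps(by_wildcard, indent=2))
```

For a pattern that only has a trailing `*`, the two queries should select the same documents, which fits both returning the same count.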
My semester ends soon, hopefully I'll find some time to continue with the project.
So, for testing purposes you can use this endpoint. The syntax is that of the Elasticsearch search API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
It currently isn't documented on my side, and I'm not sure whether I'll leave it like this (security, ...), but if you need help with the syntax and fields, let me know.
curl --location --request GET 'https://dev.searchzone.ch/domains/_search?pretty&filter_path=hits.total.value,hits.hits._id,hits.hits._source.aaaa_record' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"prefix": {
"aaaa_record.enum": {
"value": "2a02:168:2132:"
}
}
}
}'
Resulting in:
{
"hits": {
"total": {
"value": 8
},
"hits": [
{
"_id": "opteamal.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "organicbodycare.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "organic-body-care.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "hadornag.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "host-bliss.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "chromos.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "onlineshophosting.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
},
{
"_id": "websitedesign.ch",
"_source": {
"aaaa_record": [
"2a02:168:2132::2"
]
}
}
]
}
}
Works very well, thanks! Apart from the "passive dns" use case this enables other interesting searches like "give me all domains with null MX" :+1:
curl --location --request GET 'https://dev.searchzone.ch/domains/_search?pretty&filter_path=hits.total.value,hits.hits._id,hits.hits._source.mx_record' --header 'Content-Type: application/json' --data-raw '{
"query": {
"prefix": {
"mx_record.enum": {
"value": "."
}
}
}
}'
{
"hits" : {
"total" : {
"value" : 1845
},
[...]
No worries about the stable API - if you have to make changes/disable for security reasons that's obviously understandable.
nothing easier than this ;)
curl --location --request GET 'https://dev.searchzone.ch/domains/_search?pretty&filter_path=hits.total.value,hits.hits._id&size=10000' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"term": {
"mx_valid.enum": false
}
}
}'
Keep in mind that Elasticsearch returns at most 10000 results per query; see the scroll API (https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html) if you need more results!
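Roughly, paging with the scroll API could look like the sketch below. It assumes the standard Elasticsearch flow (an initial `_search?scroll=...` request, followed by repeated `_search/scroll` requests carrying the returned `_scroll_id`); whether the dev endpoint actually allows scroll requests is not confirmed here. The `post` argument is a placeholder for whatever HTTP client you use: a function that takes a path and a JSON body and returns the decoded response.

```python
def scroll_all(post, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit for `query` by following the scroll cursor.

    `post(path, body)` is supplied by the caller; it must send a JSON POST
    request and return the decoded response as a dict.
    """
    page = post(
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": query},
    )
    scroll_id = page.get("_scroll_id")
    # An empty page of hits signals that the scroll is exhausted.
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = post("/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id})
        # The scroll id may change between pages, so always use the latest one.
        scroll_id = page.get("_scroll_id", scroll_id)


# Offline demonstration with canned pages instead of a real server.
canned_pages = [
    {"_scroll_id": "s1", "hits": {"hits": [{"_id": "a.ch"}, {"_id": "b.ch"}]}},
    {"_scroll_id": "s1", "hits": {"hits": [{"_id": "c.ch"}]}},
    {"_scroll_id": "s1", "hits": {"hits": []}},
]
calls = []


def fake_post(path, body):
    calls.append((path, body))
    return canned_pages[len(calls) - 1]


ids = [hit["_id"] for hit in scroll_all(fake_post, "domains", {"match_all": {}}, page_size=2)]
print(ids)
```

If you combine this with `filter_path`, make sure `_scroll_id` is included in the filter, otherwise the cursor is stripped from the response. The sketch also does not clear the scroll context when it is done; it simply lets the keep-alive expire.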
For each record type I have the fields [type]_record and [type]_valid (true = it exists). My Elasticsearch mapping got a little messed up with the last upgrade; I have to review it later....
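Given that naming scheme, the request body for "domains with / without a record of a given type" can be derived from the record type. A small sketch, where the helper name is invented and the `.enum` suffix is simply copied from the mx_valid query above (it may change once the mapping is reviewed):

```python
def record_exists_query(record_type, exists):
    """Build a term query on the [type]_valid field for the given record type.

    The ".enum" sub-field is an assumption taken from the mx_valid example
    above; adjust it if the mapping changes.
    """
    field = f"{record_type.lower()}_valid.enum"
    return {"query": {"term": {field: bool(exists)}}}


# Same body as the mx_valid example above.
no_mx = record_exists_query("mx", False)
has_aaaa = record_exists_query("AAAA", True)
print(no_mx)
print(has_aaaa)
```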
So currently I have these records:
curl --location --request GET 'https://dev.searchzone.ch/domains/_search?pretty&filter_path=hits.total.value,hits.hits._id,hits.hits._source.mx_record' --header 'Content-Type: application/json' --data-raw '{ "query": { "prefix": { "mx_record.enum": { "value": "." } } } }'
Ohh, I may have misunderstood you - https://datatracker.ietf.org/doc/html/rfc7505 😁 but I still hope my comment above helps
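For reference, RFC 7505 defines a "null MX" as a single MX record with preference 0 and "." as the exchange, which is different from a domain that has no MX record at all. A minimal sketch of that check follows; the (preference, exchange) tuple format is an assumption made for illustration and not necessarily how mx_record values are stored in the index.

```python
def is_null_mx(mx_records):
    """Return True if `mx_records` is a null MX in the sense of RFC 7505.

    `mx_records` is assumed to be a list of (preference, exchange) tuples.
    A null MX is exactly one record with preference 0 and exchange ".".
    """
    return len(mx_records) == 1 and tuple(mx_records[0]) == (0, ".")


print(is_null_mx([(0, ".")]))
print(is_null_mx([(10, "mail.ristorantepizzerialafortuna.ch.")]))
print(is_null_mx([]))
```

So a prefix or term query for "." on mx_record finds domains that explicitly declare "no mail here", while mx_valid = false finds domains without any MX record.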
Thanks for searchzone.ch, it is a useful tool
Is it possible to somehow disable similarity search and only return results which contain the search string exactly? For example, I tried to perform "passive DNS"-like searches to see which .ch domains are hosted in certain IP ranges, but the results contain many unrelated entries which just start with similar octets.