Open sasha2484 opened 2 weeks ago
The "location": "59.9452,10.7559"
, does not map to geo point automatically. https://opensearch.org/docs/latest/field-types/supported-field-types/geo-point/
You need to explicitly say it is geo point in your index mapping.
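One thing to watch out for: since the indices here are recreated periodically, a mapping added with `PUT _mapping` is lost every time an index is dropped. A composable index template applies the geo_point mapping to every new matching index automatically. A minimal sketch, assuming the enriched field is `ip2geo.location` as in the simulate output below (the template name is a placeholder):

```json
PUT /_index_template/nginx-geo
{
  "index_patterns": ["nginx-*"],
  "template": {
    "mappings": {
      "properties": {
        "ip2geo": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}
```

With this in place, each newly created `nginx-*` index gets the geo_point mapping without any manual step.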
I added point and ip:

PUT /nginx*/_mapping?pretty
{
  "properties": {
    "point": { "type": "geo_point" }
  }
}

{ "acknowledged": true }

PUT /nginx*/_mapping?pretty
{
  "properties": {
    "ip_address": { "type": "ip" }
  }
}

{ "acknowledged": true }
After that I actually do see a geospatial field, point. But it seems to contain no data, since nothing shows up on the map. I'm guessing that the pipeline itself is not working correctly and is not seeing the clientip field. It's like I'm missing something.
Don't pay attention to the different index names; I recreate them periodically.
I can't figure out why I see the fields in the index but don't see them in Discover. Maybe there is an error somewhere and the fields contain no data?
It's very strange. I definitely have a datasource and a processor that should handle the clientip field, which contains IP addresses. At the same time, the statistics for this processor are all zero: no errors, no documents processed. Yet if I run a request through this processor manually, everything works. It's as if it doesn't see the clientip field, even though that field definitely contains IP addresses.

{
  "nodes": {
    "bjAXJdRNSn6BRwOWMXSgFA": {
      "ingest": {
        "total": { "count": 4379, "time_in_millis": 9274, "current": 0, "failed": 0 },
        "pipelines": {
          ....
          "my-processor": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "ip2geo": {
                  "type": "ip2geo",
                  "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 }
                }
              }
            ...
I no longer know where to look. According to the instructions, everything is correct. The processor just does not seem to see this field.
{ "my-processor2": { "description": "convert ip to country", "processors": [ { "ip2geo": { "datasource": "country-datasource", "field": "clientip", "properties": [ "country_name" ], "ignore_failure": true } } ] } }
GET /_ingest/pipeline/my-processor2/_simulate
{
  "docs": [
    { "_source": { "clientip": "2001:2000::" } }
  ]
}

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "ip2geo": { "country_name": "Sweden" },
          "clientip": "2001:2000::"
        },
        "_ingest": { "timestamp": "2024-09-03T09:48:02.937300222Z" }
      }
    }
  ]
}
GET /nginx-2024.09.03/_mapping?pretty

{
  "nginx-2024.09.03": {
    "mappings": {
      "properties": {
        ....
        "clientip": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
Hi @sasha2484. I am confused about what the actual issue is that you are facing now. Could you provide clear steps to reproduce the issue and describe the difference between the actual behavior and the expected behavior?
The general point is that the pipeline does not read the IP from the clientip field. My steps are as follows:
I create my-datasource:

PUT /_plugins/geospatial/ip2geo/datasource/my-datasource
{
  "endpoint" : "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
  "update_interval_in_days" : 1
}

=>

{ "acknowledged": true }

I see that it works:

GET /_plugins/geospatial/ip2geo/datasource/my-datasource

=>

{
  "datasources": [
    {
      "name": "my-datasource",
      "state": "AVAILABLE",
      "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
      "update_interval_in_days": 1,
      "next_update_at_in_epoch_millis": 1724839387155,
      "database": {
        "provider": "maxmind",
        "sha256_hash": "t7FahuRg6Pjw+kcP0F29ZFAni4HEbX5WJC+1M38hzLU=",
        "updated_at_in_epoch_millis": 1724427053000,
        "valid_for_in_days": 30,
        "fields": [
          "country_iso_code",
          "country_name",
          "continent_name",
          "region_iso_code",
          "region_name",
          "city_name",
          "time_zone",
          "location"
        ]
      },
      "update_stats": {
        "last_succeeded_at_in_epoch_millis": 1724752680532,
        "last_processing_time_in_millis": 217775
      }
    }
  ]
}
I created a pipeline:

PUT /_ingest/pipeline/my-pipeline
{
  "description": "convert ip to geo",
  "processors": [
    {
      "ip2geo": {
        "field": "clientip",
        "datasource": "my-datasource"
      }
    }
  ]
}

=>

{ "acknowledged": true }

POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "1",
      "_source": { "clientip": "185.35.83.97" }
    }
  ]
}

=>

{
  "docs": [
    {
      "doc": {
        "_index": "testindex1",
        "_id": "1",
        "_source": {
          "ip2geo": {
            "continent_name": "Europe",
            "country_name": "Norway",
            "location": "59.9452,10.7559",
            "country_iso_code": "NO",
            "time_zone": "Europe/Oslo"
          },
          "clientip": "185.35.83.97"
        },
        "_ingest": { "timestamp": "2024-08-28T08:55:16.048315377Z" }
      }
    }
  ]
}
And now I expect that in any new index that contains a clientip field, the "my-pipeline" pipeline will be triggered.
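A note on this expectation: creating an ingest pipeline does not attach it to any index. A pipeline runs only when an indexing request names it explicitly (`?pipeline=my-pipeline`) or when the target index's `index.default_pipeline` setting points at it. One way to get the "automatic" behavior for daily indexes is a composable index template; a minimal sketch (template name is a placeholder):

```json
PUT /_index_template/nginx-pipeline
{
  "index_patterns": ["nginx-*"],
  "template": {
    "settings": {
      "index.default_pipeline": "my-pipeline"
    }
  }
}
```

Every `nginx-*` index created after this will then run my-pipeline for all incoming documents, with no change needed in the client that ingests them.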
Could you tell me how you ingested the doc? I see you were able to process the clientip field in your previous example. Does it not work anymore?
PUT /nginx-2024.08.28/_doc/my-id?pipeline=my-pipeline
{
"clientip": "185.35.83.97"
}
{
"_index": "nginx-2024.08.28",
"_id": "my-id",
"_version": 4,
"result": "updated",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 24950455,
"_primary_term": 1
}
GET /nginx-2024.08.28/_doc/my-id
{
"_index": "nginx-2024.08.28",
"_id": "my-id",
"_version": 4,
"_seq_no": 24950455,
"_primary_term": 1,
"found": true,
"_source": {
"ip2geo": {
"continent_name": "Europe",
"country_iso_code": "NO",
"country_name": "Norway",
"location": "59.9452,10.7559",
"time_zone": "Europe/Oslo"
},
"clientip": "185.35.83.97"
}
}
I used the instructions from here: https://opensearch.org/docs/2.15/ingest-pipelines/processors/ip2geo/ And from here: https://opensearch.net/blog/new-ip2geo-processor-with-automatic-update/
PUT /nginx-2024.09.05/_doc/my-id?pipeline=my-pipeline
{
  "clientip": "185.35.83.97"
}

=>

{
  "_index": "nginx-2024.09.05",
  "_id": "my-id",
  "_version": 5,
  "result": "updated",
  "_shards": { "total": 2, "successful": 2, "failed": 0 },
  "_seq_no": 75963415,
  "_primary_term": 1
}

GET /nginx-2024.09.05/_doc/my-id

=>

{
  "_index": "nginx-2024.09.05",
  "_id": "my-id",
  "_version": 2,
  "_seq_no": 75963412,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "ip2geo": {
      "continent_name": "Europe",
      "country_iso_code": "NO",
      "country_name": "Norway",
      "location": "59.9452,10.7559",
      "time_zone": "Europe/Oslo"
    },
    "clientip": "185.35.83.97"
  }
}
That is, when I send such a request manually through Dev Tools, everything works as expected. But the pipeline does not run automatically for new indexes and new incoming data. I create a new nginx-{date} index every day, and the clientip field with IP addresses is present in it. It feels like I'm missing some little thing, but I can't find it. I've tried creating different pipelines with different names, but none of them run.
I'm trying to see how many documents the "my-pipeline" pipeline has processed in total, but I get zeros:

GET /_nodes/stats/ingest?filter_path=nodes.*.ingest

=>

{
  "nodes": {
    "bjAXJdRNSn6BRwOWMXSgFA": {
      "ingest": {
        "total": { "count": 4379, "time_in_millis": 9274, "current": 0, "failed": 0 },
        "pipelines": {
          ....
          "my-pipeline": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "ip2geo": {
                  "type": "ip2geo",
                  "stats": { "count": 0, "time_in_millis": 0, "current": 0, "failed": 0 }
                }
              }
            ]
          },
          ....
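A zero count in these stats usually means no indexing request ever referenced the pipeline, rather than the pipeline failing on the clientip field. One way to check whether a daily index actually has a pipeline attached is to inspect its settings; a sketch, using one of the index names from this thread:

```json
GET /nginx-2024.09.03/_settings?filter_path=*.settings.index.default_pipeline
```

An empty response means no default pipeline is set on that index, so documents ingested without an explicit `?pipeline=` parameter bypass the pipeline entirely.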
I am creating a new index pattern "nginx*", in which I see the new fields ip2geo.continent_name and the like. Logically, I should see data in them, but I don't see it in Discover.
At the same time, the simulation works:

GET /_ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    { "_source": { "clientip": "94.131.3.90" } }
  ]
}

=>

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "ip2geo": {
            "continent_name": "Europe",
            "region_iso_code": "BE",
            "city_name": "Bern",
            "country_iso_code": "CH",
            "country_name": "Switzerland",
            "region_name": "Bern",
            "location": "46.9786,7.4483",
            "time_zone": "Europe/Zurich"
          },
          "clientip": "94.131.3.90"
        },
        "_ingest": { "timestamp": "2024-09-06T08:31:40.035983293Z" }
      }
    }
  ]
}
Could you share how you ingest documents in automatic mode?
I receive messages from bots, process them with Logstash filters, and send them to OpenSearch. Logstash filter:
} else if "nginx" in [tags] {
grok {
match => {
"message" => [
"%{IPORHOST:clientip} (?:-|(%{WORD}.%{WORD})) %{USER:ident} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:forwarder} route:(?:%{PROG:route}|) webfarm:(?:%{PROG:webfarm}|) host:%{PROG:site}",
"%{IPORHOST:clientip} (?:-|(%{WORD}.%{WORD})) %{GREEDYDATA:ident} \[%{HTTPDATE:timestamp}\] \"%{URIPROTO:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:forwarder} route:(?:%{PROG:route}|) webfarm:(?:%{PROG:webfarm}|) host:%{PROG:site}",
"(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME})\s\[%{WORD:eventlevel}\]\s%{POSINT:pid}#%{NUMBER:threadid}\:\s\*%{NUMBER:connectionid}\s%{GREEDYDATA:error}zone\s\"%{WORD:zone}\"\,\sclient\:\s%{IPV4:client_ip}\,\sserver\:\s%{HOSTNAME:server}\,\srequest:\s\"%{WORD:method}\s\/?%{DATA}\/%{INT}\/%{DATA:base}\/%{WORD:service}\/%{GREEDYDATA}",
"(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:eventmessage}, client: %{IP:client}, server: %{IPORHOST:server}, request: \"%{WORD:req.verb}%{SPACE}/%{NOTSPACE:req.webfarm}/%{NOTSPACE:req.clientname}_%{DATA:req.dbindex}/(?:|%{NOTSPACE:req.apppath})(?: HTTP/%{NUMBER:req.httpversion})\", upstream: \"%{GREEDYDATA:upstream}\", host: \"%{GREEDYDATA:host}\"",
"(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:eventmessage}, client: %{IP:client}, server: %{IPORHOST:server}, request: \"%{WORD:req.verb}%{SPACE}/%{NOTSPACE:req.webfarm}/%{NOTSPACE:req.clientname}_%{DATA:req.dbindex}/(?:|%{NOTSPACE:req.apppath})(?: HTTP/%{NUMBER:req.httpversion})\", host: \"%{GREEDYDATA:host}\"",
"(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:eventmessage}, client: %{IP:client}, server: %{IPORHOST:server}"
]
}
}
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z", "yyyy/MM/dd HH:mm:ss"]
target => "@timestamp"
}
mutate {
remove_field => ["timestamp"]
}
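Since the documents arrive through Logstash rather than through Dev Tools, nothing in the filter above tells OpenSearch to run an ingest pipeline. If the output stage is the logstash-output-opensearch plugin, it accepts a `pipeline` option that adds `?pipeline=` to every bulk request. The output block below is a sketch only, since the actual output configuration was not shared (hosts and index pattern are placeholders):

```
output {
  opensearch {
    hosts    => ["https://localhost:9200"]
    index    => "nginx-%{+YYYY.MM.dd}"
    pipeline => "my-pipeline"   # hypothetical: routes every event through the ingest pipeline
  }
}
```

Alternatively, setting `index.default_pipeline` on the indexes (for example via an index template) achieves the same result without touching the Logstash config.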
ip2geo does not provide information from the database automatically
How can one reproduce the bug? I used these instructions to set up: https://opensearch.org/docs/2.16/ingest-pipelines/processors/ip2geo/
{ "acknowledged": true }
GET /_plugins/geospatial/ip2geo/datasource/my-datasource
{ "datasources": [ { "name": "my-datasource", "state": "AVAILABLE", "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json", "update_interval_in_days": 1, "next_update_at_in_epoch_millis": 1724839387155, "database": { "provider": "maxmind", "sha256_hash": "t7FahuRg6Pjw+kcP0F29ZFAni4HEbX5WJC+1M38hzLU=", "updated_at_in_epoch_millis": 1724427053000, "valid_for_in_days": 30, "fields": [ "country_iso_code", "country_name", "continent_name", "region_iso_code", "region_name", "city_name", "time_zone", "location" ] }, "update_stats": { "last_succeeded_at_in_epoch_millis": 1724752680532, "last_processing_time_in_millis": 217775 } } ] }
{ "acknowledged": true }
POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "1",
      "_source": { "clientip": "185.35.83.97" }
    }
  ]
}

{
  "docs": [
    {
      "doc": {
        "_index": "testindex1",
        "_id": "1",
        "_source": {
          "ip2geo": {
            "continent_name": "Europe",
            "country_name": "Norway",
            "location": "59.9452,10.7559",
            "country_iso_code": "NO",
            "time_zone": "Europe/Oslo"
          },
          "clientip": "185.35.83.97"
        },
        "_ingest": { "timestamp": "2024-08-28T08:55:16.048315377Z" }
      }
    }
  ]
}
PUT /nginx-2024.08.28/_doc/my-id?pipeline=my-pipeline
{
  "clientip": "185.35.83.97"
}

{
  "_index": "nginx-2024.08.28",
  "_id": "my-id",
  "_version": 4,
  "result": "updated",
  "_shards": { "total": 2, "successful": 2, "failed": 0 },
  "_seq_no": 24950455,
  "_primary_term": 1
}
GET /nginx-2024.08.28/_doc/my-id

{
  "_index": "nginx-2024.08.28",
  "_id": "my-id",
  "_version": 4,
  "_seq_no": 24950455,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "ip2geo": {
      "continent_name": "Europe",
      "country_iso_code": "NO",
      "country_name": "Norway",
      "location": "59.9452,10.7559",
      "time_zone": "Europe/Oslo"
    },
    "clientip": "185.35.83.97"
  }
}
I recreated the index nginx-2024.08.28 and saw the fields ip2geo.continent_name, ip2geo.country_name, and so on.
I can't find them through Discover, and I don't see them on the map.
I understand that if I make the request myself, the data comes in. But why doesn't it work automatically? Data with the clientip field is constantly coming in.
GET /nginx-2024.08.28/

{
  "nginx-2024.08.28": {
    "aliases": {},
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        .....
        "clientip": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        ....
What is the expected behavior? I expect the data in these fields to be usable on the map.
What is your host/environment?