graphaware / neo4j-to-elasticsearch

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

Capped result? #127

Closed AndreasHenningsson closed 6 years ago

AndreasHenningsson commented 6 years ago

Hi, while analysing your wonderful plug-in I noticed that the function call below always returns 10 nodes:

CALL ga.es.queryNode('{\"query\":{\"match\":{\"text\":\"person\"}}}') YIELD node RETURN node

There are ~220 "text" values that contain the word "person".

Is this a feature, configuration issue or a bug?

ikwattro commented 6 years ago

Hi @AndreasHenningsson ,

This procedure just proxies a real Elasticsearch query to Elastic. Since Elastic returns 10 results by default, you have to change the query exactly as you would when querying Elastic directly.

For example, this query returns at most 500 results:

CALL ga.es.queryNode('{"from":0, "size":500, "query":{"match":{"text":"person"}}}') 
YIELD node 
RETURN node
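
A minimal Python sketch of building that call with an explicit page size. The helper name `build_query_node_call` is hypothetical, and passing the JSON as a Cypher parameter (`$query`) is an assumption about how you invoke the procedure; only `from`, `size` and the `match` query come from the example above.

```python
import json


def build_query_node_call(field, text, size=500, offset=0):
    """Return (cypher, params) for a ga.es.queryNode call with explicit paging."""
    # "from" and "size" are standard Elasticsearch search-body keys;
    # without "size", Elasticsearch returns only 10 hits.
    body = {
        "from": offset,
        "size": size,
        "query": {"match": {field: text}},
    }
    # Assumption: the JSON is passed as a Cypher parameter, which avoids
    # hand-escaping quotes inside the Cypher string literal.
    cypher = "CALL ga.es.queryNode($query) YIELD node RETURN node"
    return cypher, {"query": json.dumps(body)}


cypher, params = build_query_node_call("text", "person", size=500)
print(cypher)
print(params["query"])
```

The statement and parameter map can then be handed to whichever Neo4j client you use.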

AndreasHenningsson commented 6 years ago

Thanks,

Is it possible to configure in mapping.json that a specific attribute should be replicated as a "date"? Below is an example from my test JSON file. The attribute Submit_Date should preferably be a date type. Right now it is being replicated as a string (when viewed in Kibana).

{
  "defaults": {
    "key_property": "uuid",
    "nodes_index": "default-index-node",
    "relationships_index": "default-index-relationship",
    "include_remaining_properties": true
  },
  "node_mappings": [
    {
      "condition": "hasLabel('doc')",
      "index": "nodes-doc",
      "type": "documents",
      "properties": {
        "Status": "getProperty('Status')",
        "DocNumber": "getProperty('DocNumber')",
        "Priority": "getProperty('Priority')",
        "Service_Type": "getProperty('Service_Type')",
        "Submit_Date": "getProperty('Submit_Date')"
      }
    }
  ],
  "relationship_mappings": [
    {
      "condition": "allRelationships()",
      "type": "relationships"
    }
  ]
}

ikwattro commented 6 years ago

@AndreasHenningsson Can you give me an example value ?

AndreasHenningsson commented 6 years ago

Yes the value is stored as follows: Submit_Date: 2018-07-01 13:01:03

ikwattro commented 6 years ago

So the value is correct and should be sent as a string. Your Elastic mapping should define the field as a date field: https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html

AndreasHenningsson commented 6 years ago

Strange. According to the above link, it should be detected as a date format. I would like it as a date field, but when viewed in Kibana it is stored as a "string".

I have to see if there are any other date formats in the sample.

But if the Submit_Date looked like this, 2015-11-01 00:01, would Elastic not consider it a date because the seconds are missing?

Is this a valid date format (according to Elastic): 2015-12-01T05:13:22.000+0000?

ikwattro commented 6 years ago

Can you paste your ES mapping for that particular index here?

ikwattro commented 6 years ago

Tested with Elastic 6.3: if you have a well-defined date type for a field and try to pass a wrong format, such as the one without seconds, indexing fails:

ikwattro@mbp666 ~/d/_es> http POST http://localhost:9200/my_index/_doc/1 < doc.json -v
POST /my_index/_doc/1 HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 31
Content-Type: application/json
Host: localhost:9200
User-Agent: HTTPie/0.9.9

{
    "date": "2015-01-01 12:10"
}

HTTP/1.1 400 Bad Request
content-encoding: gzip
content-length: 194
content-type: application/json; charset=UTF-8

{
    "error": {
        "caused_by": {
            "reason": "Invalid format: \"2015-01-01 12:10\" is malformed at \" 12:10\"",
            "type": "illegal_argument_exception"
        },
        "reason": "failed to parse [date]",
        "root_cause": [
            {
                "reason": "failed to parse [date]",
                "type": "mapper_parsing_exception"
            }
        ],
        "type": "mapper_parsing_exception"
    },
    "status": 400
}

That said, this format is not valid either: 2018-07-01 13:01:03
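
As an illustration of reading the 400 response above programmatically, here is a small Python sketch that pulls the error type, the reason and the underlying cause out of the body. The function name `summarize_es_error` is hypothetical; the dictionary is copied from the response shown in this comment.

```python
# Error body copied from the 400 response above, as a Python dict.
ERROR_BODY = {
    "error": {
        "caused_by": {
            "reason": 'Invalid format: "2015-01-01 12:10" is malformed at " 12:10"',
            "type": "illegal_argument_exception",
        },
        "reason": "failed to parse [date]",
        "root_cause": [
            {"reason": "failed to parse [date]", "type": "mapper_parsing_exception"}
        ],
        "type": "mapper_parsing_exception",
    },
    "status": 400,
}


def summarize_es_error(body):
    """Return a one-line summary of an Elasticsearch error response body."""
    error = body.get("error", {})
    summary = f'{error.get("type")}: {error.get("reason")}'
    cause = error.get("caused_by")
    if cause:
        # The nested "caused_by" entry says where parsing actually stopped.
        summary += f' (caused by {cause.get("type")}: {cause.get("reason")})'
    return summary


print(summarize_es_error(ERROR_BODY))
```

The `caused_by` reason is the useful part here: it shows the parser stopped at the space before the time, which points at the field's date format rather than at the plugin.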

AndreasHenningsson commented 6 years ago

OK, so neither 2018-07-01 13:01:03 nor 2018-07-01 13:01 is a valid format.

So, what should a date field in Neo look like in order to get the type "date" in Elastic?

I think I meet the criteria below.

Sorry for bothering you :-)


JSON doesn’t have a date datatype, so dates in Elasticsearch can either be:

strings containing formatted dates, e.g. "2015-01-01" or "2015/01/01 12:10:30".
a long number representing milliseconds-since-the-epoch.
an integer representing seconds-since-the-epoch.
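
To make the three representations concrete, here is a hedged Python sketch that converts a value stored like Submit_Date into each of them. The function name `to_es_date_forms` is hypothetical, and treating the stored value as UTC is an assumption; which of the forms a given index accepts still depends on the `format` of the date field in its mapping.

```python
from datetime import datetime, timezone


def to_es_date_forms(value, pattern="%Y-%m-%d %H:%M:%S"):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string into three date representations."""
    # Assumption: the stored value is in UTC; adjust tzinfo if it is not.
    parsed = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
    epoch_seconds = int(parsed.timestamp())
    return {
        "iso_string": parsed.strftime("%Y-%m-%dT%H:%M:%SZ"),  # formatted string
        "epoch_millis": epoch_seconds * 1000,                # long, milliseconds
        "epoch_second": epoch_seconds,                       # integer, seconds
    }


forms = to_es_date_forms("2018-07-01 13:01:03")
print(forms)
```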

AndreasHenningsson commented 6 years ago

Can I assign a format to the attribute in mapping.json?

ikwattro commented 6 years ago

@AndreasHenningsson you will need to modify the ES mapping for the index so that different date formats can be indexed. You can do so by updating your mapping:

{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date",
          "format": "date_optional_time||yyyy-MM-dd HH:mm||yyyy-MM-dd HH:mm:ss||epoch_millis"
        }
      }
    }
  }
}

Here we allow the date to be in several formats; afterwards you can continue to index your data. For example, I just ingested 4 docs whose date strings use different formats:

ikwattro@mbp666 ~/d/_es> http http://localhost:9200/my_index/_search
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 217
content-type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "2",
                "_index": "my_index",
                "_score": 1.0,
                "_source": {
                    "date": "2015-01-01 10:30"
                },
                "_type": "_doc"
            },
            {
                "_id": "4",
                "_index": "my_index",
                "_score": 1.0,
                "_source": {
                    "date": "2015-01-01T10:30:25Z"
                },
                "_type": "_doc"
            },
            {
                "_id": "1",
                "_index": "my_index",
                "_score": 1.0,
                "_source": {
                    "date": "2015-01-01"
                },
                "_type": "_doc"
            },
            {
                "_id": "3",
                "_index": "my_index",
                "_score": 1.0,
                "_source": {
                    "date": "2015-01-01 10:30:25"
                },
                "_type": "_doc"
            }
        ],
        "max_score": 1.0,
        "total": 4
    },
    "timed_out": false,
    "took": 3
}

This plugin only sends JSON to ES; there is no way to specify a type for it.
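
The `||`-separated `format` list in the mapping above means "try each pattern in turn". A rough client-side analogy in Python, useful for checking values before they are replicated, might look like the sketch below. The `strptime` patterns and the name `parse_any` are assumptions chosen to mirror the four sample documents; this is not Elasticsearch's own parser and it leaves out `epoch_millis`.

```python
from datetime import datetime

# Python strptime equivalents of some patterns in the mapping's "format" list.
# Assumption: a client-side analogy only, not Elasticsearch's parser.
PATTERNS = [
    "%Y-%m-%dT%H:%M:%SZ",  # one shape covered by date_optional_time
    "%Y-%m-%d",            # date_optional_time without the time part
    "%Y-%m-%d %H:%M",      # yyyy-MM-dd HH:mm
    "%Y-%m-%d %H:%M:%S",   # yyyy-MM-dd HH:mm:ss
]


def parse_any(value, patterns=PATTERNS):
    """Try each pattern in order, like the '||' alternatives in the mapping."""
    for pattern in patterns:
        try:
            return datetime.strptime(value, pattern)
        except ValueError:
            continue
    raise ValueError(f"no pattern matched {value!r}")


samples = [
    "2015-01-01 10:30",
    "2015-01-01T10:30:25Z",
    "2015-01-01",
    "2015-01-01 10:30:25",
]
for sample in samples:
    print(sample, "->", parse_any(sample))
```

A value that matches none of the patterns raises `ValueError`, which is the client-side counterpart of the `mapper_parsing_exception` shown earlier.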

ikwattro commented 6 years ago

@AndreasHenningsson closing this as it is not an issue with the plugin. Please continue to comment on the ticket if you have further questions

AndreasHenningsson commented 6 years ago

Thanks for the help!

On Thu, 5 July 2018 at 13:45, Christophe Willemsen <notifications@github.com> wrote:

Closed #127 https://github.com/graphaware/neo4j-to-elasticsearch/issues/127.
