graphaware / neo4j-to-elasticsearch

GraphAware Framework Module for Integrating Neo4j with Elasticsearch

Update mapping using a template for elasticsearch and mapping.json for graphaware is rejected #152

Closed yyahyaoui closed 3 years ago

yyahyaoui commented 5 years ago

Hi all,

I am using Elasticsearch 7 and Neo4j, replicating Neo4j to Elasticsearch with the GraphAware plugin. So far so good. I am also using an Elasticsearch index template to define custom analysis settings (tokenizer and analyzer) that enable autocomplete search on top of Elasticsearch. The template looks like this:

    PUT _template/template_neo4j
    {
      "index_patterns": ["neo4j*"],
      "settings": {
        "number_of_shards": 1,
        "analysis": {
          "filter": {
            "nGram_filter": {
              "type": "nGram",
              "min_gram": 2,
              "max_gram": 3,
              "token_chars": ["letter", "digit", "punctuation", "symbol"]
            }
          },
          "analyzer": {
            "nGram_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": ["lowercase", "asciifolding", "nGram_filter"]
            },
            "whitespace_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": ["lowercase", "asciifolding"]
            }
          }
        }
      },
      "mappings": {
        "dynamic_templates": [
          {
            "string_fields": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "type": "text",
                "analyzer": "nGram_analyzer"
              }
            }
          }
        ],
        "dynamic": true
      }
    }

I am using a dynamic template to apply the nGram_analyzer to all fields of type string. In addition, I am using the following mapping.json:

    {
      "defaults": {
        "key_property": "uuid",
        "nodes_index": "neo4j-index-node",
        "relationships_index": "neo4j-index-relationship",
        "include_remaining_properties": false,
        "exclude_empty_properties": true
      },
      "node_mappings": [
        {
          "condition": "hasLabel('GlobalBusiness')",
          "type": "globalbsuiness",
          "properties": {
            "parentBusiness": "getProperty('BusinessName')",
            "subGlobalFirm": "query('MATCH (n) WHERE id(n) = {id} MATCH (n:GlobalBusiness)-[:HAS_SUB_GLOBAL_FIRM]->(f:GlobalBusiness) RETURN collect(f.BusinessName) AS value')"
          }
        }
      ],
      "relationship_mappings": [
        {
          "condition": "isType('HAS_SUB_GLOBAL_FIRM')",
          "type": "globalFirm"
        }
      ]
    }

When starting the Neo4j database / indexing the graph, the following error occurs:

    [2019-05-15T11:08:58,663][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [DE-20HEPF0XGV48] failed to put mappings on indices [[[neo4j-index-node/AAyW79tDTTebihkDl79kag]]], type [globalbsuiness]
    java.lang.IllegalArgumentException: Rejecting mapping update to [neo4j-index-node] as the final mapping would have more than 1 type: [_doc, globalbsuiness]
        at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:449) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.MapperService.internalMerge(MapperService.java:398) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:331) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:315) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:238) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.0.0.jar:7.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]

I guess the problem occurs because the index template gives the index its own type, which is _doc. When indexing starts, Elasticsearch receives a second type, globalbsuiness, which comes from mapping.json. Since the index would then have two types, Elasticsearch rejects the mapping coming from mapping.json.

My questions are:

  1. Does anyone have an idea how I can fix this issue?
  2. Is it possible to define default mappings for analyzers, tokenizers, etc. in mapping.json, or to refer to a template?

PS: I am using the following plugins:

    graphaware-neo4j-to-elasticsearch-3.5.4.53.11.jar
    graphaware-server-community-all-3.5.4.53.jar
    graphaware-uuid-3.5.4.53.17.jar

with Elasticsearch 7 and Neo4j 3.5.2.

The GraphAware settings in Neo4j are:

    dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
    com.graphaware.runtime.enabled=true
    com.graphaware.module.UIDM.1=com.graphaware.module.uuid.UuidBootstrapper
    com.graphaware.module.UIDM.uuidIndex=uuidIndex
    com.graphaware.module.UIDM.uuidProperty=uuid
    com.graphaware.module.UIDM.relationship=true
    com.graphaware.module.UIDM.initializeUntil=0
    com.graphaware.module.UIDM.node=hasLabel('GlobalBusiness')

    com.graphaware.module.ES.2=com.graphaware.module.es.ElasticSearchModuleBootstrapper
    com.graphaware.module.ES.uri=localhost
    com.graphaware.module.ES.port=9200
    com.graphaware.module.ES.protocol=http
    com.graphaware.module.ES.keyProperty=uuid
    com.graphaware.module.ES.retryOnError=false
    com.graphaware.module.ES.queueSize=10000
    com.graphaware.module.ES.reindexBatchSize=2000
    com.graphaware.module.ES.node=hasLabel('GlobalBusiness')
    com.graphaware.module.ES.relationship=(true)
    com.graphaware.module.ES.bulk=true
    com.graphaware.module.ES.initializeUntil=2222222222222
    com.graphaware.module.ES.mapping=com.graphaware.module.es.mapping.JsonFileMapping
    com.graphaware.module.ES.file=mapping.json

Thanks in advance.

Regards Younes


ikwattro commented 5 years ago

This is an interesting issue. I will have some look into it in the coming days.

yyahyaoui commented 5 years ago

Thanks @ikwattro. I appreciate your help. I am still searching for a workaround, but so far without success.

DaveClissold commented 5 years ago

Just to add some clarity to this: as of version 7, document types are deprecated, so custom type mappings are not accepted. @yyahyaoui, you will have to change every type in your mappings that is not '_doc'.
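A minimal sketch of that change, assuming the "type" entries under node_mappings and relationship_mappings in mapping.json are the only places a custom type is set. The helper name `force_doc_type` is hypothetical, and nothing in this thread confirms that the GraphAware module then indexes correctly against Elasticsearch 7, so treat it as something to try rather than a verified fix.

```python
import copy
import json


def force_doc_type(mapping, doc_type="_doc"):
    """Return a copy of a mapping.json dict with every per-mapping "type" set to doc_type.

    Hypothetical helper: it only rewrites the JSON, it does not talk to
    Neo4j or Elasticsearch.
    """
    fixed = copy.deepcopy(mapping)
    for section in ("node_mappings", "relationship_mappings"):
        for entry in fixed.get(section, []):
            if "type" in entry:
                entry["type"] = doc_type
    return fixed


# Trimmed-down version of the mapping.json posted in this issue.
example = {
    "node_mappings": [
        {"condition": "hasLabel('GlobalBusiness')", "type": "globalbsuiness"}
    ],
    "relationship_mappings": [
        {"condition": "isType('HAS_SUB_GLOBAL_FIRM')", "type": "globalFirm"}
    ],
}

fixed = force_doc_type(example)
print(json.dumps(fixed, indent=2))
```

The input dict is left untouched, so the original file can be kept for comparison while the rewritten copy is tested.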

https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

Elasticsearch 7.x: Specifying types in requests is deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids. Note that in 7.0, _doc is a permanent part of the path, and represents the endpoint name rather than the document type.
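To make the two endpoint shapes from that quote concrete, here is a small sketch that only builds the HTTP method and path for each case. The function name `index_request` is an assumption for illustration, and no request is actually sent.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def index_request(index: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Build (method, path) for the typeless 7.x index APIs quoted above."""
    index_part = quote(index, safe="")
    if doc_id is None:
        # Auto-generated id: POST {index}/_doc
        return ("POST", f"/{index_part}/_doc")
    # Explicit id: PUT {index}/_doc/{id}
    return ("PUT", f"/{index_part}/_doc/{quote(doc_id, safe='')}")


print(index_request("neo4j-index-node"))
print(index_request("neo4j-index-node", "abc-123"))
```

In both cases `_doc` is the fixed endpoint name, so there is no position in the path where a custom type such as globalbsuiness could go.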

m00dy commented 4 years ago

hello,

I've also got the same issue. How did you guys solve this?

Thanks.

Tak1za commented 3 years ago

If @DaveClissold's answer doesn't help, or changes are not made to this plugin, the only solution I can think of is to reindex. That can be done in two ways:

  1. If you want to create a new index while the original index is still being populated, create an ingestion pipeline that can reindex with your defined mapping.
  2. If you can afford to wait until the original index is populated, reindex to your custom mapping using the Reindex API that Elasticsearch provides.
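A rough sketch of the second option, using only the Python standard library. It assumes Elasticsearch is reachable at the `base_url` you pass in and that the destination index already exists with the custom mapping (or matches a template that supplies it). The destination name `neo4j-autocomplete` is made up for illustration.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index):
    """Build the JSON body for POST _reindex: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def reindex(base_url, source_index, dest_index):
    """Send the reindex request and return the parsed response.

    Not called here, because it needs a running Elasticsearch node.
    """
    body = json.dumps(build_reindex_body(source_index, dest_index)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/_reindex",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# "neo4j-autocomplete" is a hypothetical destination index name.
body = build_reindex_body("neo4j-index-node", "neo4j-autocomplete")
print(json.dumps(body))
```

With a node running locally, `reindex("http://localhost:9200", "neo4j-index-node", "neo4j-autocomplete")` would perform the copy. Documents written to the source index after that call are not carried over, which is the trade-off against the first option.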