zhr85210078 / node-mongodb-es-connector

nodejs mongodb elasticsearch synchrodata (syncing data between MongoDB and ES)
https://zhr85210078.github.io/node-mongodb-es-connector/#/
MIT License
77 stars 17 forks

How to denote datatype "nested" instead of object - what differentiates object and nested in the Elasticsearch mapping in CrawlerData? #27

Open raj92 opened 4 years ago

raj92 commented 4 years ago

I am stuck defining the mapping for a field (files) whose type should be nested rather than object. I am looking for a way to define the files type as "nested"; otherwise filtering will not work.

We define the mapping in the JSON file and the preprocessor pipeline, and we already create an array of objects [{}]. But how do we denote "nested"? What differentiates object from nested in the Elasticsearch mapping in CrawlerData?

There must be some way in the "node-mongodb-es-connector" plugin to do this,

because [{}] is just a plain array of objects.
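For context on the object-versus-nested question: with the default object type, Elasticsearch flattens an array of objects into one multi-valued field per sub-field, so a query cannot tell which values came from the same array element. A nested field indexes each array element as its own hidden document, which keeps that link. The sketch below is a plain-Python simulation of this difference, not Elasticsearch code. The helper names and the second sample file ("report") are made up for illustration, and matching is simplified to exact values.

```python
def flatten_object_array(doc, field):
    """Mimic the default `object` type: every sub-field of the array
    elements becomes one flat, multi-valued field."""
    flat = {}
    for element in doc.get(field, []):
        for key, value in element.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_object(doc, field, conditions):
    """Each condition is checked against the flattened value lists,
    so the conditions may be satisfied by different array elements."""
    flat = flatten_object_array(doc, field)
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in conditions.items()
    )


def matches_nested(doc, field, conditions):
    """Mimic `nested`: all conditions must hold within one array element."""
    return any(
        all(element.get(key) == value for key, value in conditions.items())
        for element in doc.get(field, [])
    )


# Hypothetical document: "a1" is a folder, "report" is a file.
doc = {
    "files": [
        {"name": "a1", "type": "Folder"},
        {"name": "report", "type": "File"},
    ]
}
# No single element is both named "a1" and of type "File".
conditions = {"name": "a1", "type": "File"}

print(matches_object(doc, "files", conditions))  # True (false positive)
print(matches_nested(doc, "files", conditions))  # False
```

The false positive in the object case is the reason a filter that combines several fields of the same file needs the nested type.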

Please help.

raj92 commented 4 years ago

Any update on this? Please reply.

zhr85210078 commented 4 years ago

I think you should use 'Foreach Processor' and 'JSON Processor' to create a pipeline. Like this:

PUT _ingest/pipeline/yourpip
{
  "description": "test",
  "processors": [
    {
      "foreach": {
        "field": "yourArrayField",
        "processor": {
          "json": {
            "field": "yourObjField"
          }
        }
      }
    }
  ]
}

I hope I can help you.

raj92 commented 4 years ago

Following your suggestion, the pipeline body looks like this:

PUT localhost:9200/_ingest/pipeline/convert-object-nested-pipeline
{
  "description": "files to be array objects",
  "processors": [
    {
      "foreach": {
        "field": "files",
        "processor": {
          "json": {
            "field": "files"
          }
        }
      }
    }
  ]
}

Response:

{ "acknowledged": true }

My CrawlerData/userfiles.json file:

{
  "mongodb": {
    "m_database": "myTest",
    "m_collectionname": "userfiles",
    "m_filterfilds": {
      "version": "2.0"
    },
    "m_returnfilds": {
      "files": [{"fileId": "8eecdf24-a783-4430-abfc-7bb2c6487cfb"}],
      "orgId": "text",
      "directory": "type",
      "size": "integer",
      "userName": "text",
      "groupId": "type"
    },
    "m_extendfilds": {
      "bA": "this is a extend fild bA",
      "bB": "this is a extend fild bB"
    },
    "m_extendinit": {
      "m_comparefild": "_id",
      "m_comparefildType": "ObjectId",
      "m_startFrom": "2018-09-17 13:44:00",
      "m_endTo": "2018-09-17 16:27:51"
    },
    "m_connection": {
      "m_servers": [
        "localhost:27017"
      ],
      "m_authentication": {
        "username": "test",
        "password": "test123",
        "authsource": "myTest",
        "replicaset": "test",
        "ssl": false
      }
    },
    "m_url": "mongodb://test:test123@localhost:27017/myTest?authSource=myTest",
    "m_documentsinbatch": 5000,
    "m_delaytime": 1000,
    "max_attachment_size": 5242880
  },
  "elasticsearch": {
    "e_index": "userfiles",
    "e_type": "userfiles",
    "e_connection": {
      "e_server": "http://localhost:9200",
      "e_httpauth": {
        "username": "EsAdmin",
        "password": "pass1234"
      }
    },
    "e_pipeline": false,
    "e_iscontainattachment": false
  }
}

Please note: in the JSON file above, if I remove the pipeline name/ID, it stops working (no index is created in Elasticsearch). It works when I set "e_pipeline" to false.

Our purpose is to make the objects in the files array filterable. Right now the objects in the files array are plain objects; we have to make them nested.

Therefore, we are trying to define the datatype of files as nested, to get full-text search with filtering that works 100%.

That way we can search the nested documents after adding the mapping, like this:

GET workspaces_userfiles/_search?pretty
{
  "query": {
    "nested": {
      "path": "files",
      "query": {
        "match": { "files.name": "a" }
      },
      "inner_hits": {}
    }
  }
}
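The same search body can be built programmatically, which makes the required shape explicit. This is a minimal sketch: the function name `nested_match_query` is made up for illustration, and the body is only printed, not sent. As far as I know, Elasticsearch rejects a nested query when the given path is not mapped as nested, which is why the mapping has to be in place first.

```python
import json


def nested_match_query(path, field, text):
    """Build a search body that runs a match query inside a nested field.

    `path` is the nested field (here "files"); `field` is a sub-field of it.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": text}},
                # Ask for the matching array elements in the response.
                "inner_hits": {},
            }
        }
    }


body = nested_match_query("files", "name", "a")
print(json.dumps(body, indent=2))
```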

But with your suggestion, we are again just producing a plain JSON object.

Maybe the mapping structure and our JSON data structure will help you understand.

{
  "mappings": {
    "properties": {
      "files": {
        "type": "nested",
        "properties": {
          "shared": {"type": "text"},
          "s3url": {"type": "text"},
          "lastDownloadedOn": {"type": "text"},
          "type": {"type": "text"},
          "createdOn": {"type": "text"},
          "path": {"type": "text"},
          "lastUpdated": {"type": "text"},
          "readablePdfPath": {"type": "text"},
          "size": {"type": "integer"},
          "encrypted": {"type": "text"},
          "name": {"type": "text"},
          "thumbPath": {"type": "text"},
          "fileId": {"type": "text"}
        }
      },
      "orgId": {"type": "text"},
      "directory": {"type": "text"},
      "size": {"type": "integer"},
      "userName": {"type": "text"},
      "groupId": {"type": "text"},
      "timestamp": {"type": "text"}
    }
  }
}

{
  "_id" : ObjectId("5d6e3e23dc41902ae4152c2a"),
  "orgId" : "o20190828092008798",
  "directory" : "pratirumala",
  "size" : -201920,
  "files" : [
    {
      "fileId" : "8eecdf24-a783-4430-abfc-7bb2c6487cfb",
      "name" : "a1",
      "type" : "Folder",
      "size" : 0,
      "path" : "pratirumala/a1",
      "shared" : "No",
      "createdOn" : "2019-10-26T05:03:25.778Z",
      "lastUpdated" : "2019-10-26T05:03:25.778Z",
      "encrypted" : "No"
    },
    {
      "fileId" : "525631de-95e5-43a1-8756-e356ce4cf4b4",
      "name" : "f01",
      "type" : "Folder",
      "size" : 0,
      "path" : "pratirumala/f01",
      "shared" : "No",
      "createdOn" : "2019-11-07T04:13:15.476Z",
      "lastUpdated" : "2019-11-07T04:13:15.476Z",
      "encrypted" : "No"
    }
  ]
}

I hope this makes our issue clear.

We have also tried converting the field type with the convert processor: https://www.elastic.co/guide/en/elasticsearch/reference/current/convert-processor.html?utm_source=hacpai.com

But the "nested" data type is not supported there.
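That limitation is expected: nested is a field type declared in the index mapping, not a value conversion, so an ingest processor cannot produce it. One possible workaround is to create the index with an explicit mapping before the first sync. The sketch below only builds the request and leaves the send commented out. It assumes a typeless mapping (Elasticsearch 7 or later, as in the mapping shown above), uses made-up helper names, lists only a few of the sub-fields, and omits the HTTP authentication from the config. Whether node-mongodb-es-connector leaves a pre-existing mapping untouched is an assumption that this thread does not confirm.

```python
import json
import urllib.request


def nested_index_body(nested_field, sub_fields, top_fields):
    """Build a create-index body in which `nested_field` is mapped as nested.

    `sub_fields` and `top_fields` map field names to Elasticsearch types.
    """
    properties = {name: {"type": es_type} for name, es_type in top_fields.items()}
    properties[nested_field] = {
        "type": "nested",
        "properties": {
            name: {"type": es_type} for name, es_type in sub_fields.items()
        },
    }
    return {"mappings": {"properties": properties}}


def build_create_index_request(es_url, index, body):
    """Prepare (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{es_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


body = nested_index_body(
    "files",
    {"fileId": "text", "name": "text", "size": "integer"},
    {"orgId": "text", "userName": "text"},
)
request = build_create_index_request("http://localhost:9200", "userfiles", body)
print(request.get_method(), request.full_url)
# To send it against a running cluster (authentication not handled here):
# urllib.request.urlopen(request)
```

An existing field cannot be switched from object to nested in place, so an index that was already created with files as object would have to be deleted or reindexed first.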

Please help.

Also, please reply if you have any questions.