Open raj92 opened 4 years ago
Any update on this? Please reply.
I think you should use the 'Foreach Processor' and the 'JSON Processor' to create a pipeline, like this:
PUT _ingest/pipeline/yourpip
{
  "description" : "test",
  "processors" : [
    {
      "foreach": {
        "field": "yourArrayField",
        "processor": {
          "json": { "field": "yourObjField" }
        }
      }
    }
  ]
}
I hope this helps.
As per the suggestion, the pipeline body looks like this ::
PUT :: localhost:9200/_ingest/pipeline/convert-object-nested-pipeline
{
  "description" : "files to be array objects",
  "processors" : [
    {
      "foreach": {
        "field": "files",
        "processor": {
          "json": { "field": "files" }
        }
      }
    }
  ]
}
Response ::
{ "acknowledged": true }
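For reference, here is a rough Python model of what a foreach + json pipeline does conceptually: it walks an array and parses any element that is a JSON-encoded string. This is only a sketch, not Elasticsearch code. The function name parse_json_strings is made up for illustration, and the real json processor may reject input that is not a string instead of passing it through as this model does.

```python
import json


def parse_json_strings(doc, array_field):
    """Hypothetical model of foreach + json: decode string elements, keep the rest."""
    out = dict(doc)
    out[array_field] = [
        json.loads(item) if isinstance(item, str) else item
        for item in doc.get(array_field, [])
    ]
    return out


# Elements that are already objects (as in our MongoDB documents).
already_objects = {"files": [{"fileId": "8eecdf24-a783-4430-abfc-7bb2c6487cfb"}]}
# Elements stored as JSON text, the case the json processor is meant for.
encoded = {"files": ['{"fileId": "8eecdf24-a783-4430-abfc-7bb2c6487cfb"}']}

print(parse_json_strings(already_objects, "files") == already_objects)
print(parse_json_strings(encoded, "files") == already_objects)
```

In this model the result is an array of plain objects either way. Nothing in it says whether the field is mapped as object or nested, because that is decided by the index mapping and not by the document content.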
My CrawlerData/userfiles.json file ::
{
"mongodb": {
"m_database": "myTest",
"m_collectionname": "userfiles",
"m_filterfilds": {
"version": "2.0"
},
"m_returnfilds": {
"files" : [{"fileId" : "8eecdf24-a783-4430-abfc-7bb2c6487cfb"}],
"orgId" : "text",
"directory" : "type",
"size" : "integer",
"userName" : "text",
"groupId": "type"
},
"m_extendfilds": {
"bA": "this is a extend fild bA",
"bB": "this is a extend fild bB"
},
"m_extendinit": {
"m_comparefild": "_id",
"m_comparefildType": "ObjectId",
"m_startFrom": "2018-09-17 13:44:00",
"m_endTo": "2018-09-17 16:27:51"
},
"m_connection": {
"m_servers": [
"localhost:27017"
],
"m_authentication": {
"username": "test",
"password": "test123",
"authsource": "myTest",
"replicaset": "test",
"ssl": false
}
},
"m_url": "mongodb://test:test123@localhost:27017/myTest?authSource=myTest",
"m_documentsinbatch": 5000,
"m_delaytime": 1000,
"max_attachment_size":5242880
},
"elasticsearch": {
"e_index": "userfiles",
"e_type": "userfiles",
"e_connection": {
"e_server": "http://localhost:9200",
"e_httpauth": {
"username": "EsAdmin",
"password": "pass1234"
}
},
"e_pipeline": false,
"e_iscontainattachment": false
}
}
Please note that in the above JSON file, if I remove the pipeline name/ID, it stops working (no index is created in Elasticsearch). But it works when I set it to false.
Our purpose is to make the objects in the files array filterable. At present the objects in the files array are plain objects; we have to make them nested.
Therefore, we are trying to define the data type of files as nested, so that full-text search filters accurately.
That way we can search the nested documents after adding the mapping, like this:
GET workspaces_userfiles/_search?pretty
{
  "query": {
    "nested": {
      "path": "files",
      "query": { "match": { "files.name": "a" } },
      "inner_hits": {}
    }
  }
}
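To illustrate why the nested type matters for a query like the one above, here is a small Python sketch. It models the documented behaviour that an array of plain objects is flattened into multi-valued fields, while nested keeps each object separate. The function names are made up for illustration, and exact equality stands in for real text matching.

```python
def flatten_object_array(items):
    """Model of the default object mapping: values are merged per field name."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(key, []).append(value)
    return flat


def match_flattened(items, **criteria):
    """Each criterion is checked against the merged values of all objects."""
    flat = flatten_object_array(items)
    return all(value in flat.get(key, []) for key, value in criteria.items())


def match_nested(items, **criteria):
    """All criteria must hold inside one and the same object."""
    return any(
        all(item.get(key) == value for key, value in criteria.items())
        for item in items
    )


files = [
    {"name": "a1", "path": "pratirumala/a1"},
    {"name": "f01", "path": "pratirumala/f01"},
]

# name comes from the first object, path from the second one.
print(match_flattened(files, name="a1", path="pratirumala/f01"))  # True
print(match_nested(files, name="a1", path="pratirumala/f01"))     # False
```

The flattened model reports a match even though no single file has that name and that path together, which is the kind of false hit we want to avoid.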
But with your suggestion, we are just turning it into a JSON object again.
Maybe the mapping structure and our JSON data structure will help you understand.
{
  "mappings": {
    "properties": {
      "files": {
        "type": "nested",
        "properties": {
          "shared": {"type": "text"},
          "s3url": {"type": "text"},
          "lastDownloadedOn": {"type": "text"},
          "type": {"type": "text"},
          "createdOn": {"type": "text"},
          "path": {"type": "text"},
          "lastUpdated": {"type": "text"},
          "readablePdfPath": {"type": "text"},
          "size": {"type": "integer"},
          "encrypted": {"type": "text"},
          "name": {"type": "text"},
          "thumbPath": {"type": "text"},
          "fileId": {"type": "text"}
        }
      },
      "orgId" : {"type": "text"},
      "directory" : {"type": "text"},
      "size" : {"type": "integer"},
      "userName" : {"type": "text"},
      "groupId": {"type": "text"},
      "timestamp" : {"type": "text"}
    }
  }
}
{
  "_id" : ObjectId("5d6e3e23dc41902ae4152c2a"),
  "orgId" : "o20190828092008798",
  "directory" : "pratirumala",
  "size" : -201920,
  "files" : [
    {
      "fileId" : "8eecdf24-a783-4430-abfc-7bb2c6487cfb",
      "name" : "a1",
      "type" : "Folder",
      "size" : 0,
      "path" : "pratirumala/a1",
      "shared" : "No",
      "createdOn" : "2019-10-26T05:03:25.778Z",
      "lastUpdated" : "2019-10-26T05:03:25.778Z",
      "encrypted" : "No"
    },
    {
      "fileId" : "525631de-95e5-43a1-8756-e356ce4cf4b4",
      "name" : "f01",
      "type" : "Folder",
      "size" : 0,
      "path" : "pratirumala/f01",
      "shared" : "No",
      "createdOn" : "2019-11-07T04:13:15.476Z",
      "lastUpdated" : "2019-11-07T04:13:15.476Z",
      "encrypted" : "No"
    }
  ]
}
I hope this makes our issue clear.
We have also tried to convert the field type with the convert processor: https://www.elastic.co/guide/en/elasticsearch/reference/current/convert-processor.html?utm_source=hacpai.com
But the "nested" data type is not supported there.
Please help.
Also, please reply if you have any questions.
I am stuck on defining the mapping for a field (files) whose type should be nested rather than object. I am looking for a way to define the type of files as "nested"; otherwise the filter will not work.
Since we define the mapping in the JSON file and the preprocessor pipeline, we are already creating an array of objects [{}], but how do we denote "nested"? What differentiates object from nested in the Elasticsearch mapping in CrawlerData?
There must be some way in the "node-mongodb-es-connector" plugin to do this,
because [{}] is purely an array of objects.
Please help.
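One direction that might work, sketched here in Python under some assumptions: create the index with an explicit nested mapping before the connector writes any documents, because Elasticsearch takes the nested type from the index mapping and not from the shape of the data. The helper build_create_index_request is a made-up name. The typeless mapping body assumes Elasticsearch 7.x, as in the mapping shown earlier. Whether node-mongodb-es-connector keeps a mapping that already exists on the index is an assumption I have not verified. Authentication is left out.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, nested_field, properties):
    """Hypothetical helper: build a PUT request that creates an index
    with `nested_field` mapped as type nested."""
    body = {
        "mappings": {
            "properties": {
                nested_field: {"type": "nested", "properties": properties}
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request(
    "http://localhost:9200",
    "userfiles",
    "files",
    {"name": {"type": "text"}, "fileId": {"type": "text"}},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```

The request only builds the body and URL. Sending it is a separate step, and the index must not already exist with files mapped as object, since the type of an existing field cannot be switched to nested in place.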