ozlerhakan / mongolastic

:traffic_light: A dataset migration tool from MongoDB to Elasticsearch and vice versa.
MIT License

feature request: simple transformation step when migrating from mongo to es #20

Closed whollacsek closed 7 years ago

whollacsek commented 8 years ago

It would be nice to be able to do some simple manipulation on the document that will be inserted into ES. For example add a new field that is the result of some computation of other fields.

ozlerhakan commented 8 years ago

Hi @whollacsek,

Thank you for this feature request!

I think we can support this by adding an extra field to the configuration that contains three sub-fields, something like:

{
    "misc": {
        "dindex": {
            "name": "twitter",
            "as": "kodcu"
        },
        "ctype": {
            "name": "tweets",
            "as": "posts"
        }
    },
    "mongo": {
        "host": "localhost",
        "port": 27017,
        "query": "{ 'user.name' : 'kodcu.com'}"
    },
    "elastic": {
        "host": "localhost",
        "port": 9300,
        "fields": {
             "add" : [
              { "new-field-name": "city", "value": "$address", "type": "string"}
             ],
             "remove" : [
              { "field-name": "city"} 
             ],
             "update": [
              { "existing-field-name": "address", "value": "$city + ' ' + $country ", "type": "string"}
             ] 
        }
    }
}

A user can put multiple entries into the add, remove, and/or update arrays. But this will also impact the migration time. wdyt?
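To make the proposal concrete, here is a minimal Python sketch of how the add and remove entries might be applied to a single document. It is not mongolastic code: the function name `apply_add_remove` is hypothetical, and the rule that a value starting with `$` refers to an existing top-level field is an assumption read off the example config above. The update expressions (`$city + ' ' + $country`) are left out because they would need an expression parser.

```python
import copy


def apply_add_remove(doc, fields):
    """Return a copy of doc with the proposed "add" and "remove" entries applied."""
    out = copy.deepcopy(doc)
    for entry in fields.get("add", []):
        value = entry["value"]
        # Assumption: "$name" means "copy the value of the existing field `name`".
        if isinstance(value, str) and value.startswith("$"):
            value = doc.get(value[1:])
        out[entry["new-field-name"]] = value
    for entry in fields.get("remove", []):
        # Removing a field that is absent is treated as a no-op.
        out.pop(entry["field-name"], None)
    return out


# Illustrative data only; the field names are made up for this sketch.
fields = {
    "add": [{"new-field-name": "city_copy", "value": "$city", "type": "string"}],
    "remove": [{"field-name": "country"}],
}
doc = {"city": "Istanbul", "country": "TR"}
result = apply_add_remove(doc, fields)
print(result)
```

Because every document passes through this extra step, the cost grows with the number of entries, which is the migration-time impact mentioned above.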

whollacsek commented 8 years ago

That looks nice. My use case would be:

{
    ...
    "elastic": {
        "host": "localhost",
        "port": 9300,
        "fields": {
             "add" : [
              { "new-field-name": "location", "value": "", "type": "geo_point"}
             ],
             "update": [
              { "existing-field-name": "location", "value": "$lat + ',  ' + $lon "}
             ] 
        }
    }
}

Do you think this would work?

ozlerhakan commented 8 years ago

I think so. But we may need to split up the value field in the update entry. A field called delimiter would fit much better than using a regex or splitting the value field into several pieces to find the referenced fields' values:

"update": [
       { "existing-field-name": "location", "values": ["lat","lon"], "delimiter": ",  " }
 ] 

wdyt?
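The values/delimiter variant is simple to implement because no expression needs to be parsed. A rough Python sketch, with a hypothetical helper name `apply_update` and sample coordinates chosen only for illustration (a plain `","` delimiter is used here instead of the padded one above):

```python
def apply_update(doc, update):
    """Join the named source fields with the delimiter and store the result."""
    joined = update["delimiter"].join(str(doc[name]) for name in update["values"])
    out = dict(doc)
    out[update["existing-field-name"]] = joined
    return out


update = {"existing-field-name": "location", "values": ["lat", "lon"], "delimiter": ","}
doc = {"lat": 41.01, "lon": 28.97, "location": ""}
updated = apply_update(doc, update)
print(updated["location"])  # 41.01,28.97
```

This sketch raises a `KeyError` when one of the source fields is missing, so it does not cover documents where the fields are optional.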

whollacsek commented 8 years ago

In my opinion, your first proposition would cover broader use cases.

Actually, after some thought, my use case requires this manipulation to be done conditionally (the lat and lon fields are not always present). Maybe it's best to do this on my end before calling mongolastic. But if I do this transformation in MongoDB, do you know if Elasticsearch will automatically map the location field to the geo_point type?

In case Elasticsearch cannot detect the geo_point type, is it possible to add a mapping field to the config.yml file? For example:

{
    "elastic": {
        "host": "localhost",
        "port": 9300,
        "mappings": {
            "event": { 
                "data": {
                    "location": {
                        "type": "geo_point"
                    }
                }
            }
        }
    }
}

This partial mapping would be applied when creating the index.
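The conditional transformation described above could be done on the client side along these lines. This is only a sketch: `with_location` is a hypothetical helper, and it writes the location as an object with `lat` and `lon` keys, which is one of the forms Elasticsearch accepts for a geo_point field.

```python
def with_location(doc):
    """Add a geo_point-style "location" only when both coordinates are present."""
    lat, lon = doc.get("lat"), doc.get("lon")
    if lat is None or lon is None:
        # Leave documents without full coordinates untouched.
        return doc
    return {**doc, "location": {"lat": lat, "lon": lon}}


complete = with_location({"name": "a", "lat": 41.01, "lon": 28.97})
partial = with_location({"name": "b", "lat": 41.01})
print(complete)
print(partial)
```

Note that the object form alone does not make a field a geo_point. As far as I know, dynamic mapping does not infer that type, so an explicit mapping like the one proposed above is still needed.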

ozlerhakan commented 8 years ago

But if I do this transformation in MongoDB, do you know if Elasticsearch will automatically map the location field to the geo_point type?

I haven't tried it, but Elasticsearch should infer the type from the syntax of the field.

In case Elasticsearch cannot detect the geo_point type, is it possible to add a mapping field to the config.yml file?

Another handy feature! But for now, you can first create the mapping for your index in ES, then run mongolastic with the dropDataset=false option, so that ES will use your mapping while adding documents from Mongo.
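As a sketch of that workaround, the snippet below builds the request body for creating an index with an explicit geo_point mapping. Several things here are assumptions rather than details from this thread: the index name `events`, the HTTP port 9200 (9300 in the configs above is the transport port), and the mapping layout with a document type, which matches Elasticsearch versions that still had mapping types (they were removed in 7.x).

```python
import json
import urllib.request


def build_index_body(doc_type, field, field_type="geo_point"):
    """Build a create-index body that maps one field explicitly."""
    return {"mappings": {doc_type: {"properties": {field: {"type": field_type}}}}}


def create_index(base_url, index, body):
    """Send PUT <base_url>/<index>; requires a reachable Elasticsearch node."""
    request = urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_index_body("event", "location")
print(json.dumps(body, indent=2))
# Uncomment when an Elasticsearch node is running (URL and index are assumptions):
# create_index("http://localhost:9200", "events", body)
```

After the index exists, running mongolastic with dropDataset=false, as suggested above, should keep the index and its mapping in place while documents are added.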

whollacsek commented 8 years ago

Ok I'll try it this week, thanks!

ozlerhakan commented 7 years ago

Hi @whollacsek,

Could you please try 1.4.1? I have added a project field so that you can use all the available features of the $project operator of the aggregation framework. You can read about these features at https://docs.mongodb.com/manual/reference/operator/aggregation/project/
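For the geo_point use case discussed earlier, a $project stage could build the location as an array, which Elasticsearch reads in `[lon, lat]` order. The sketch below only constructs and prints the stage. How the stage has to be written inside mongolastic's project field is not shown in this thread, so treat the wiring as an assumption and check the project README. The `name` field is a made-up example of a field to keep.

```python
import json

# Array fields in $project need MongoDB 3.2 or later.
project_stage = {
    "$project": {
        "name": 1,                      # keep an existing field (hypothetical name)
        "location": ["$lon", "$lat"],   # geo_point array form: longitude first
    }
}

print(json.dumps(project_stage))
```

According to the $project documentation, a field that is missing from a document becomes null inside such an array, so documents without coordinates would still need to be filtered or handled separately.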

whollacsek commented 7 years ago

Hi @ozlerhakan,

I haven't been working on ES-related projects lately. I'll give it a try when I have the chance, thanks!

ozlerhakan commented 7 years ago

Feel free to add comments later on. I'm closing the issue, thanks!