ozlerhakan / mongolastic

:traffic_light: A dataset migration tool from MongoDB to Elasticsearch and vice versa.
MIT License

date type should be date instead of long #19

Closed: whollacsek closed this issue 8 years ago

whollacsek commented 8 years ago

I'm trying to use data imported by mongolastic with Kibana, and I just found out that the type mapping for date fields is wrong. According to https://github.com/elastic/kibana/issues/3347, it should be date, not long.

Is it possible to configure it in the config file?

ozlerhakan commented 8 years ago

Hi @whollacsek ,

I would like to help make this feature available, but first I need sample data to investigate. Could you share some example documents with me? I have not used Kibana. Which steps should I follow to reproduce what you describe?

whollacsek commented 8 years ago

Hi @ozlerhakan here are some steps to explain what I meant:

  1. Create a mongo database with some data
mongo --eval '
db.events.insert(
[{
    "createdAt" : ISODate("2016-06-03T16:25:39.542Z"),
    "timestamp" : ISODate("2016-06-03T16:25:03.773Z"),
    "value" : "1"
},{
    "createdAt" : ISODate("2016-06-04T16:25:39.542Z"),
    "timestamp" : ISODate("2016-06-04T16:25:03.773Z"),
    "value" : "2"
},{
    "createdAt" : ISODate("2016-06-05T16:25:39.542Z"),
    "timestamp" : ISODate("2016-06-05T16:25:03.773Z"),
    "value" : "3"
}]
)' mongolastic19
  2. Prepare config.yaml
misc:
   dindex:
       name: mongolastic19
       as: mongolastic19
   ctype:
       name: events
       as: event
   batch: 200
   dropDataset: true
mongo:
   host: localhost
   port: 27017
elastic:
   host: localhost
   port: 9300
  3. Run java -jar mongolastic.jar -f config.yaml
  4. Inspect elasticsearch index
curl -XGET http://localhost:9200/mongolastic19

Result:

{
    "mongolastic19": {
        "aliases": {},
        "mappings": {
            "event": {
                "properties": {
                    "createdAt": {
                        "properties": {
                            "$date": {
                                "type": "long"
                            }
                        }
                    },
                    "timestamp": {
                        "properties": {
                            "$date": {
                                "type": "long"
                            }
                        }
                    },
                    "value": {
                        "type": "string"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1477584818718",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "6ft-7IIUSziWrYBzibmOog",
                "version": {
                    "created": "2030399"
                }
            }
        },
        "warmers": {}
    }
}
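A response like the one above is easier to scan once it is reduced to a flat list of field paths and their types. The following is a minimal sketch of that, not part of mongolastic: the helper name `leaf_types` is hypothetical, and the `properties` dictionary is copied by hand from the mapping shown above rather than fetched from a live cluster.

```python
def leaf_types(properties, prefix=""):
    """Yield (dotted_path, type) for each leaf field in an Elasticsearch
    'properties' mapping, descending into nested object fields."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            yield from leaf_types(spec["properties"], prefix=path + ".")
        else:
            yield path, spec.get("type")


# The "event" mapping from the response above, copied by hand.
event_properties = {
    "createdAt": {"properties": {"$date": {"type": "long"}}},
    "timestamp": {"properties": {"$date": {"type": "long"}}},
    "value": {"type": "string"},
}

for path, field_type in leaf_types(event_properties):
    print(path, field_type)
```

Any path ending in `.$date` with type `long` marks a MongoDB date that was indexed as a nested number rather than as a date field.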

So we got:

"timestamp": {
    "properties": {
        "$date": {
            "type": "long"
        }
    }
}

Instead of (maybe):

"timestamp": {
    "properties": {
        "$date": {
            "type": "date"
        }
    }
}

Basically, Kibana needs a field of type date to compute charts, but I do not know whether the date type should be specified like above.
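The mapping suggests that each date reaches Elasticsearch as an object of the form `{"$date": <epoch milliseconds>}`, which is why the inner field is mapped as long. One way to picture a fix is to flatten those wrappers into ISO 8601 strings before indexing, a form that Elasticsearch's default dynamic date detection is expected to recognize. The sketch below only illustrates that idea; it is not how mongolastic handles dates, and the function name `flatten_extended_dates` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def flatten_extended_dates(value):
    """Recursively replace {"$date": <epoch millis>} wrappers with
    ISO 8601 UTC strings such as 2016-06-03T16:25:39.542Z."""
    if isinstance(value, dict):
        if set(value) == {"$date"} and isinstance(value["$date"], int):
            # Integer arithmetic via timedelta avoids float rounding.
            moment = EPOCH + timedelta(milliseconds=value["$date"])
            text = moment.isoformat(timespec="milliseconds")
            return text.replace("+00:00", "Z")
        return {key: flatten_extended_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [flatten_extended_dates(item) for item in value]
    return value


# First sample document from the steps above, in the assumed wrapped form.
wrapped = {"createdAt": {"$date": 1464971139542}, "value": "1"}
print(flatten_extended_dates(wrapped))
```

With the wrapper removed, `createdAt` becomes a top-level string field instead of an object holding a `$date` number.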

ozlerhakan commented 8 years ago

Ah, I see. You can use dateFormat: "<format>" with mongolastic.

You can give any format that Elasticsearch supports.

For example:

config.yaml

misc:
   dindex:
       name: mongolastic19
       as: mongolastic19
   ctype:
       name: events
       as: event
   batch: 200
   dropDataset: true
mongo:
   host: localhost
   port: 27017
elastic:
   host: localhost
   port: 9300
   dateFormat: "yyyy-MM-dd"

After running mongolastic, you should see the mapping of the newly created index as follows:

{
    "mongolastic19": {
        "mappings": {
            "event": {
                "properties": {
                    "createdAt": {
                        "format": "strict_date_optional_time||epoch_millis",
                        "type": "date"
                    },
                    "timestamp": {
                        "format": "strict_date_optional_time||epoch_millis",
                        "type": "date"
                    },
                    "value": {
                        "type": "string"
                    }
                }
            }
        }
    }
}

whollacsek commented 8 years ago

I tried dateFormat: "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and dateFormat: "yyyy-MM-dd'T'HH:mm:ssZZ" but each time I get this exception:

663 [main] DEBUG org.elasticsearch.common.netty  - using gathering [true]
697 [main] DEBUG org.elasticsearch.client.transport  - [Rainbow] node_sampler_interval[5s]
721 [main] DEBUG org.elasticsearch.netty.channel.socket.nio.SelectorUtil  - Using select timeout of 500
721 [main] DEBUG org.elasticsearch.netty.channel.socket.nio.SelectorUtil  - Epoll-bug workaround enabled = false
776 [main] DEBUG org.elasticsearch.client.transport  - [Rainbow] adding address [{#transport#-1}{127.0.0.1}{127.0.0.1:9300}]
802 [elasticsearch[Rainbow][management][T#1]] DEBUG org.elasticsearch.shield.transport.netty  - [Rainbow] connected to node [{#transport#-1}{127.0.0.1}{127.0.0.1:9300}]
865 [main] DEBUG org.elasticsearch.shield.transport.netty  - [Rainbow] connected to node [{Dazzler}{k3hG133gSeqVwR1CstjWHA}{172.17.0.2}{172.17.0.2:9300}]
917 [main] INFO org.mongodb.driver.cluster  - Cluster created with settings {hosts=[localhost:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='5000 ms', maxWaitQueueSize=500}
917 [main] INFO org.mongodb.driver.cluster  - Adding discovered server localhost:27017 to client view of cluster
973 [main] DEBUG org.mongodb.driver.cluster  - Updating cluster description to  {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING}]
990 [main] INFO com.kodcu.main.Mongolastic  - Load duration: 988ms
Exception in thread "main" org.bson.codecs.configuration.CodecConfigurationException: Can't find a codec for class com.kodcu.util.codecs.CustomDateCodec.
    at org.bson.codecs.configuration.CodecCache.getOrThrow(CodecCache.java:46)
    at org.bson.codecs.configuration.ProvidersCodecRegistry.get(ProvidersCodecRegistry.java:63)
    at org.bson.codecs.configuration.ProvidersCodecRegistry.get(ProvidersCodecRegistry.java:37)
    at org.bson.codecs.BsonTypeCodecMap.<init>(BsonTypeCodecMap.java:44)
    at org.bson.codecs.DocumentCodec.<init>(DocumentCodec.java:86)
    at org.bson.codecs.DocumentCodec.<init>(DocumentCodec.java:72)
    at com.kodcu.service.ElasticBulkService.getEncoder(ElasticBulkService.java:134)
    at com.kodcu.service.ElasticBulkService.<init>(ElasticBulkService.java:57)
    at com.kodcu.main.Mongolastic.initializeBulkService(Mongolastic.java:93)
    at com.kodcu.main.Mongolastic.proceedService(Mongolastic.java:73)
    at java.util.Optional.ifPresent(Optional.java:159)
    at com.kodcu.main.Mongolastic.start(Mongolastic.java:64)
    at com.kodcu.main.Mongolastic.main(Mongolastic.java:37)

Shouldn't mongolastic use the default date format of mongo?

ozlerhakan commented 8 years ago

I encountered the same error yesterday. Did you try the latest mongolastic jar file?

whollacsek commented 8 years ago

Oh didn't see there's a new release. It's importing now, I'll keep you updated :)

ozlerhakan commented 8 years ago

Hi @winder ,

After changing the MongoDB Java driver version to 3.3.0, I get the same error as above:

Exception in thread "main" org.bson.codecs.configuration.CodecConfigurationException: Can't find a codec for class com.kodcu.util.codecs.CustomDateCodec.

After switching back to the old version (MongoDB Java driver 3.2.0), the app works correctly.

Since you added the feature to the app, I would like to ask whether you have encountered the same issue with 3.3.0 before.

ozlerhakan commented 8 years ago

great @whollacsek :)

whollacsek commented 8 years ago

@ozlerhakan it works! Thanks a lot, you can close this now :)

ozlerhakan commented 8 years ago

🎉 you're welcome :)