Closed: andriuwe4ka closed this issue 11 years ago.
So, in addition: creating the index:
curl -XPUT "localhost:9200/aaa" -d '{"settings":{"number_of_shards":1,"mapper":{"dynamic":false}},"mappings":{"main":{"properties":{"title":{"type":"string"}}}}}'
and creating the river:
curl -XPUT "localhost:9200/_river/aaa/_meta" -d '{"type":"mongodb","mongodb":{"db":"aaa_db","collection":"aaa_collection"},"index":{"name":"aaa","type":"main"}}'
After this the mapping is what I want: only the title field. But after adding any record to aaa_collection (the river picks it up from the oplog and adds it to ES), the mapping already contains all the fields =(
Hi,
Index and mapping can be created before the river. Does your custom mapping include all fields? Can you please provide a gist to reproduce your issue?
Thanks, Richard.
I'll ask about providing one, but I think it's impossible =)
And to reproduce:
Mongo collection: {"title": string, "description": string, "timestamp": long, ...and as many as you wish}. I need to index only title and description (to begin with, title), so (no indexes yet):
curl -XPUT "localhost:9200/aaa" -d '{"settings":{"number_of_shards":1,"mapper":{"dynamic":false}},"mappings":{"main":{"properties":{"title":{"type":"string"}}}}}'
Then it's river time:
curl -XPUT "localhost:9200/_river/aaa/_meta" -d '{"type":"mongodb","mongodb":{"db":"aaa_db","collection":"aaa_collection"},"index":{"name":"aaa","type":"main"}}'
If I get the mapping: curl -XGET "localhost:9200/aaa/main/_mapping?pretty=true"
it'll show: { "main" : { "properties" : { "title" : { "type" : "string" } } } }
Then I add something to the collection (all fields are non-empty),
and the index will use all of them, and the mapping will change to track all of the fields.
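To illustrate, after the first document is synced the same mapping request shows something like this (written from memory; the extra types are just what dynamic mapping typically guesses):
curl -XGET "localhost:9200/aaa/main/_mapping?pretty=true"
{ "main" : { "properties" : { "title" : { "type" : "string" }, "description" : { "type" : "string" }, "timestamp" : { "type" : "long" } } } }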
Hi,
One of my questions was "Does your custom mapping include all fields?"
So in your scenario we will need to remove the unwanted attributes using a script.
Look at the first example in the "Script Filters" section [1].
[1] - https://github.com/richardwilly98/elasticsearch-river-mongodb
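A rough sketch of what that river definition could look like with a script that drops a field (double-check the exact syntax against the wiki example; the dropped field here is just an illustration):
curl -XPUT "localhost:9200/_river/aaa/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "aaa_db",
    "collection": "aaa_collection",
    "script": "delete ctx.document.timestamp;"
  },
  "index": {
    "name": "aaa",
    "type": "main"
  }
}'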
Thanks, Richard.
Understood, thanks a lot :+1: will try now =)
OK, the script is working and the mapping has only those fields that I need =) but the _source in ES is smaller too (all the removed fields are gone - nice)
So what to do: get only the id, or is there some command (like delete or ignore) to store fields in ES but not index them?
Look at custom mapping in Elasticsearch [1].
[1] - http://www.elasticsearch.org/guide/reference/mapping/source-field/
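For instance, assuming the goal is to keep a field such as timestamp in _source without making it searchable, it can be mapped with "index" : "no" instead of being deleted in the script (a sketch using the field names from this thread):
curl -XPUT "localhost:9200/aaa/main/_mapping" -d '
{
  "main": {
    "properties": {
      "title": { "type": "string" },
      "timestamp": { "type": "long", "index": "no" }
    }
  }
}'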
Yes, I saw it, but that's not what I meant: since the script has "delete ctx.document.timestamp;", this field is no longer available in _source =(
Sorry but I am not sure to understand your issue.
Thanks, Richard.
Hi
I am experiencing a similar problem. I have a field that I don't want to be analyzed, so I first create the index with the mapping information:
PUT http://localhost:9200/users
{
  "mappings" : {
    "default" : {
      "properties" : {
        "nickname" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}
And then I create the river
PUT http://localhost:9200/_river/users/_meta
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      {
        "host": "127.0.0.1",
        "port": 27017
      }
    ],
    "options": {
      "secondary_read_preference": true,
      "drop_collection": true
    },
    "db": "users",
    "collection": "userApplication"
  },
  "index": {
    "name": "users",
    "type": "default"
  }
}
I am losing the mapping configuration, so the nickname field is analyzed and the search results are not the desired ones.
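The quickest check I have found is to fetch the mapping once the river has started and see whether not_analyzed is still there (same kind of request as earlier in this thread):
curl -XGET "http://localhost:9200/users/default/_mapping?pretty=true"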
Hi,
Try to disable dynamic mapping [1]. See the example here [2].
Please let me know how it goes.
[1] - http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping/
[2] - https://gist.github.com/radu-gheorghe/4737210
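Note that (as far as I understand the mapping docs) the index-level setting index.mapper.dynamic only controls whether new types can be created automatically; to keep new fields out of an existing type, "dynamic" : false has to go on the type mapping itself, roughly like this:
curl -XPUT "http://localhost:9200/users" -d '
{
  "mappings" : {
    "default" : {
      "dynamic" : false,
      "properties" : {
        "nickname" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'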
Thanks, Richard.
Hello again,
I am losing any kind of configuration of the index and the mappings when creating the river. Now I create the index like this:
PUT http://localhost:9200/users
{
  "settings" : {
    "mapper" : {
      "dynamic": false
    }
  },
  "mappings": {
    "default": {
      "properties": {
        "_class": {
          "type": "string"
        },
        "applicationIdentifier": {
          "type": "string"
        },
        "creationDate": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "nickname": {
          "type": "string", "index": "not_analyzed"
        },
        "password": {
          "type": "string"
        },
        "userCustomInformation": {
          "type": "object"
        }
      }
    }
  }
}
To make sure that the index is created correctly, I ask for the index settings:
GET http://localhost:9200/users/_settings
{
  "users": {
    "settings": {
      "index.number_of_shards": "5",
      "index.number_of_replicas": "1",
      "index.version.created": "900099",
      "index.mapper.dynamic": "false"
    }
  }
}
And after creating the river as in my last comment, the index configuration is lost:
{
  "users": {
    "settings": {
      "index.number_of_shards": "5",
      "index.number_of_replicas": "1",
      "index.version.created": "900099"
    }
  }
}
I've tried closing the index, disabling dynamic mapping, and reopening it, but I have the same issue. The mapping configuration is also lost.
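For reference, the sequence I mean is roughly the following (written from memory, so treat it as a sketch; I am not certain index.mapper.dynamic can actually be changed through the settings API):
curl -XPOST "http://localhost:9200/users/_close"
curl -XPUT "http://localhost:9200/users/_settings" -d '{ "index.mapper.dynamic": false }'
curl -XPOST "http://localhost:9200/users/_open"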
Hi,
In the scenario above, the index settings return:
curl -XGET "http://localhost:9200/index75/_settings?pretty=true"
{
  "index75" : {
    "settings" : {
      "index.mapper.dynamic" : "false",
      "index.number_of_shards" : "5",
      "index.number_of_replicas" : "1",
      "index.version.created" : "900099"
    }
  }
}
Any update?
@andriuwe4ka I will close this issue due to inactivity. Please reopen it if needed.
Hi @richardwilly98, I am facing a similar problem, where I want to index only a subset of the fields and store the entire JSON using
{_source : { enabled : true}}
So I create my mappings only for the fields I want to index.
{"mappings": {
    "facebook" : {
        "dynamic" : "strict",
        "properties" : {
            "post_id" : {
                "type" : "string",
                "store" : True,
                "index" : "not_analyzed",
            },
            "text" : {
                "type" : "string",
                "store" : True,
                "index" : "analyzed",
            },
            "message" : {
                "type" : "string",
                "store" : True,
                "index" : "analyzed",
            },
            "brand_id" : {"type" : "integer"},
        },
    },
},
}
And now, when I try to create the river with the following config:
payload = {
    "type" : "mongodb",
    "mongodb": {
        "db" : db,
        "collection" : collection,
        "secondary_read_preference" : True,
    },
    "index" : {
        "name" : index_name,
        "type" : doc_type,
    },
}
Following is the stack trace I get while trying to create the river with the above config:
org.elasticsearch.index.mapper.StrictDynamicMappingException: mapping set to strict, dynamic introduction of [fb_post_type] within [facebook] is not allowed
at org.elasticsearch.index.mapper.object.ObjectMapper.parseDynamicValue(ObjectMapper.java:628)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:618)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:469)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:515)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:462)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:392)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:394)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
I tried using include_fields as suggested; however, that leads to the rest of the fields (i.e. the fields I don't have mappings for) not being stored as part of _source.
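For reference, this is roughly how I set it (I am assuming include_fields belongs under the river's "options" block, like the other options in this thread; please correct me if it goes elsewhere):
payload = {
    "type" : "mongodb",
    "mongodb": {
        "db" : db,
        "collection" : collection,
        "options" : {
            "include_fields" : ["post_id", "text", "message", "brand_id"],
        },
    },
    "index" : {
        "name" : index_name,
        "type" : doc_type,
    },
}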
And also, going by applyAdvancedTransformation, the document with the transformation applied is sent to updateBulkRequest, so naturally the fields would be lost there too and can't be kept in the _source, even if I delete the unnecessary fields using the script filter.
It would be great if you could suggest how I can get mongodb-river to work so that I index only the fields I have mappings for and store the remaining fields.
Update:
Using {dynamic : false} in my mapping worked. Here is the final mapping I'm using:
body = {"mappings": {
    "facebook" : {
        "dynamic" : False,
        "properties" : {
            "post_id" : {
                "type" : "string",
                "store" : True,
                "index" : "not_analyzed",
            },
            "text" : {
                "type" : "string",
                "store" : True,
                "index" : "analyzed",
            },
            "message" : {
                "type" : "string",
                "store" : True,
                "index" : "analyzed",
            },
            "brand_id" : {"type" : "integer"},
        },
    },
},
}
I ran into the same problem after I upgraded to mongodb-river 2.0.0. It is really frustrating!
After trying all the tricks mentioned in this thread, it still doesn't work.
For now, I've just switched back to version 1.6.8, which at least works well with MongoDB.
@mocheng can you provide your configuration (river, mapping, index)?
@richardwilly98 My deployment versions are: MongoDB 2.4.3, Elasticsearch 1.0.0, Elasticsearch-MongoDB river 2.0.0.
The index mapping is created as:
curl -XPUT http://192.168.100.92:9200/beeper_v1 -d '
{
  "mappings": {
    "register": {
      "dynamic" : false,
      "properties": {
        "score": {
          "type": "integer"
        },
        "online": {
          "type": "boolean"
        },
        "title": {
          "type": "string",
          "indexAnalyzer": "ik",
          "searchAnalyzer": "ik"
        },
        "intro": {
          "type": "string",
          "indexAnalyzer": "ik",
          "searchAnalyzer": "ik"
        },
        "area": {
          "type": "string",
          "indexAnalyzer": "ik",
          "searchAnalyzer": "ik"
        },
        "loc" : {
          "type" : "geo_point"
        }
      }
    }
  }
}
'
The river is created as
curl -XPUT "192.168.100.92:9200/_river/beeper_river/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "192.168.100.99", "port": 30000 }
    ],
    "options": { "secondary_read_preference": true },
    "db": "beeper",
    "collection": "register"
  },
  "index": {
    "name": "beeper",
    "type": "register"
  }
}
'
The MongoDB collection register has documents like the one below:
{
  "_id" : ObjectId("52d7a72dcaff4848e11200f5"),
  "av" : "100060000",
  "basic" : {
    "tel" : "13261805201",
    "pwd" : "111111",
    "pts" : 1389864749
  },
  "crc" : 1,
  "domain_name" : "6564491425",
  "hpts" : ISODate("2014-04-09T06:02:03.983Z"),
  "loc" : [
    "4.9E-324",
    "4.9E-324"
  ],
  "login" : true,
  "mid" : 2000000010,
  "model" : "GT-I9100",
  "oc" : 2,
  "ocre" : 1,
  "online" : false,
  "os" : "Android4.1.2",
  "personal" : {
    "name" : "Zhuzhu",
    "idcard" : "371421198609111760",
    "pts" : 1389864749
  },
  "score" : 1449332708.7142856,
  "src" : 2,
  "tpts" : 1390187366,
  "unread" : 24,
  "ur" : 4.428571428571429,
  "urc" : 7,
  "work" : {
    "area" : "Sheji",
    "intro" : "婚庆abc Xjjhvdfhnmndsshjnvzstjknbcsfjjjj",
    "pts" : 1389864749,
    "title" : "婚庆"
  }
}
After the river is created, the index mapping is changed to include all fields. Unfortunately, the IK analyzer is replaced by the default string analyzer.
Did you try without index alias?
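If it helps, you can check which concrete index an alias points to with the aliases API, e.g.:
curl -XGET "http://192.168.100.92:9200/_aliases?pretty=true"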
@richardwilly98 It works!!!
Originally, "beeper" was an alias of "beeper_v1". After changing the "db" from "beeper" to "beeper_v1", it works!
Thank you so much!!!
curl -XPUT "192.168.100.92:9200/_river/beeper_river/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "192.168.100.99", "port": 30000 }
    ],
    "options": { "secondary_read_preference": true },
    "db": "beeper_v1",
    "collection": "register"
  },
  "index": {
    "name": "beeper",
    "type": "register"
  }
}
'
Is there a way to use my own mapping, not the default one? For example: I have a collection with many fields per document, and I really need only two of them in the index. So I suppose that indexing two fields will take less time than 15-20 fields.
I found a forked version of your river (https://github.com/gustavonalle/elasticsearch-river-mongodb), but I have little interest in a forked version because of support, etc.
So is there a way to write my own mapping for the river?
I found issue #64 here, with a howto (creating the index in Elasticsearch first and creating the river after), but this appears not to work because the mapping shows all fields after river creation =(
So, any comments?