miku / esbulk

Bulk indexing command line tool for elasticsearch.
GNU General Public License v3.0

Is it possible to identify a certain field as the document ID? #3

Closed (RobGThai closed 8 years ago)

RobGThai commented 8 years ago

I'm using Elasticsearch to store data from MySQL. I want to keep the row ID in MySQL and the document ID in ES identical. Is it possible to do so with esbulk?

miku commented 8 years ago

Thanks for the report. It seems useful to allow reuse of IDs. I added support for it in 0.3.8, via a command line flag that allows the ID field to be specified. See the README for details.

RobGThai commented 8 years ago

That's cool. Thanks. With this, it will update the document with the same ID instead of adding a new one, right?

miku commented 8 years ago

Yes, exactly, that is standard ES behavior. Under the hood, if the -id flag is set to a field name, esbulk peeks into the document at hand, extracts the value of that field, and puts it into the bulk indexing header as _id.
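
To illustrate the mechanism described above, here is a minimal sketch in Go. It is not esbulk's actual code: the function name `bulkLines`, the index name `users`, and the field name `uid` are hypothetical. It assumes the ID is stored as a JSON string and omits the `_type` entry that older Elasticsearch versions expect in the action line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkLines is a hypothetical helper, not taken from esbulk. It returns the
// two newline-terminated lines the Elasticsearch bulk API expects for one
// document: an action line carrying _id, followed by the document itself.
func bulkLines(index, idField, doc string) (string, error) {
	// Peek into the document to find the value of the ID field.
	var fields map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &fields); err != nil {
		return "", err
	}
	// Assumption: the ID is a JSON string. Numeric IDs are rejected here.
	id, ok := fields[idField].(string)
	if !ok {
		return "", fmt.Errorf("field %q is missing or not a string", idField)
	}
	// Build the action line that tells Elasticsearch which _id to use.
	meta := map[string]interface{}{
		"index": map[string]interface{}{"_index": index, "_id": id},
	}
	header, err := json.Marshal(meta)
	if err != nil {
		return "", err
	}
	return string(header) + "\n" + doc + "\n", nil
}

func main() {
	out, err := bulkLines("users", "uid", `{"uid": "42", "name": "alice"}`)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the action line names an explicit `_id`, indexing a second document with the same ID replaces the first one rather than creating a new document.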

There is one edge case of sorts: if the ID field of your documents is actually named _id, Elasticsearch will complain. esbulk handles this by automatically removing _id from the document. This should be a rare case, and even when it happens, esbulk should do the right thing.
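
The `_id` edge case can be sketched as follows. This is an illustration under assumptions, not esbulk's implementation: `splitID` is a hypothetical helper, and it assumes the `_id` value is a JSON string. Note that re-encoding the document through a map changes whitespace and key order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// splitID is a hypothetical helper, not taken from esbulk. It pulls the
// "_id" field out of a JSON document and returns the ID together with the
// document body re-encoded without that field.
func splitID(doc string) (string, string, error) {
	// RawMessage keeps every other field's value untouched.
	var fields map[string]json.RawMessage
	if err := json.Unmarshal([]byte(doc), &fields); err != nil {
		return "", "", err
	}
	raw, ok := fields["_id"]
	if !ok {
		return "", doc, nil // no _id in the body, nothing to strip
	}
	// Assumption: the _id value is a JSON string.
	var id string
	if err := json.Unmarshal(raw, &id); err != nil {
		return "", "", err
	}
	delete(fields, "_id")
	body, err := json.Marshal(fields)
	if err != nil {
		return "", "", err
	}
	return id, string(body), nil
}

func main() {
	id, body, err := splitID(`{"_id": "a1", "title": "hello"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
	fmt.Println(body)
}
```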

Let me know if you encounter any bugs. If not, I will close this issue soon. Thanks!

RobGThai commented 8 years ago

I'm getting this error: json: cannot unmarshal number into Go value of type string

This is probably because my document ID is stored as a number in the JSON.
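
For readers unfamiliar with this class of error, the sketch below shows how Go's `encoding/json` rejects a JSON number when the target is a Go string. The `record` type and the helper names are hypothetical and only stand in for whatever esbulk decoded into at the time. The exact wording of the error message depends on the target type and the Go version.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// record is a hypothetical type whose ID field only accepts JSON strings.
type record struct {
	ID string `json:"id"`
}

// decodeStringID decodes a document into record and returns its ID.
func decodeStringID(doc string) (string, error) {
	var r record
	if err := json.Unmarshal([]byte(doc), &r); err != nil {
		return "", err
	}
	return r.ID, nil
}

// isTypeMismatch reports whether err is a JSON type mismatch, such as a
// number found where a string was expected.
func isTypeMismatch(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr)
}

func main() {
	// A numeric id fails to decode into a string field.
	_, err := decodeStringID(`{"id": 7}`)
	fmt.Println(err != nil, isTypeMismatch(err))

	// A quoted id decodes without error.
	id, err := decodeStringID(`{"id": "7"}`)
	fmt.Println(id, err)
}
```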

miku commented 8 years ago

Thanks for testing! I see the problem and hopefully fixed it with v0.3.9. Now string and numeric ids are allowed.
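
One way to accept both string and numeric IDs is sketched below. This is an assumption about a possible approach, not necessarily how v0.3.9 implements it; `idString` is a hypothetical helper. It uses `Decoder.UseNumber` so that numeric IDs keep their original digits instead of passing through `float64`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// idString is a hypothetical helper, not taken from esbulk. It returns the
// value of idField as a string, whether the document stores it as a JSON
// string or as a JSON number.
func idString(doc, idField string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber() // decode numbers as json.Number, preserving the literal
	var fields map[string]interface{}
	if err := dec.Decode(&fields); err != nil {
		return "", err
	}
	switch v := fields[idField].(type) {
	case string:
		return v, nil
	case json.Number:
		return v.String(), nil
	default:
		return "", fmt.Errorf("field %q is missing or is neither a string nor a number", idField)
	}
}

func main() {
	for _, doc := range []string{`{"id": 123}`, `{"id": "abc"}`} {
		id, err := idString(doc, "id")
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```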

RobGThai commented 8 years ago

Yes, it works fine now, thanks a lot :)