jprante / elasticsearch-knapsack

Knapsack plugin is an import/export tool for Elasticsearch
Apache License 2.0

Unable to do a full import from file (ES 1.3.4) #93

Open AtzeDeVries opened 8 years ago

AtzeDeVries commented 8 years ago

Hi,

I'm trying to export the data from my ES server (about 22GB, 100K documents, 1 index) to a file. The following situations happen.

I would like to have all the data in a file, since it is portable.

Command to export:

curl -XPOST 'localhost:9200/_export?path=/data/elasticsearch_export/nda_export.bulk.gz'

There are two clusters: cluster A containing 1 node, and cluster B containing 3 nodes. I'm trying to move data from A to B.

The download link of the plugin is http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/${es_version}.0/elasticsearch-knapsack-${es_version}.0-plugin.zip where $es_version is 1.3.4.

jprante commented 8 years ago

I forgot to upload 1.3.4.1 in October. Now it's there. Can you try 1.3.4.1 to check if the problems persist? Thanks.

http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/1.3.4.1/

AtzeDeVries commented 8 years ago

Hi,

So I did a lot of testing and found the solution. The mapping was not transferred (or not correctly transferred) to the new server when moving the data via a file. If I inject the mapping before the _import, then it seems to work fine (the export is a bulk.gz of one index).

jprante commented 8 years ago

Yes. The bulk archive is not able to transport mappings. The ES bulk format has no mechanism for creating mappings, only for document indexing.
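The workaround described above (inject the mapping before running _import) could be scripted roughly as follows. This is only a sketch under assumptions: the index name `nda` is guessed from the export file name, the host URLs are placeholders, and the response shape `{index: {"mappings": {...}}}` is the one returned by `GET /<index>/_mapping` in Elasticsearch 1.x.

```python
import json
import urllib.request


def create_index_body(mapping_response: dict, index: str) -> dict:
    """Turn a GET /<index>/_mapping response into a create-index body.

    Elasticsearch 1.x wraps the result as {index: {"mappings": {...}}},
    so the index-name level is unwrapped here.
    """
    return {"mappings": mapping_response[index]["mappings"]}


def copy_mapping(src: str, dst: str, index: str) -> None:
    """Read the mapping from the source cluster and create the index,
    with that mapping, on the destination cluster."""
    with urllib.request.urlopen(f"{src}/{index}/_mapping") as resp:
        mapping = json.load(resp)
    body = json.dumps(create_index_body(mapping, index)).encode("utf-8")
    req = urllib.request.Request(f"{dst}/{index}", data=body, method="PUT")
    with urllib.request.urlopen(req) as resp:
        resp.read()


if __name__ == "__main__":
    # Placeholder hosts and index name (assumptions, not from this thread).
    copy_mapping("http://cluster-a:9200", "http://cluster-b:9200", "nda")
```

Run it against cluster B before the _import call, so the documents from the bulk archive are indexed into an index that already has the right mapping.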

AtzeDeVries commented 8 years ago

OK, then the issue of non-bulk exports only being 2GB still stands. I did not try exporting to a tar file instead of tar.gz. I did test splitting the export into multiple files, but the multiple tar.gz files still totaled 2GB.

jprante commented 8 years ago

Yes, I checked. The fix was not backported.

If you can build from source, here is a quick fix:

Set longFileMode in this line

https://github.com/jprante/elasticsearch-knapsack/blob/1.3/src/main/java/org/xbib/io/archive/tar/TarArchiveOutputStream.java#L84

to LONGFILE_GNU

AtzeDeVries commented 8 years ago

So it is only a problem with tar files? Then I could just use .zip, which is fine by me. (I can't test at the moment, since the testing server is running a different job.)

jprante commented 8 years ago

Yes, it's a tar format peculiarity: the original tar is limited to 2GB, while POSIX TAR or GNU TAR is not.
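To illustrate how the size limit depends on the tar variant, here is a small sketch using Python's standard `tarfile` module. It is not the plugin's Java code, and the file name is made up. The exact threshold also differs: the ustar header stores the size as 11 octal digits, which caps an entry at just under 8 GiB, so the 2GB figure seen in this thread presumably comes from the plugin's own tar implementation.

```python
import tarfile


def header_fits(size: int, fmt: int) -> bool:
    """Return True if a tar header for an entry of `size` bytes can be
    built in the given tarfile format constant."""
    info = tarfile.TarInfo(name="nda_export.bulk")  # hypothetical entry name
    info.size = size
    try:
        info.tobuf(format=fmt)
        return True
    except ValueError:
        # tarfile raises ValueError when a number does not fit its header field
        return False


if __name__ == "__main__":
    eight_gib = 8 * 1024 ** 3
    for label, fmt in [
        ("ustar", tarfile.USTAR_FORMAT),
        ("gnu", tarfile.GNU_FORMAT),
        ("pax", tarfile.PAX_FORMAT),
    ]:
        print(label, header_fits(eight_gib, fmt))
```

The GNU format switches to a binary size encoding for large values and the POSIX (pax) format moves the size into an extended header, which is why those two variants accept sizes that the plain ustar header rejects.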

AtzeDeVries commented 8 years ago

OK, then I'll try the zip method tomorrow. I'll report back on that.
