jprante / elasticsearch-knapsack

Knapsack plugin is an import/export tool for Elasticsearch
Apache License 2.0

Ignore directory entries in tar files #53

Closed: rtkmhart closed this 10 years ago

rtkmhart commented 10 years ago

If an exported tar file is pulled apart and put back together with the regular 'tar' tools, the new tar file will include directory entries. The knapsack plugin doesn't handle them gracefully and blows up with something like:

org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: type is missing;2: type is missing;3: type is missing;

The fix is to ignore directory entries, or packets with zero-length payloads.
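As a rough illustration of what "ignore directory entries" means at the tar level, the sketch below inspects a raw 512-byte ustar header and reports whether the entry is a directory and how large its payload is. It assumes direct access to header bytes, which the knapsack plugin does not necessarily expose; the class TarHeaderCheck and its methods are hypothetical and are not part of the knapsack or org.xbib archive API.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch only: inspects a raw 512-byte ustar header block to decide whether
 * an entry is a directory. Hypothetical helper, not the knapsack/org.xbib API.
 */
final class TarHeaderCheck {
    private TarHeaderCheck() {}

    // ustar layout: name = bytes 0..99, size = bytes 124..135 (octal ASCII),
    // typeflag = byte 156 ('5' marks a directory).
    static String name(byte[] header) {
        int end = 0;
        while (end < 100 && header[end] != 0) {
            end++;
        }
        return new String(header, 0, end, StandardCharsets.US_ASCII);
    }

    static long size(byte[] header) {
        int i = 124;
        int limit = 136;
        while (i < limit && header[i] == ' ') {
            i++; // skip leading blanks that some writers emit
        }
        long size = 0;
        while (i < limit && header[i] >= '0' && header[i] <= '7') {
            size = size * 8 + (header[i] - '0');
            i++;
        }
        return size;
    }

    static boolean isDirectory(byte[] header) {
        // ustar writers set typeflag '5'; old V7-style archives only use a trailing slash.
        return header[156] == '5' || name(header).endsWith("/");
    }

    // Builds a bare header for experiments (checksum and other fields are left empty).
    static byte[] header(String name, char typeflag, long size) {
        byte[] h = new byte[512];
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(n, 0, h, 0, Math.min(n.length, 100));
        byte[] s = String.format("%011o", size).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(s, 0, h, 124, s.length);
        h[156] = (byte) typeflag;
        return h;
    }
}
```

An import loop built on this would simply skip any header for which isDirectory returns true, without relying on the payload length.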

Some more background: a tar file made by the knapsack export might look like this (viewed with tar tzvf ...):

-rw-r--r-- vagrant/vagrant 554 2014-09-30 17:15 infralogs-2014.09.28/sshd/zW1Nhc5NSm-7Kotn8Xw05g/_source

Once that tar file has been extracted with tar xzvf ... and recreated with tar czf ..., it will look like this:

drwxrwxr-x vagrant/vagrant   0 2014-09-30 19:40 infralogs-2014.09.28/sshd/
drwxrwxr-x vagrant/vagrant   0 2014-09-30 19:40 infralogs-2014.09.28/sshd/zW1Nhc5NSm-7Kotn8Xw05g/
-rw-r--r-- vagrant/vagrant 554 2014-09-30 17:15 infralogs-2014.09.28/sshd/zW1Nhc5NSm-7Kotn8Xw05g/_source

But the import assumes that every entry in the tar file is a valid JSON object.
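The listings above suggest that document entries are named index/type/id/_source, while the extra entries that tar adds are bare directory names ending in a slash. The sketch below assumes that four-part layout (inferred from the listing, not confirmed from the plugin source) and only accepts a name as a document entry when it has exactly that shape. DocEntryName is a hypothetical helper, and other entry kinds the plugin may write, such as the alias entries mentioned later in the thread, are out of scope here.

```java
import java.util.Optional;

/**
 * Sketch: treat an entry as a document source only if its name has the
 * index/type/id/field shape seen in the export listing. Hypothetical helper.
 */
final class DocEntryName {
    final String index;
    final String type;
    final String id;
    final String field;

    private DocEntryName(String index, String type, String id, String field) {
        this.index = index;
        this.type = type;
        this.id = id;
        this.field = field;
    }

    static Optional<DocEntryName> parse(String entryName) {
        if (entryName.endsWith("/")) {
            return Optional.empty(); // directory entry added by re-tarring
        }
        String[] parts = entryName.split("/");
        if (parts.length != 4) {
            return Optional.empty(); // not index/type/id/field
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return Optional.empty(); // e.g. "index//id/_source"
            }
        }
        return Optional.of(new DocEntryName(parts[0], parts[1], parts[2], parts[3]));
    }
}
```

Rejecting malformed names up front keeps them from reaching the bulk request, where a missing type only surfaces as a validation error like the one in the report.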

jprante commented 10 years ago

I agree this is an issue, but unfortunately I depend on empty payloads: if an index has an alias name, I transport the alias with an empty payload.

rtkmhart commented 10 years ago

I don't know how else to fix it without getting into the commons library. If you're open to that, another possible fix is to add a check in org.xbib.io.archive.ArchiveSession.read() that ignores the entry when ArchiveEntry.isDirectory() is true. But then read() returns null, and performImport() drops out of its while loop. So I'm at a loss as to how to really fix it. Thoughts?
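One way around the "read() returns null" problem is for read() to loop past directory entries itself, so that null still means only "end of archive". The sketch below shows that shape against a plain Iterator; RawEntry and SkippingReader are hypothetical stand-ins, not org.xbib.io.archive.ArchiveEntry or ArchiveSession, whose real signatures may differ.

```java
import java.util.Iterator;

/** Hypothetical stand-in for an archive entry (not org.xbib.io.archive.ArchiveEntry). */
final class RawEntry {
    final String name;
    final boolean directory;
    final byte[] payload;

    RawEntry(String name, boolean directory, byte[] payload) {
        this.name = name;
        this.directory = directory;
        this.payload = payload;
    }
}

/** Sketch of a read() that never surfaces directory entries to the caller. */
final class SkippingReader {
    private final Iterator<RawEntry> source;

    SkippingReader(Iterator<RawEntry> source) {
        this.source = source;
    }

    /** Returns the next non-directory entry, or null only at the real end of the archive. */
    RawEntry read() {
        while (source.hasNext()) {
            RawEntry entry = source.next();
            if (entry.directory) {
                continue; // skip and keep reading instead of returning null
            }
            return entry;
        }
        return null;
    }
}
```

With this shape, a caller loop such as while ((entry = reader.read()) != null) { ... } only stops at the true end of the archive, so the import loop itself needs no change.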

jprante commented 10 years ago

I will add an extra sanity check here, after the index alias transport has been checked: https://github.com/jprante/elasticsearch-knapsack/blob/master/src/main/java/org/xbib/elasticsearch/action/knapsack/imp/TransportKnapsackImportAction.java#L222
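The point of placing the check after the alias handling is ordering: alias entries legitimately carry an empty payload, so they must be recognized before any "skip empty or directory entries" rule runs. The sketch below illustrates only that order. It is not the code at the linked line; ImportDispatch and its Action values are hypothetical, and how an alias entry is recognized is left to a caller-supplied predicate because the thread does not say.

```java
import java.util.function.Predicate;

/**
 * Sketch of the check order: alias entries first (they legitimately have
 * empty payloads), then the sanity check, then normal document import.
 * How an alias entry is recognized is supplied by the caller (assumption).
 */
final class ImportDispatch {
    enum Action { CREATE_ALIAS, SKIP, INDEX_DOCUMENT }

    private final Predicate<String> isAliasEntry;

    ImportDispatch(Predicate<String> isAliasEntry) {
        this.isAliasEntry = isAliasEntry;
    }

    Action classify(String entryName, byte[] payload) {
        // 1. Alias entries are transported with an empty payload, so test them first.
        if (isAliasEntry.test(entryName)) {
            return Action.CREATE_ALIAS;
        }
        // 2. Sanity check: directory entries and empty payloads are not documents.
        if (entryName.endsWith("/") || payload == null || payload.length == 0) {
            return Action.SKIP;
        }
        // 3. Everything else is handed to the document import.
        return Action.INDEX_DOCUMENT;
    }
}
```

Swapping steps 1 and 2 would silently drop every alias, which is exactly the objection raised to a plain zero-length filter.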

rtkmhart commented 10 years ago

Awesome, that is much better. Thanks for the quick response! I will close this PR.