jprante / elasticsearch-knapsack

Knapsack plugin is an import/export tool for Elasticsearch
Apache License 2.0

List exports in progress #4

Closed magnusp closed 10 years ago

magnusp commented 11 years ago

It would be fantastic if there were a way to query for exports in progress. I'm rather new to Elasticsearch, but I think there might be at least two options:

  1. GET on _export returns if an export is running on that type/index/_all (maybe with a running total?).
  2. POST on _export leads to metadata being set on the index. This could be a timestamp for the most recent export and an additional value if the export is running, finished or aborted.

jprante commented 11 years ago

Thanks for the cool idea. Yes, a json-encoded state report would be helpful. I will pick it up.

hmalphettes commented 11 years ago

Thanks very much for this neat plugin.

I am also interested in tracking the progress of the exports and imports. In the meantime, our branch of knapsack supports synchronous execution. It serves our use case very well: fewer than 20k documents, lots of clones to execute one after the other.

Let me know if you are interested in this: I can beef up the tests and make a PR: https://github.com/sutoiku/elasticsearch-knapsack/commit/81a48df2f0764152bbb9ff317d88b69537aeb32c

Otherwise, the next time the need arises, we will take a shot at the stateful operations in a different branch without this.

btiernay commented 11 years ago

@jprante I'd be curious to know how you intend to handle the state-based nature of this request. Maybe I can lend a hand.

jprante commented 11 years ago

The plugin could write to the cluster state, and maintain a small temporary queue of exports/imports there. Patches welcome!

btiernay commented 11 years ago

Were you thinking something like:

http://stackoverflow.com/questions/15824582/any-way-to-access-transient-cluster-settings-state-in-an-elasticsearch-plugin

or something more like invoking clusterService.submitStateUpdateTask?

jprante commented 11 years ago

Yes, I think the ClusterDynamicSettingsModule is the way to go, similar to Igor Motov's example. Something like:

public class MyPlugin extends AbstractPlugin {

    /* ... */

    public void onModule(ClusterDynamicSettingsModule module) {
        module.addDynamicSettings("plugin.knapsack.export.queue");
        module.addDynamicSettings("plugin.knapsack.import.queue");
    }
}

followed by cluster update and state lookup actions in knapsack's REST import / export / (state) actions to maintain the queue. If the settings were not declared in the onModule() method, they would be rejected by the validator.

btiernay commented 11 years ago

Okay, that seems pretty straightforward. The only other question I have is what to store in the setting. My understanding is that these are typically simple values such as int, Time, and String. Would we just store the concatenated paths of the imports and exports, delimited by some character in queue order?
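To make the delimited-string idea concrete, here is a minimal sketch in plain Java, with no Elasticsearch dependency. The class name QueueSettingCodec, its methods, and the choice of ',' as the delimiter are all hypothetical; it assumes that no export or import path contains the delimiter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of knapsack: packs a queue of paths into a
// single String setting value and unpacks it again, preserving queue order.
final class QueueSettingCodec {

    // Assumption: ',' never occurs in an export/import path.
    private static final String DELIMITER = ",";

    // Joins the paths in queue order; rejects values that would not round-trip.
    static String encode(List<String> paths) {
        for (String path : paths) {
            if (path.isEmpty() || path.contains(DELIMITER)) {
                throw new IllegalArgumentException("unsupported path: " + path);
            }
        }
        return String.join(DELIMITER, paths);
    }

    // Splits a setting value back into paths; an unset or empty value is an empty queue.
    static List<String> decode(String value) {
        if (value == null || value.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> paths = new ArrayList<>();
        for (String path : value.split(DELIMITER)) {
            paths.add(path);
        }
        return paths;
    }
}
```

The encoded string would be what gets written to a setting such as plugin.knapsack.export.queue, and decode would be applied when reading it back. A real implementation would likely need an escaping scheme if paths can contain the delimiter.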

btiernay commented 11 years ago

Additionally, I would assume that these would be transient settings so as not to persist across restarts, correct?

jprante commented 11 years ago

Yes, they are transient settings, because export/import will stop when the node stops. The data to store should be start time, index / type spec, and target path. An advanced feature would be storing the number of performed scan/scroll or bulk index actions, with the timestamp of the last activity, and a last error message if available.
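As a rough illustration of the three basic fields, a queue entry could be modeled as a small value class that renders itself as a JSON string. This is only a sketch in plain Java: the class name KnapsackState and the JSON keys start_time, spec, and path are made up for this example, and the escaping covers only backslashes and double quotes.

```java
// Hypothetical value class, not part of knapsack: one entry in the
// export/import queue, holding the three basic fields named above.
final class KnapsackState {
    final long startTimeMillis;   // when the export/import started
    final String indexTypeSpec;   // index / type spec, e.g. "myindex/mytype"
    final String targetPath;      // target path of the archive

    KnapsackState(long startTimeMillis, String indexTypeSpec, String targetPath) {
        this.startTimeMillis = startTimeMillis;
        this.indexTypeSpec = indexTypeSpec;
        this.targetPath = targetPath;
    }

    // Renders the entry as a compact JSON object (key names are assumptions).
    String toJson() {
        return "{\"start_time\":" + startTimeMillis
                + ",\"spec\":\"" + escape(indexTypeSpec) + "\""
                + ",\"path\":\"" + escape(targetPath) + "\"}";
    }

    // Minimal escaping: backslash and double quote only, no control characters.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The advanced fields (action counts, last activity, last error) could be added to the same class later without changing the basic shape.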

btiernay commented 11 years ago

Ok, I'm going to try working on this today... I hope :)

A couple of remaining questions:

jprante commented 11 years ago

For using GET, a suitable REST endpoint would be useful (currently I use POST, which does not require idempotency). I would suggest new endpoints like _export/state/ and _import/state/. I haven't tried with 0.90.1 yet, only 0.90.2.

btiernay commented 11 years ago

@jprante Just wanted to let you know that I started working on this. I should have something to commit this weekend. Cheers.

btiernay commented 11 years ago

This has been addressed in #18. Note that it does not address the "advanced features" mentioned above.