kuzzleio / kuzzle

Open-source Back-end, self-hostable & ready to use - Real-time, storage, advanced search - Web, Apps, Mobile, IoT -
https://kuzzle.io
Apache License 2.0

As a user I want to copy documents from a source to a destination collection with automated reindexing. #2030

Closed (aristsakpinis93 closed this issue 1 year ago)

aristsakpinis93 commented 3 years ago

Feature Description & Example Use Case

Collection mappings cannot be modified. In reality, as requirements change and new ones arise, so do mappings. This has already been addressed in Trello. As implementing a full-blown mapping migration system might be overshooting the mark here, I'd like to suggest a feature that makes this possible by utilizing existing Kuzzle & Elasticsearch functionality.

Kuzzle uses Elasticsearch indices to implement its internal index and collection structure ({index}/{collection} --> &{index}.{collection}). As Elasticsearch index mappings are meant to be immutable after document indexing, the behaviour explained above makes sense. Nevertheless, there is a way to update collection mappings by using a workaround:

  1. Create a new Kuzzle collection (e.g. collection_v2) with the updated mappings via the Kuzzle API/SDKs
  2. Use the Elasticsearch Reindex API to copy the documents to the new Kuzzle collection (which under the hood is an Elasticsearch index)
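
For illustration, the request body for step 2 could be derived from the Kuzzle names roughly as follows. This is only a sketch: it assumes the `&{index}.{collection}` naming described above, and the helper names `toEsIndex` and `buildReindexBody` are made up for this example. The resulting object is what would be sent to Elasticsearch's `POST /_reindex`.

```typescript
// Assumption: a Kuzzle collection is stored in the Elasticsearch index
// named "&{index}.{collection}", as described in this issue.
function toEsIndex(index: string, collection: string): string {
  return `&${index}.${collection}`;
}

// Shape of the minimal body accepted by Elasticsearch's POST /_reindex.
interface ReindexBody {
  source: { index: string };
  dest: { index: string };
}

// Hypothetical helper: translates Kuzzle index/collection names into
// the Elasticsearch reindex body.
function buildReindexBody(
  index: string,
  sourceCollection: string,
  destCollection: string
): ReindexBody {
  if (sourceCollection === destCollection) {
    throw new Error("Source and destination collections must differ");
  }
  return {
    source: { index: toEsIndex(index, sourceCollection) },
    dest: { index: toEsIndex(index, destCollection) },
  };
}

// Example: body for copying my-collection-1 into my-collection-2.
console.log(
  JSON.stringify(
    buildReindexBody("my-index", "my-collection-1", "my-collection-2"),
    null,
    2
  )
);
```

The destination collection has to exist with its new mappings before this body is sent (step 1); otherwise Elasticsearch would create the destination index itself with dynamically inferred mappings.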

The 2nd step cannot be performed through the Kuzzle API yet. As a Kuzzle user (developer / operator) I want to perform both steps against a single service. This improves usability, security (as Elasticsearch might not even be meant to be manually accessible while it is plugged into and administered by Kuzzle) and consistency (Kuzzle resources on Elasticsearch are administered by Kuzzle only).

Possible Solution

This can be achieved by implementing a wrapper around the native Elasticsearch Reindex API and exposing it as an additional feature in the Kuzzle API.

curl -X POST "{KUZZLE_URL}:{KUZZLE_PORT}/{index}/{collection}/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "collection": "my-collection-1"
  },
  "dest": {
    "collection": "my-collection-2"
  }
}
'

Aschen commented 3 years ago

Hi @aristsakpinis93,

It's a very good idea; we will discuss it in our internal product workshop.

In the meantime, you can use the Integrated Elasticsearch Client to securely access ES from the backend (from a custom controller action for example).
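
A rough sketch of what such a custom action could delegate to is shown below. It is written against a minimal interface rather than the real client so that it stays self-contained: `ReindexCapableClient`, `ReindexArgs` and `reindexCollection` are hypothetical names, the `reindex({ body })` call shape is assumed from the 7.x Elasticsearch JavaScript client, and the `&{index}.{collection}` naming is taken from the issue description above.

```typescript
// Minimal structural type covering only the client method used here.
// Assumption: mirrors reindex({ body }) of the 7.x Elasticsearch JS client.
interface ReindexCapableClient {
  reindex(params: {
    body: { source: { index: string }; dest: { index: string } };
    wait_for_completion?: boolean;
  }): Promise<unknown>;
}

// Hypothetical arguments a custom controller action might receive.
interface ReindexArgs {
  index: string;  // Kuzzle index
  source: string; // source collection
  dest: string;   // destination collection (created beforehand)
}

// Copies all documents from the source to the destination collection
// by delegating to the Elasticsearch Reindex API.
async function reindexCollection(
  client: ReindexCapableClient,
  args: ReindexArgs
): Promise<unknown> {
  // Assumption: Kuzzle collections map to "&{index}.{collection}".
  const esIndex = (collection: string): string =>
    `&${args.index}.${collection}`;

  return client.reindex({
    body: {
      source: { index: esIndex(args.source) },
      dest: { index: esIndex(args.dest) },
    },
    // Resolve only once Elasticsearch has finished copying.
    wait_for_completion: true,
  });
}
```

In a Kuzzle application, the handler of a custom controller action would presumably pass the integrated Elasticsearch client as `client` and read `index`, `source` and `dest` from the incoming request; the exact accessors should be checked against the Kuzzle documentation for the version in use.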