opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0
53 stars, 112 forks

[FEATURE] Add support for searchable snapshots within index management #808

Open kotwanikunal opened 1 year ago

kotwanikunal commented 1 year ago

Is your feature request related to a problem?

What solution would you like?

What alternatives have you considered?

Do you have any additional context?

bugmakerrrrrr commented 1 year ago

@kotwanikunal maybe we can use the aliases API to atomically replace the original index, which can operate as follows:

sandervandegeijn commented 1 year ago

Would love this as well. Opened an issue at the forums but missed this. Going to script it myself for the time being.

beejaygee commented 9 months ago

Going to script it myself for the time being.

@sandervandegeijn Could you please provide your script or at least advise what logic you used to make this? Should I rotate index daily, snapshot it, delete it, restore back to the same name with remote_snapshot configured? Or would I need to use an alias?

I really am looking forward to this being built in.

sandervandegeijn commented 9 months ago

https://github.com/sandervandegeijn/opensearch-searchable-snapshot-management

Sure, it's by no means abstracted enough to be a general library, but it serves our purposes. I should move some configs to command line parameters, but it's good enough to get an idea.

It gathers all indices; everything older than 7 days is snapshotted (snapshot has the same name as the index), removed, and restored as a searchable snapshot index with the same name as the original. The data stays available as is, so everything remains searchable through the dashboards, visualisations and such, with the same index patterns the users were used to.

After 185 days everything gets removed.
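As a rough illustration of the logic described above (not the actual script from the repository), the per-index decision could be sketched like this. The function names, the action labels, and the `is_remote` flag are hypothetical; only the 7-day and 185-day thresholds come from this comment, and `storage_type: remote_snapshot` is the restore option OpenSearch documents for searchable snapshots.

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the comment above; constant names are hypothetical.
SNAPSHOT_AFTER = timedelta(days=7)
DELETE_AFTER = timedelta(days=185)


def plan_action(created_at, is_remote, now):
    """Decide what to do with one index based on its age.

    Returns one of the hypothetical labels "delete",
    "convert_to_remote" (snapshot, remove, restore as remote) or "keep".
    """
    age = now - created_at
    if age > DELETE_AFTER:
        return "delete"
    if age > SNAPSHOT_AFTER and not is_remote:
        return "convert_to_remote"
    return "keep"


def restore_body(index_name):
    """Body for POST _snapshot/<repo>/<snapshot>/_restore.

    Restores the index under its original name as a searchable
    snapshot, so the local copy must be removed before this call.
    """
    return {"indices": index_name, "storage_type": "remote_snapshot"}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(plan_action(now - timedelta(days=10), False, now))
```

The sketch only decides and builds request bodies; actually sending them, and waiting for the snapshot to complete before removing the local index, is left out.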

I like the aliases idea btw; it gives the option to distinguish between local and remote indices by name while maintaining functionality. The script will be less complex as well. I might implement this later this week.

The distinction between the two in dashboards / index management is nonexistent at this time.

beejaygee commented 9 months ago

@sandervandegeijn Thanks for this. Looks good. One question: it looks like one should use this in addition to an ISM policy to perform backups. Can you back up a remote snapshot? Would a backup policy just back up all indices (including remote) and prefix their names with backup (to avoid backups being marked as remote snapshots and being picked up by your script)?

I assume you just run this as a cron job every hour or something? It probably needs to run more often than daily, since the snapshot has to complete before it can be restored as remote?

Regarding aliases, would there be any impact on performance? I thought I read a while back that using aliases has an impact on performance.

sandervandegeijn commented 9 months ago

I don't know, we would store that on the same storage so there is no benefit for us to do that. If you want to do it, maybe it's better to save an index to two repos before deleting it.

I'm going to restore the snapshots under a different index name, something like remote-* and create an alias. This thread gave me that idea.
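A minimal sketch of what that could look like as request bodies, assuming the restore API's `rename_pattern` / `rename_replacement` options and the `remove_index` action of the `_aliases` API behave as documented. The helper names and the `remote-` prefix constant are hypothetical, and nothing here has been run against a cluster.

```python
REMOTE_PREFIX = "remote-"  # hypothetical naming convention


def renamed_restore_body(index_name):
    """Body for POST _snapshot/<repo>/<snapshot>/_restore.

    Restores the index as a searchable snapshot under a prefixed
    name, so the original local index can stay in place meanwhile.
    """
    return {
        "indices": index_name,
        "storage_type": "remote_snapshot",
        "rename_pattern": "(.+)",
        "rename_replacement": REMOTE_PREFIX + "$1",
    }


def alias_swap_body(index_name):
    """Body for POST _aliases.

    In one request: drop the original local index and point an alias
    with the old name at the restored remote index, so existing index
    patterns keep matching.
    """
    return {
        "actions": [
            {"remove_index": {"index": index_name}},
            {"add": {"index": REMOTE_PREFIX + index_name, "alias": index_name}},
        ]
    }


if __name__ == "__main__":
    print(renamed_restore_body("logs-2024.01.01"))
    print(alias_swap_body("logs-2024.01.01"))
```

Putting both actions in a single `_aliases` call is what makes the swap atomic: an alias cannot share a name with an existing index, so the removal and the alias creation have to happen together.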

We run it every hour, yes, but you can do it less frequently. The index will be there until the next run, at whatever time that's going to happen.

We are using aliases extensively in our other clusters, never had any issues.

spapadop commented 4 months ago

We find this feature quite crucial for enabling extensive use of searchable snapshots, which are otherwise a great way to support "cold storage". It's a pity this one is not implemented yet. We all have to implement our own scripting to automate this process, which can be way more prone to errors. Is there an interest in moving this feature forward?

ccben87 commented 4 months ago

I am very much interested in having this feature implemented.

wntmddus commented 3 weeks ago

What should I do to contribute to this? I think I have implemented the solution.