opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0

[FEATURE] Manage CCR follower indices #726

Open laszloszurok opened 1 year ago

laszloszurok commented 1 year ago

Hi!

I would like to ask if it's possible to properly manage CCR follower indices with ISM. My current use case is the following:

I created different policies for the leader and follower indices.

The policy for the followers has only a "default" state and a transition to a "delete" state after 3 days. The deletion fails because CCR makes the followers read-only. From the OpenSearch Dashboards index management UI:

{
    "cause": "index [follower-000001] blocked by: [FORBIDDEN/1000/index read-only(cross-cluster-replication)];",
    "message": "Failed to delete index [follower-000001]"
}

(The leader is deleted already.)

Is there a way to resolve this with ISM?

Here is the current policy for the followers:

{
    "policy": {
        "description": "ism policy",
        "default_state": "default",
        "states": [
            {
                "name": "default",
                "actions": [],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "3d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 1000,
                            "backoff": "constant",
                            "delay": "10m"
                        },
                        "delete": {}
                    }
                ]

            }
        ],
        "ism_template": {
            "index_patterns": ["follower-*"],
            "priority": 1
        }
    }
}

I saw that there is an 'unfollow' action in Elasticsearch ILM for this use case. Is this functionality planned for OpenSearch?

Angie-Zhang commented 1 year ago

@laszloszurok Thanks for raising this feature request! We see several use cases for ISM with CCR. Our team will discuss and plan it.

aggarwalShivani commented 9 months ago

Hi @Angie-Zhang, @bowenlan-amzn, is this feature (adding an unfollow action in ISM for CCR follower indices) being planned?

bowenlan-amzn commented 8 months ago

This hasn't been planned yet. Bringing it to @r1walz's attention.

aggarwalShivani commented 8 months ago

Hi @r1walz, @bowenlan-amzn

I'm interested in working on this feature if it's still unassigned, although I may need some guidance along the way :)

Were any design points discussed, as @Angie-Zhang mentioned previously? One question, though: as part of the unfollow action on the index, which operations do we want to perform?

  1. Stop the replication on the index altogether, or
  2. Close the index > pause replication > open the index

bowenlan-amzn commented 8 months ago

Thanks @aggarwalShivani! I am assigning this to you now.

I don't have much knowledge of CCR. After a quick look at the docs, it seems we can just go with option 1, according to this:

When you stop replication, the follower index un-follows the leader and becomes a standard index that you can write to. You can’t restart replication after stopping it.
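Until ISM has an unfollow action, the behaviour quoted above suggests a manual workaround: stop replication on the follower, then delete it as a regular index. The following is a minimal sketch, not a tested integration. It assumes the CCR REST endpoint `POST /_plugins/_replication/<follower-index>/_stop` with an empty JSON body and a hypothetical cluster URL. The helper names are illustrative only, and authentication and TLS settings are left out.

```python
import json
import urllib.request


def build_stop_and_delete_requests(base_url: str, index: str) -> list:
    """Build (but do not send) two requests: stop replication, then delete."""
    base = base_url.rstrip("/")
    stop = urllib.request.Request(
        url=f"{base}/_plugins/_replication/{index}/_stop",
        data=json.dumps({}).encode("utf-8"),  # the stop call takes an empty JSON object
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    delete = urllib.request.Request(url=f"{base}/{index}", method="DELETE")
    return [stop, delete]


def stop_and_delete(base_url: str, index: str) -> None:
    """Send the requests in order; urlopen raises HTTPError on a non-2xx reply."""
    for request in build_stop_and_delete_requests(base_url, index):
        with urllib.request.urlopen(request) as response:
            print(request.get_method(), request.full_url, response.status)
```

A call such as `stop_and_delete("http://localhost:9200", "follower-000001")` would run against the follower cluster. Because stopping replication cannot be undone, this only makes sense for indices that are due for deletion anyway.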

aggarwalShivani commented 8 months ago

Hi, I'm currently facing a challenge while trying to import the CCR libraries into the ISM project. I have raised the issue in the discussion forum and would highly appreciate some guidance on how this should be done. Thanks!

bowenlan-amzn commented 8 months ago

@aggarwalShivani I am trying to engage someone from CCR team to help here, Thanks.

bowenlan-amzn commented 8 months ago

@mohitamg I notice you are an active contributor in the CCR repo and recently opened a PR in the IM frontend repo. Could you please provide some guidance here? Thanks!

aggarwalShivani commented 7 months ago

Hi @mohitamg, @monusingh-1, I would like to request your guidance on two questions about implementing this feature:

  1. Dependency importing - As mentioned above, for this feature we need to import the CCR libraries into the ISM project. What would be the right way of defining this dependency in build.gradle? I have tried various ways (as described in my discussion forum query), but they didn't work. To unblock myself temporarily, I have placed the CCR jar locally and imported it for use in the code.

    implementation(files("libs/opensearch-cross-cluster-replication-3.0.0.0-SNAPSHOT.jar"))

  2. Actions required to invoke stop-replication - As far as I understand, in TransportStopIndexReplicationAction.kt the following actions take place in this sequence:

       i. close the index on the follower
       ii. remove the retention lease from the leader cluster
       iii. execute the stop-replication task (StopReplicationTask(request, l))
       iv. reopen the index

     Can you help me understand the need for and significance of these steps?

I was able to invoke only StopReplicationTask from my ISM code, i.e. I was able to run only step iii. of the above sequence. Even so, replication stopped successfully and the index was converted into a regular index on the follower cluster. I was then able to execute the DELETE API on it successfully. Is this sufficient, or is it necessary to execute the other 3 steps too? Please explain.

I request you to share views or route to the right point-of-contact who can help here. Thanks!

aggarwalShivani commented 7 months ago

Hi @bowenlan-amzn and other ISM experts, I am facing a problem while invoking the stop-replication action, and I'm hoping you can guide me, as it is not particularly CCR-specific.

The issue is very similar to an old one reported in Elastic a few years back. CCR defines the StopIndexReplicationRequest and StopIndexReplicationAction classes. I'm trying to execute this action in ISM like this:

    val response: AcknowledgedResponse = context.client.suspendUntil { execute(StopIndexReplicationAction.INSTANCE, stopIndexReplicationRequest, it) }

(This follows the same lines as other actions in ISM, for example AttemptCreateRollupJobStep.kt.)

The error received on executing this policy (it fails to cast to the same type of class):

    java.lang.ClassCastException: class org.opensearch.replication.action.stop.StopIndexReplicationRequest cannot be cast to class org.opensearch.replication.action.stop.StopIndexReplicationRequest (org.opensearch.replication.action.stop.StopIndexReplicationRequest is in unnamed module of loader java.net.FactoryURLClassLoader


Updated with my analysis - this seems to be because StopIndexReplicationRequest is getting loaded by different classloaders. The CCR plugin is installed on my cluster, and I have also packaged the CCR jar as a dependency of the ISM plugin. As a result, the CCR class exists in two places and is perhaps being loaded separately from each.

If I don't package the CCR jar in ISM, I can't invoke these classes at runtime:

    java.lang.NoClassDefFoundError: org/opensearch/replication/action/stop/StopIndexReplicationRequest
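The "same class name, different loader" effect described above can be illustrated outside the JVM. The sketch below is only a Python analogy, not JVM or plugin code: it executes the same class source in two separate modules, which stand in for the two plugin class loaders. The module names are hypothetical.

```python
import types

SOURCE = '''
class StopIndexReplicationRequest:
    def __init__(self, index_name):
        self.index_name = index_name
'''


def load_copy(module_name: str) -> types.ModuleType:
    """Execute SOURCE in a fresh module, standing in for one plugin's class loader."""
    module = types.ModuleType(module_name)
    exec(SOURCE, module.__dict__)
    return module


ccr_plugin = load_copy("ccr_plugin")  # copy that ships with the CCR plugin
ism_plugin = load_copy("ism_plugin")  # copy bundled into the ISM plugin

request = ism_plugin.StopIndexReplicationRequest("follower-000001")

# Same class name and same source, yet two distinct type objects.
same_name = (ccr_plugin.StopIndexReplicationRequest.__name__
             == ism_plugin.StopIndexReplicationRequest.__name__)
same_type = isinstance(request, ccr_plugin.StopIndexReplicationRequest)
print(same_name, same_type)  # prints: True False
```

The receiving side checks type identity rather than the class name, so an object built from the bundled copy is rejected even though the two definitions are textually identical.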

I have gone through all the developer documentation on opensearch-plugins, and nothing is mentioned about this aspect.

I need guidelines on how to correctly define dependencies between two OpenSearch plugins so that one can also use the other's classes at runtime.

Any suggestions on how we could solve this? This is a blocker for me, and would highly appreciate help here.

aggarwalShivani commented 7 months ago

Hi @bowenlan-amzn, in the context of the issue I explained above, I stumbled upon the common-utils repo, whose description says:

  1. Shared request/response/action classes used for plugin to plugin transport layer calls.
  2. Any common functionality across OpenSearch plugins could be moved to this.

I think if we move the required classes (StopIndexReplicationRequest, StopIndexReplicationAction) from the CCR plugin to common-utils, and use them from there in both the CCR and ISM plugins, we might be able to solve the classloader issue.

Since you are also one of the maintainers of the common-utils repo, I would like to ask if my understanding is right.

I would genuinely appreciate any help here, as I've hit a roadblock and haven't been getting a response anywhere :(

bowenlan-amzn commented 6 months ago

@aggarwalShivani sorry, I forgot to follow up on this. I did some searching after you asked the first time but didn't respond... The root cause is a little tricky and I didn't get time to fully understand it, so I hesitated to respond and then forgot. This time, let me provide some pointers first.

I believe you are on the right path here by looking into common-utils. This PR probably can help you understand the context of this issue.

Specifically, for this StopIndexReplication API, I think it would work if it's refactored like the sendNotification API, which is already used in ISM. It's in common-utils here. ISM invokes sendNotification in common-utils; then, in the notification repository, SendNotificationAction handles that here. On both the sending side and the handling side, the request and response objects are typed as ActionRequest and ActionResponse respectively, and get "recreateObject"-ed into the subtypes SendNotificationRequest and SendNotificationResponse.
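The idea behind that pattern, as I read the description above, is that the handler never casts the incoming object. It serializes it to bytes and rebuilds it as its own copy of the type. A rough sketch of that round trip follows, written in Python purely for illustration. The class and function names are hypothetical stand-ins and are not the common-utils API.

```python
import json
from dataclasses import dataclass


@dataclass
class SenderStopRequest:
    """Hypothetical request type as the calling plugin defines it."""
    index_name: str

    def write_to(self) -> bytes:
        # Serialize the fields only; no type identity crosses the boundary.
        return json.dumps({"index_name": self.index_name}).encode("utf-8")


@dataclass
class ReceiverStopRequest:
    """Hypothetical request type as the handling plugin defines it."""
    index_name: str

    @classmethod
    def read_from(cls, payload: bytes) -> "ReceiverStopRequest":
        return cls(**json.loads(payload.decode("utf-8")))


def handle(request) -> ReceiverStopRequest:
    """Use the request directly if it is already our type; otherwise rebuild it."""
    if isinstance(request, ReceiverStopRequest):
        return request
    return ReceiverStopRequest.read_from(request.write_to())


recreated = handle(SenderStopRequest("follower-000001"))
print(type(recreated).__name__, recreated.index_name)
```

Because only bytes cross the boundary, it no longer matters that the sender and receiver hold different copies of the class, as long as both agree on the wire format.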

Last, please ping me in OpenSearch slack space if I am not responding here 🙂

aggarwalShivani commented 6 months ago

Hi OpenSearch experts,

We are facing a technical challenge in implementing this feature. I have been discussing it with @bowenlan-amzn and we have tried various things, with no luck. The query is posted in the discussion forum as well as in the Slack space.

We request your help in unblocking this feature. It is critical for us in integrating ISM with CCR; without it we cannot fully adopt ISM, so we ask that this issue be prioritized.