nisgoel-amazon opened this issue 7 months ago (status: Open)
Thanks for opening an issue. We will look into it when we have bandwidth. CCR is a unique use case, so I would appreciate it if your team could contribute the fix as before. We prioritize closing gaps for generic use cases but need your team's support to close specialized ones. Let us know if you need any help.
After discussing with Nandan Kumar, we will skip the CCR tests for deb and rpm for now; he will raise the PR.
Hi @nisgoel-amazon, is there any progress on enabling CCR testing against remote clusters for deb and rpm?
Thanks.
@peterzhuamazon This needs an infra-side change: we need help from the infra team to understand why multiple clusters are not coming up on the same node for deb and rpm. We have analysed why the CCR repo tests are failing on deb and rpm. Can you help us scope the effort for this issue?
Then I think @ankitkala can assign someone to pick up the change.
@peterzhuamazon can you confirm one thing: as of today, can we create a multi-node cluster on the same node with deb and rpm? That is, multiple ES processes running on different ports to form a cluster on a single node?
Not unless you significantly modify the existing deb/rpm package; you can't run multiple instances of it on a single host. You would have to run them on multiple hosts, which probably requires a CDK setup just for CCR on deb/rpm.
If you modify the package, it defeats the purpose of integTest, because you would be testing something that customers will not use in the same way.
No, I don't think it defeats the purpose of the integ test: we need two clusters to run the CCR plugin tests, and it doesn't matter whether those two clusters run on different hosts or on different ports of the same host.
We do the same thing for the Windows and tar distributions, and that serves our purpose.
Can you suggest how we can set up CDK to run CCR on deb/rpm?
You misunderstand: our current integTest framework runs every test on a single host, which you cannot do for CCR on deb and rpm.
If you want it to work for CCR, you have to:
I suggest CDK because it makes it easy to retrieve separate host IPs, so you can run the test remotely. I am still not sure what change is needed to make this happen, as the CCR team has more expertise in how the CCR tests work.
Happy to have more discussion on this via call.
Thanks.
I had a word with @peterzhuamazon on this one; there are multiple ways to fix this problem. Peter suggested setting up our own infra via CDK and then changing opensearch-build to pass those node IPs to our remote tests.
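Once node IPs are passed into the remote tests, a small pre-flight probe could confirm that every cluster endpoint answers before remoteIntegTest starts, so a cluster that has disappeared shows up as an explicit "connection refused" instead of a mid-test java.net.ConnectException. This is a hypothetical sketch, not part of opensearch-build or the CCR plugin: it assumes plain HTTP with the security plugin disabled, and the cluster names, IPs and ports are placeholders.

```python
import sys
import urllib.error
import urllib.request


def probe_cluster(url: str, timeout: float = 5.0) -> str:
    """Return 'up (<status>)' if the endpoint answers HTTP, otherwise a short error label."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"up ({resp.status})"
    except urllib.error.HTTPError as err:
        # A non-2xx reply still means a process is listening on that port.
        return f"up ({err.code})"
    except urllib.error.URLError as err:
        # urlopen wraps socket-level failures; a refused connect means nothing is listening.
        if isinstance(err.reason, ConnectionRefusedError):
            return "connection refused"
        return f"unreachable ({err.reason})"
    except OSError as err:
        # e.g. a read timeout raised after the connection was established.
        return f"unreachable ({err})"


def preflight(clusters: dict[str, str]) -> dict[str, str]:
    """Probe every cluster endpoint and map cluster name -> status label."""
    return {name: probe_cluster(url) for name, url in clusters.items()}


if __name__ == "__main__":
    # Hypothetical usage, one name=url pair per cluster (placeholder names and IPs):
    #   python preflight.py leaderCluster=http://10.0.0.1:9200 followCluster=http://10.0.0.2:9200
    clusters = dict(arg.split("=", 1) for arg in sys.argv[1:])
    for name, state in preflight(clusters).items():
        print(f"{name}: {state}")
```

If the first cluster reports "connection refused" while the second reports "up (200)", that matches the deb behaviour described in this issue, where installing the second cluster removes the first.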
Describe the bug
In the Cross Cluster Replication plugin, remoteIntegTest has been failing from the 2.12 release onwards. We are getting
java.net.ConnectException: Connection refused
errors while running these tests during release activity. These tests create multiple clusters to run the integration tests, but the cluster setup step only keeps one cluster at a time: the logs show that when the 2nd cluster is coming up, the opensearch-build package removes the previously created cluster.
https://build.ci.opensearch.org/blue/rest/organizations/jenkins/pipelines/integ-test/runs/7981/nodes/122/steps/765/log/?start=0
In the above log, after the 1st cluster returns 200 and before the 2nd cluster is created, the pre-remove script in the Debian distribution removes the 1st cluster. Below are the lines printed in the above log file.
To reproduce
We can replicate this by running this command