opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, such as k-means and linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0
83 stars 118 forks

[BUG] Flaky IT in ml-commons #2560

Open · zane-neo opened this issue 1 week ago

zane-neo commented 1 week ago

What is the bug? Below is a flaky integration test (IT) that fails from time to time; we need to fix it: https://github.com/opensearch-project/ml-commons/actions/runs/9540553655/job/26292459753?pr=2500#step:6:3006

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What is the expected behavior? A clear and concise description of what you expected to happen.

What is your host/environment?

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

rbhavna commented 1 week ago

Below is the exception we are seeing:


Suite: Test class org.opensearch.ml.tools.VisualizationsToolIT
  2> REPRODUCE WITH: ./gradlew ':opensearch-ml-plugin:integTest' --tests "org.opensearch.ml.tools.VisualizationsToolIT.testVisualizationNotFound" -Dtests.seed=B77447D33F3C293E -Dtests.security.manager=false -Dtests.locale=is-IS -Dtests.timezone=America/Anguilla -Druntime.java=11
  2> org.opensearch.client.ResponseException: method [DELETE], host [http://127.0.0.1:43309/], URI [/_plugins/_ml/models/29kFJJABf_wnGiLB9K81], status line [HTTP/1.1 400 Bad Request]
    {"error":{"root_cause":[{"type":"status_exception","reason":"Model cannot be deleted in deploying or deployed state. Try undeploy model first then delete"}],"type":"status_exception","reason":"Model cannot be deleted in deploying or deployed state. Try undeploy model first then delete"},"status":400}
        at __randomizedtesting.SeedInfo.seed([B77447D33F3C293E:7817E34E380F133E]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:385)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:355)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:330)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:179)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:152)
        at app//org.opensearch.ml.utils.TestHelper.makeRequest(TestHelper.java:141)
        at app//org.opensearch.ml.rest.MLCommonsRestTestCase.deleteModel(MLCommonsRestTestCase.java:653)
        at app//org.opensearch.ml.tools.ToolIntegrationWithLLMTest.deleteModel(ToolIntegrationWithLLMTest.java:75)
        at java.base@11.0.23/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base@11.0.23/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base@11.0.23/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@11.0.23/java.lang.reflect.Method.invoke(Method.java:566)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestR
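
The error in the trace above says the model must be undeployed before it can be deleted. One plausible cause of the flakiness is that the teardown issues the DELETE while an undeploy has not yet finished, but this is an assumption and is not confirmed in the thread. Below is a minimal sketch of a teardown that avoids that race: undeploy, poll until the model state is neither DEPLOYING nor DEPLOYED, and only then delete. It is written against a hypothetical `ModelClient` interface. In the real tests these calls would go through the REST client, for example reading the state with a GET on the model, and that mapping is also an assumption. `ModelTeardown`, `waitUntil`, and `FakeModelClient` are illustrative names, not ml-commons code.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical abstraction over the REST calls a test teardown would make;
// this is NOT an actual ml-commons test API.
interface ModelClient {
    void undeploy(String modelId);
    String getModelState(String modelId); // assumed values: DEPLOYING, DEPLOYED, UNDEPLOYED
    void delete(String modelId);
}

final class ModelTeardown {
    private ModelTeardown() {}

    // Polls `condition` every `interval` until it holds or `timeout` elapses.
    static void waitUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeout);
            }
            Thread.sleep(interval.toMillis());
        }
    }

    // Undeploy first, wait until the model has left DEPLOYING/DEPLOYED, and only then delete.
    static void undeployAndDelete(ModelClient client, String modelId, Duration timeout)
            throws InterruptedException, TimeoutException {
        client.undeploy(modelId);
        waitUntil(() -> {
            String state = client.getModelState(modelId);
            return !"DEPLOYING".equals(state) && !"DEPLOYED".equals(state);
        }, timeout, Duration.ofMillis(100));
        client.delete(modelId);
    }
}

// In-memory fake: the undeploy only "completes" after a few state polls, and
// delete fails like the 400 in the trace while the model is still deployed.
final class FakeModelClient implements ModelClient {
    private int pollsUntilUndeployed;
    private boolean undeployRequested = false;
    private String state = "DEPLOYED";
    boolean deleted = false;

    FakeModelClient(int pollsUntilUndeployed) {
        this.pollsUntilUndeployed = pollsUntilUndeployed;
    }

    @Override
    public void undeploy(String modelId) {
        undeployRequested = true;
    }

    @Override
    public String getModelState(String modelId) {
        if (undeployRequested) {
            if (pollsUntilUndeployed > 0) {
                pollsUntilUndeployed--;
            } else {
                state = "UNDEPLOYED";
            }
        }
        return state;
    }

    @Override
    public void delete(String modelId) {
        if ("DEPLOYING".equals(state) || "DEPLOYED".equals(state)) {
            throw new IllegalStateException(
                "Model cannot be deleted in deploying or deployed state. Try undeploy model first then delete");
        }
        deleted = true;
    }
}
```

For example, `ModelTeardown.undeployAndDelete(new FakeModelClient(3), "model-1", Duration.ofSeconds(5))` deletes the model after a few polls. If the undeploy never completes, the call fails with a `TimeoutException` instead of an opaque 400, which makes a hung undeploy easier to tell apart from an ordinary race.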

Hailong-am commented 1 week ago

I am able to reproduce this with a multi-node test. @b4sjoo made a fix in https://github.com/opensearch-project/ml-commons/pull/2510, but the issue still exists. Reproduction command:

./gradlew ':opensearch-ml-plugin:integTest' --tests "org.opensearch.ml.tools.VisualizationsToolIT" -Dtests.security.manager=false -Druntime.java=11 -PnumNodes=5
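
Hailong-am reproduced the failure on a multi-node cluster (`-PnumNodes=5`), so one test-side mitigation is to retry the DELETE only when it fails with this specific "deploying or deployed state" error. The sketch below is a hypothetical helper (`RetryOnDeployedState.retry`), not code from ml-commons or PR #2510. It assumes the exception message carries the response body, as the `ResponseException` message in the trace above does.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical test helper; not part of ml-commons.
final class RetryOnDeployedState {
    // Substring of the 400 error body shown in the trace above.
    static final String DEPLOYED_STATE_ERROR =
        "Model cannot be deleted in deploying or deployed state";

    private RetryOnDeployedState() {}

    // Runs `action`, retrying with doubling backoff only while it fails with the
    // deploying/deployed-state error; any other failure, or the last failed
    // attempt, is rethrown unchanged.
    static <T> T retry(Callable<T> action, int maxAttempts, Duration initialBackoff) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Duration backoff = initialBackoff;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                String message = e.getMessage();
                boolean retriable = message != null && message.contains(DEPLOYED_STATE_ERROR);
                if (!retriable || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff.toMillis());
                backoff = backoff.multipliedBy(2);
            }
        }
    }
}
```

The teardown call that issues the DELETE would be wrapped in a `Callable` that returns `null`. Matching on the error text keeps genuine failures, such as a 404, from being retried. Retrying only papers over the race, though. Waiting for the model state to leave DEPLOYING/DEPLOYED before deleting addresses the cause more directly.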

ylwu-amzn commented 1 week ago

@Hailong-am Thanks, can you help fix it?