opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

Fix custom model IT failure on Windows #505

Open ylwu-amzn opened 1 year ago

ylwu-amzn commented 1 year ago

On Windows, CustomModelITTests#testCustomModelWorkflow fails to clean up the local file cache. The test code itself runs successfully, but the test framework then fails to clean up the temporary local files.

at __randomizedtesting.SeedInfo.seed([D97C315DD114CE48]:0)
    at org.apache.lucene.util.IOUtils.rm(IOUtils.java:341)
    at org.apache.lucene.tests.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:209)
    at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.base/java.lang.Thread.run(Thread.java:833)

peterzhuamazon commented 2 weeks ago

A similar issue shows up in this CI run: https://ci.opensearch.org/ci/dbc/integ-test/2.15.0/9978/windows/x64/zip/test-results/8321/integ-test/ml-commons/with-security/stdout.txt


org.opensearch.ml.rest.RestMLDeployModelActionIT > testReDeployModel STANDARD_ERROR
    6 17, 2024 12:53:15 ?? com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
    ??: Suite execution timed out: org.opensearch.ml.rest.RestMLDeployModelActionIT
    ==== jstack at approximately timeout time ====
    "Test worker" ID=1 WAITING on com.carrotsearch.randomizedtesting.RandomizedRunner$2@77c44dd
        at java.base@21.0.2/java.lang.Object.wait0(Native Method)
        - waiting on com.carrotsearch.randomizedtesting.RandomizedRunner$2@77c44dd
        at java.base@21.0.2/java.lang.Object.wait(Object.java:366)
        at java.base@21.0.2/java.lang.Thread.join(Thread.java:2078)
        at java.base@21.0.2/java.lang.Thread.join(Thread.java:2154)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:639)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner.run(RandomizedRunner.java:496)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
        at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
        at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
        at java.base@21.0.2/java.lang.invoke.LambdaForm$DMH/0x0000021045020000.invokeInterface(LambdaForm$DMH)
        at java.base@21.0.2/java.lang.invoke.LambdaForm$MH/0x00000210450f8c00.invoke(LambdaForm$MH)
        at java.base@21.0.2/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
        at java.base@21.0.2/jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:154)
        at java.base@21.0.2/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base@21.0.2/java.lang.reflect.Method.invoke(Method.java:580)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
        at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
        at jdk.proxy2/jdk.proxy2.$Proxy5.processTestClass(Unknown Source)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
        at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
        at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
        at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

https://ci.opensearch.org/ci/dbc/integ-test/2.15.0/9978/windows/x64/zip/test-results/8321/integ-test/ml-commons/with-security/local-cluster-logs/id-0/stdout.txt

 if this is unexpected.
[2024-06-17T19:23:24,773][ERROR][o.o.m.e.a.DLModel        ] [node_name_9200] Failed to deploy model MBCnJ5ABBhNLasbtVkzm
java.io.IOException: Unknown I/O error listing contents of directory: C:\Users\ContainerAdministrator\tmpy0eqz467\1\local-test-cluster\opensearch-2.15.0\data\ml_cache\models_cache\models\MBCnJ5ABBhNLasbtVkzm\1\test_model_name
    at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:2190) ~[commons-io-2.15.1.jar:2.15.1]
    at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:334) ~[commons-io-2.15.1.jar:2.15.1]
    at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1188) ~[commons-io-2.15.1.jar:2.15.1]
    at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:266) [opensearch-ml-algorithms-2.15.0.0.jar:?]
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) [?:?]
    at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252) [opensearch-ml-algorithms-2.15.0.0.jar:?]
    at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142) [opensearch-ml-algorithms-2.15.0.0.jar:?]
    at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.15.0.0.jar:?]
    at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1067) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
    at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.15.0.jar:2.15.0]
    at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$73(MLModelManager.java:1680) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
    at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.15.0.jar:2.15.0]
    at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.15.0.jar:2.15.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) [opensearch-2.15.0.jar:2.15.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.15.0.jar:2.15.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-06-17T19:23:24,777][ERROR][o.o.m.m.MLModelManager   ] [node_name_9200] Failed to retrieve model MBCnJ5ABBhNLasbtVkzm
org.opensearch.ml.common.exception.MLException: Failed to deploy model MBCnJ5ABBhNLasbtVkzm
    at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:299) ~[?:?]
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
    at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252) ~[?:?]
    at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142) ~[?:?]
    at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
    at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1067) ~[?:?]
    at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.15.0.jar:2.15.0]
    at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$73(MLModelManager.java:1680) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
    at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.15.0.jar:2.15.0]
    at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.15.0.jar:2.15.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) [opensearch-2.15.0.jar:2.15.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.15.0.jar:2.15.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.io.IOException: Unknown I/O error listing contents of directory: C:\Users\ContainerAdministrator\tmpy0eqz467\1\local-test-cluster\opensearch-2.15.0\data\ml_cache\models_cache\models\MBCnJ5ABBhNLasbtVkzm\1\test_model_name
    at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:2190) ~[?:?]
    at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:334) ~[?:?]
    at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1188) ~[?:?]
    at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:266) ~[?:?]
    ... 14 more
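
Both traces above fail while deleting files under a temp or cache directory on Windows. The thread does not confirm a root cause, but one common reason for this on Windows is that another thread or a native library still holds a handle inside the directory at the moment of deletion. The following is a minimal sketch of a retrying recursive delete using only the JDK, under that assumption. `RetryingDelete`, `deleteWithRetry`, and its parameters are hypothetical names for illustration; this is not code from ml-commons, and it would not help if the handle is never released.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of ml-commons.
final class RetryingDelete {
    private RetryingDelete() {}

    /**
     * Recursively deletes {@code root}, retrying with a growing pause when an
     * IOException occurs (assumed here to be a transient file lock).
     * Returns true if {@code root} no longer exists when the method returns.
     */
    static boolean deleteWithRetry(Path root, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                deleteTree(root);
                return true;
            } catch (IOException e) {
                if (attempt < maxAttempts) {
                    // Give the other holder of the file a chance to release it.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        return !Files.exists(root);
    }

    private static void deleteTree(Path root) throws IOException {
        if (Files.notExists(root)) {
            return; // Nothing to delete; treat as success.
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            // Files.walk reports traversal errors as unchecked; unwrap for the retry loop.
            throw e.getCause();
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

A call such as `RetryingDelete.deleteWithRetry(modelCacheDir, 5, 200L)` (with `modelCacheDir` standing in for the cache path) would make up to five attempts. Retrying only masks the symptom; closing the resource that holds the handle before deleting is the more robust fix where that is possible.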