[Mobile] Android/Kotlin/Java multi-threading for multiple models in an Android app

molo6379 commented 4 months ago

Describe the issue

There is no example of multi-threading on an Android device when using multiple models.

To reproduce

N/A

Urgency

I really need it ASAP please help :(((

Platform

Android

OS Version

12

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

com.microsoft.onnxruntime:onnxruntime-android:latest.release

ONNX Runtime API

Java/Kotlin

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

I am trying to use two models for inference on Android, and I want them to run in parallel.

private suspend fun createOrtSession(): OrtSession? {
    val so = OrtSession.SessionOptions()
    so.use {
        return ortEnv?.createSession(readModel(), so)
    }
}
private suspend fun createOrtSession2(): OrtSession? {
    val so = OrtSession.SessionOptions()
    so.use {
        return ortEnv?.createSession(readModel2(), so)
    }
}

val env = OrtEnvironment.getEnvironment()

env.use {
    val tensor = OnnxTensor.createTensor(env, imgData, shape)
    val tensor2 = OnnxTensor.createTensor(env, imgData2, shape2)

    val startTime = SystemClock.uptimeMillis()

    tensor2.use {
        val timeStart2 = System.currentTimeMillis()
        val output2 = ortSession2?.run(Collections.singletonMap(inputName2, tensor2))
//                Log.i(TAG, "Depth model time: " + (System.currentTimeMillis() - timeStart2).toString())
        result2.log.add("Depth Model result : " + " " + (System.currentTimeMillis() - timeStart2))
        output2.use {

            val depthArray = output2?.get(0)?.value as Array<Array<FloatArray>>
//                    Log.i(TAG, Arrays.deepToString(depthArray))
        }
    }

    tensor.use {

        val timeStart3 = System.currentTimeMillis()
        val output = ortSession?.run(Collections.singletonMap(inputName, tensor))
//                Log.i(TAG, "Detection model time: " + (System.currentTimeMillis() - timeStart3).toString())
        result2.log.add("Detection Model result : " + " " + (System.currentTimeMillis() - timeStart3))
        output.use {

The code above creates two separate sessions for two different models, but I'm currently running them serially, which is very inefficient.

Can I get an example of how to use multiple threads on Android to run the two models in parallel?

tianleiwu commented 4 months ago

Since you have two independent sessions, you can dedicate one thread to each session and run that session's inference on it.

For multithreading in Kotlin, see examples such as https://medium.com/@korhanbircan/multithreading-and-kotlin-ac28eed57fea. For example, you can define two threads:

class DepthModelThread: Thread() {
    public override fun run() {
        ... // depth model inference code here.
    }
}
class DetectionModelThread: Thread() {
    public override fun run() {
        ... // detection model inference code here.
    }
}

For an onnxruntime Kotlin/Java example, see: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/mobile/examples/super_resolution/android/app/src/main/java/ai/onnxruntime/example/superresolution/SuperResPerformer.kt
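
As a rough sketch of the one-thread-per-session idea, the two inference calls can also be submitted to a small thread pool so the results are easy to collect. The functions runDepthModel and runDetectionModel below are hypothetical stand-ins (they only sleep and return a string); in the app they would presumably wrap the ortSession2?.run(...) and ortSession?.run(...) calls from the question. runBothInParallel is likewise an illustrative name, not part of onnxruntime.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for the depth model call; a real version would
// wrap ortSession2?.run(...) and return the parsed output.
fun runDepthModel(): String {
    Thread.sleep(200) // simulate inference latency
    return "depth"
}

// Hypothetical stand-in for the detection model call; a real version would
// wrap ortSession?.run(...).
fun runDetectionModel(): String {
    Thread.sleep(200) // simulate inference latency
    return "detection"
}

// Submits both tasks to a two-thread pool and waits for both results.
fun runBothInParallel(): Pair<String, String> {
    val pool = Executors.newFixedThreadPool(2)
    try {
        val depthFuture = pool.submit(Callable { runDepthModel() })
        val detectionFuture = pool.submit(Callable { runDetectionModel() })
        // get() blocks until the task finishes; a failure inside the task
        // is rethrown here wrapped in an ExecutionException.
        return Pair(depthFuture.get(), detectionFuture.get())
    } finally {
        pool.shutdown() // stop accepting new tasks and let the threads exit
    }
}

fun main() {
    val start = System.nanoTime()
    val (depth, detection) = runBothInParallel()
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    println("results: $depth, $detection in $elapsedMs ms")
}
```

In an app, the pool would more likely be created once and reused instead of being built per call, and the blocking get() calls should stay off the main thread. Whether this is actually faster than running serially depends on the device: each session can already use several CPU threads internally, so it may be worth experimenting with SessionOptions.setIntraOpNumThreads when running two sessions side by side.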

Craigacp commented 4 months ago

We test multithreaded behaviour here - https://github.com/microsoft/onnxruntime/blob/main/java/src/test/java/ai/onnxruntime/InferenceTest.java#L533

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
