ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How can I implement the YOLOv8 TFLite model with an output shape of [1, 9, 8400] in Android? #2950

Closed MuhammadSibtain5099 closed 10 months ago

MuhammadSibtain5099 commented 1 year ago

Search before asking

Question

I'm currently working on an Android project that requires object detection using the YOLOv8 model. I have converted the model to TFLite format successfully, but I'm facing difficulties in handling the output shape. The model outputs a tensor with the shape [1, 9, 8400], which represents the bounding box predictions for multiple objects.

I would greatly appreciate any guidance on how to interpret and process this output shape in my Android application. Specifically, I need assistance with understanding the structure of the output tensor and extracting the bounding box coordinates and class probabilities for each detected object.

Any code snippets, suggestions, or resources related to implementing YOLOv8 TFLite with this specific output shape in Android would be immensely helpful. Thank you in advance for your assistance!

Additional

No response

glenn-jocher commented 1 year ago

@MuhammadSibtain5099 thank you for your question and interest in the YOLOv8 TFLite model. The output shape [1, 9, 8400] means the model produces 8400 candidate detections, each described by 9 values: the 4 bounding box values (center x, center y, width, height) followed by the per-class confidence scores. To extract the bounding box coordinates and class probabilities for each detected object in your Android application, you will need to decode these predictions and apply post-processing such as confidence thresholding and Non-Maximum Suppression (NMS).

I would recommend implementing NMS to filter out redundant bounding box predictions and keep only the boxes with the highest class probabilities. There are many code examples and resources available online for implementing NMS and YOLOv8 TFLite object detection on Android. Good luck with your project, and please let me know if you have any further questions or require additional assistance.
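As a rough illustration (not an official Ultralytics sample), a minimal Kotlin sketch of that decoding step could look like the following. It assumes the flattened FloatArray read back from the TFLite interpreter follows the [1, 9, 8400] layout of 4 box values (cx, cy, w, h) followed by 5 class scores, stored channel-major (value k of candidate c at index c + 8400 * k); the Detection type is just a hypothetical holder.

// Hypothetical decoder for a flattened [1, 9, 8400] YOLOv8 detection output.
data class Detection(
    val x1: Float, val y1: Float, val x2: Float, val y2: Float,
    val score: Float, val classId: Int
)

fun decode(
    output: FloatArray,
    numCandidates: Int = 8400,
    numClasses: Int = 5,
    confThreshold: Float = 0.25f
): List<Detection> {
    val detections = mutableListOf<Detection>()
    for (c in 0 until numCandidates) {
        // Channels 4..(4 + numClasses - 1) hold the per-class scores; keep the best one.
        var bestScore = 0f
        var bestClass = -1
        for (k in 0 until numClasses) {
            val score = output[c + numCandidates * (4 + k)]
            if (score > bestScore) { bestScore = score; bestClass = k }
        }
        if (bestScore < confThreshold) continue
        // Channels 0..3 hold cx, cy, w, h; convert to corner coordinates.
        val cx = output[c]
        val cy = output[c + numCandidates]
        val w = output[c + numCandidates * 2]
        val h = output[c + numCandidates * 3]
        detections.add(Detection(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, bestScore, bestClass))
    }
    // Run NMS on the surviving boxes afterwards to drop overlapping detections.
    return detections
}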

Gornoka commented 11 months ago

@glenn-jocher Is there any official documentation on the output tensors for the different tasks? I have failed to find something like that in the past.

glenn-jocher commented 11 months ago

@Gornoka hi there! Thank you for your question. Currently, YOLOv8 does not have official documentation specifically detailing the output tensors for different tasks. However, the output tensor structure is consistent across most YOLO models.

For object detection tasks, the output tensor typically includes information about the predicted bounding box coordinates, class probabilities, and confidence scores for each detected object. These values can be extracted and processed to draw bounding boxes around objects and determine their corresponding classes.

To obtain more specific details about the output tensor structure and how to interpret it, I recommend checking the code and documentation within the Ultralytics YOLOv8 repository. You can explore the relevant code files and comments, which provide insights into the output tensor structure and its interpretation.

If you have any further questions or need additional assistance, please let me know. I'm here to help!

surendramaran commented 10 months ago

My YOLOv8 model had an output shape of [1, 5, 8400], which I interpreted successfully: https://stackoverflow.com/questions/76575124/get-bounding-box-the-confidence-score-and-class-labels-from-yolov8-onnx-model

In the meantime, can you show me what your TFLite interpreter file looks like in Android? I am having difficulty handling it. @Gornoka

Gornoka commented 10 months ago

(Quoting surendramaran's question above.)

Sorry, I don't work on Android apps and just hijacked this discussion to look for some info that would also apply to the TFLite environment.

glenn-jocher commented 10 months ago

@surendramaran hi there!

Thank you for sharing your success in interpreting the output shape [1, 5, 8400] of your YOLOv8 model. I'm glad to hear that the Stack Overflow post was helpful in understanding how to extract the bounding box coordinates, confidence scores, and class labels!

Regarding your question about the TFLite interpreter file in Android, I apologize for the confusion, but I'm focused on YOLOv8 and don't have expertise in Android app development or TFLite implementation specifics. However, there are various online resources, tutorials, and documentation available that can guide you through implementing the TFLite interpreter in an Android application. Searching for TFLite-specific examples and tutorials should provide you with the necessary guidance and code snippets to handle the interpreter in Android.

If you have any further questions related to YOLOv8 or the model's output, I'd be more than happy to assist you. Good luck with your Android app development, and I hope you find the resources you need to handle the TFLite interpreter effectively!

Best regards, Glenn Jocher (Ultralytics Team)

MuhammadSibtain5099 commented 10 months ago

@glenn-jocher Thank you for your response. There are many developers who have the same problem deploying the YOLOv8 TFLite model in an Android application. It is not really a matter of understanding Android app development or TFLite implementation in general. For YOLOv5 there was sample code showing how to deploy the TFLite model in an Android application, and it helped a lot. We are trying to do the same thing, just adapting the input and output handling to YOLOv8's shapes, but it is not working. There is also an official Ultralytics Android app that uses YOLOv8, which means one of your developers has already solved this. Why not have your Android expert share a sample project so we can easily deploy YOLOv8 in our own applications? Please put in some extra effort to help us fix this issue; we have been trying to solve it since day one of YOLOv8.

Best regards, Muhammad Sibtain Abbas

glenn-jocher commented 10 months ago

@MuhammadSibtain5099 dear Muhammad Sibtain Abbas,

Thank you for reaching out and expressing your concerns regarding the deployment of the YOLOv8 TFLite model in an Android application. We understand that many developers are facing similar challenges, and we appreciate your feedback.

While we acknowledge that sample code was provided for deploying a YOLOv5 TFLite model in an Android application, it is important to note that YOLOv8 and YOLOv5 differ in their architectural design and output format. Therefore, directly modifying the YOLOv5 code may not yield the desired results for YOLOv8.

We recognize the usefulness of having a sample project specifically for YOLOv8 deployment in Android, and we apologize for any inconvenience caused by its unavailability. We will take your request into consideration and see if we can provide a sample project or offer more guidance on deploying YOLOv8 in an Android application.

We greatly appreciate your patience and understanding as we work to address your concerns. Feel free to reach out if you have any further questions or need additional assistance.

Best regards, Glenn Jocher (Ultralytics Team)

liyanxi commented 9 months ago

@glenn-jocher Is there sample documentation available for deploying the YOLOv8 TFLite model on Android? I encountered unexpected issues when exporting to the TFLite format following the official Ultralytics documentation. Could you please take some time to help me check? Thank you very much.

(screenshot attached)

safyyy commented 9 months ago

@MuhammadSibtain5099 can you tell me the steps, or point me to a tutorial, for converting YOLOv8 to the TFLite format and using it in an Android application? I couldn't find a tutorial that solves this. Or can I reuse the same steps as the YOLOv5 model in an Android app?

glenn-jocher commented 9 months ago

@safyyy, the YOLOv8 model conversion to TFLite is done outside of Android, usually on a more powerful machine. Once you've successfully converted the model, you can then include the resulting .tflite file in your Android project to carry out object detection. The conversion process involves exporting the YOLOv8 model weights to the ONNX format and then using the TensorFlow Lite Converter to create the TFLite model.

The steps to implement the TFLite model on Android would typically involve loading the TFLite model using the TFLite interpreter, preparing your image data as an input tensor, running inference on the model using the input tensor, and then extracting and interpreting the output tensor.
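As a rough sketch of those steps (assuming a model file named "yolov8.tflite" bundled in the assets folder, an already prepared FLOAT32 input buffer, and the [1, 9, 8400] output discussed here; error handling omitted):

import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil // from the TFLite Support library
import java.nio.ByteBuffer

fun runInference(context: Context, inputBuffer: ByteBuffer): FloatArray {
    // 1. Load the TFLite model and create the interpreter.
    val interpreter = Interpreter(FileUtil.loadMappedFile(context, "yolov8.tflite"))

    // 2. Allocate an output container matching the model's output shape [1, 9, 8400].
    val output = Array(1) { Array(9) { FloatArray(8400) } }

    // 3. Run inference on the prepared input tensor.
    interpreter.run(inputBuffer, output)
    interpreter.close()

    // 4. Flatten the raw predictions for post-processing (decoding + NMS).
    return output[0].flatMap { it.asIterable() }.toFloatArray()
}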

For the YOLOv8 output tensor of shape [1, 9, 8400], each of the 8400 candidate detections carries 9 values: the 4 bounding box values followed by the per-class scores. Processing these outputs requires decoding the bounding box predictions, filtering them by a confidence threshold, and suppressing overlapping detections with Non-Maximum Suppression (NMS) to finalize the detected objects.

Please keep in mind, while the concepts might be similar, directly following the steps from YOLOv5's Android implementation wouldn't work for YOLOv8 due to differences in the architectural design and output of the models.

Remember, the specifics for how you implement these steps can depend on how your Android app is structured and the specific requirements of your app. I hope this information is helpful and I wish you luck with your project!

ENNURSILA commented 8 months ago

Hello @safyyy, I found a working YOLOv5 example on Android. I swapped the model out for YOLOv8, but it didn't work on Android.

glenn-jocher commented 8 months ago

Hello @ENNURSILA,

Changing the model from YOLOv5 to YOLOv8 in your Android application could entail more than just swapping out the model file. These two versions of YOLO are fundamentally different in terms of both their architecture and outputs.

In particular, YOLOv8 outputs a tensor of shape [1, 9, 8400] for object detection tasks, which represents bounding box predictions for multiple objects. This output shape is quite specific and requires proper interpretation to extract useful information such as bounding box coordinates and class probabilities for each detected object.

Therefore, adjusting the existing code written for the YOLOv5 model to accommodate YOLOv8 might involve reworking the sections responsible for preparing input data, managing model inference, interpreting the output tensor, and handling post-inference processing.

I hope this clarifies why directly substituting YOLOv8 for YOLOv5 in your Android application didn't work as expected. Feel free to ask if you have any further questions!

surendramaran commented 8 months ago

UPDATE

Check out this repository for running live object detection in android https://github.com/surendramaran/YOLOv8-TfLite-Object-Detector . Use any custom YOLOv8 object detection model.

Previous

For anyone looking to deploy a TFLite model in an Android environment, here is the code I used; it should give you a better understanding. The model's output shape is (1, 5, 1344).

Dependencies

implementation 'org.tensorflow:tensorflow-lite:2.13.0'
implementation 'org.tensorflow:tensorflow-lite-support:0.4.4'

Note: You don't strictly need tensorflow-lite-support; it just helps with normalizing the input tensor, which you can also do manually.

// Imports assumed for the TFLite and TFLite Support libraries (not shown in the original snippet).
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.DataType
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil
import org.tensorflow.lite.support.common.ops.CastOp
import org.tensorflow.lite.support.common.ops.NormalizeOp
import org.tensorflow.lite.support.image.ImageProcessor
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer
import kotlin.math.min

// BoundingBox is a simple data class holding x1, y1, x2, y2, cx, cy, w, h, cnf (see its definition later in this thread).
class Detector(private val context: Context) {
    private var interpreter: Interpreter? = null

    private val imageProcessor = ImageProcessor.Builder()
        .add(NormalizeOp(INPUT_MEAN, INPUT_STANDARD_DEVIATION))
        .add(CastOp(INPUT_IMAGE_TYPE))
        .build()

    fun setup() {
        val model = FileUtil.loadMappedFile(context, "MODEL_PATH")
        val options = Interpreter.Options()
        options.numThreads = 4
        interpreter = Interpreter(model, options)
    }

    fun clear() {
        interpreter?.close()
        interpreter = null
    }

    private var squareSize = 0
    private var left = 0
    private var top = 0

    fun detect(frame: Bitmap) {
        interpreter ?: return

        if (squareSize == 0) {
            squareSize = min(frame.width, frame.height)
            left = (frame.width - squareSize) / 2
            top = (frame.height - squareSize) / 2
        }

        val croppedBitmap = Bitmap.createBitmap(frame, left, top, squareSize, squareSize)
        val resizedBitmap = Bitmap.createScaledBitmap(croppedBitmap, TENSOR_WIDTH, TENSOR_HEIGHT, false)

        val tensorImage = TensorImage(DataType.FLOAT32)
        tensorImage.load(resizedBitmap)
        val processedImage = imageProcessor.process(tensorImage)
        val imageBuffer = processedImage.buffer

        val output = TensorBuffer.createFixedSize(intArrayOf(1 , 5, NUM_ELEMENTS), OUTPUT_IMAGE_TYPE)
        interpreter?.run(imageBuffer, output.buffer)

        val bestBoxes = bestBox(output.floatArray)

    }

    private fun bestBox(array: FloatArray) : List<BoundingBox>? {

        val boundingBoxes = mutableListOf<BoundingBox>()
        for (c in 0 until NUM_ELEMENTS) {
            val cnf = array[c + NUM_ELEMENTS * 4]
            if (cnf > CONFIDENCE_THRESHOLD) {
                val cx = array[c]
                val cy = array[c + NUM_ELEMENTS]
                val w = array[c + NUM_ELEMENTS * 2]
                val h = array[c + NUM_ELEMENTS * 3]
                val x1 = cx - (w/2F)
                val y1 = cy - (h/2F)
                val x2 = cx + (w/2F)
                val y2 = cy + (h/2F)
                if (x1 <= 0F || x1 >= TENSOR_WIDTH_FLOAT) continue
                if (y1 <= 0F || y1 >= TENSOR_HEIGHT_FLOAT) continue
                if (x2 <= 0F || x2 >= TENSOR_WIDTH_FLOAT) continue
                if (y2 <= 0F || y2 >= TENSOR_HEIGHT_FLOAT) continue
                boundingBoxes.add(
                    BoundingBox(
                    x1 = x1, y1 = y1, x2 = x2, y2 = y2,
                    cx = cx, cy = cy, w = w, h = h, cnf = cnf
                )
                )
            }
        }

        if (boundingBoxes.isEmpty()) return null

        return applyNMS(boundingBoxes)
    }

    private fun applyNMS(boxes: List<BoundingBox>) : MutableList<BoundingBox> {
        val sortedBoxes = boxes.sortedByDescending { it.w * it.h }.toMutableList()
        val selectedBoxes = mutableListOf<BoundingBox>()

        while(sortedBoxes.isNotEmpty()) {
            val first = sortedBoxes.first()
            selectedBoxes.add(first)
            sortedBoxes.remove(first)

            val iterator = sortedBoxes.iterator()
            while (iterator.hasNext()) {
                val nextBox = iterator.next()
                val iou = calculateIoU(first, nextBox)
                if (iou >= IOU_THRESHOLD) {
                    iterator.remove()
                }
            }
        }

        return selectedBoxes
    }

    private fun calculateIoU(box1: BoundingBox, box2: BoundingBox): Float {
        val x1 = maxOf(box1.x1, box2.x1)
        val y1 = maxOf(box1.y1, box2.y1)
        val x2 = minOf(box1.x2, box2.x2)
        val y2 = minOf(box1.y2, box2.y2)
        val intersectionArea = maxOf(0F, x2 - x1) * maxOf(0F, y2 - y1)
        val box1Area = box1.w * box1.h
        val box2Area = box2.w * box2.h
        return intersectionArea / (box1Area + box2Area - intersectionArea)
    }

    companion object {
        private const val TENSOR_WIDTH = 256
        private const val TENSOR_HEIGHT = 256
        private const val TENSOR_WIDTH_FLOAT = TENSOR_WIDTH.toFloat()
        private const val TENSOR_HEIGHT_FLOAT = TENSOR_HEIGHT.toFloat()

        private const val INPUT_MEAN = 0f
        private const val INPUT_STANDARD_DEVIATION = 255f

        private val INPUT_IMAGE_TYPE = DataType.FLOAT32
        private val OUTPUT_IMAGE_TYPE = DataType.FLOAT32

        private const val NUM_ELEMENTS = 1344
        private const val CONFIDENCE_THRESHOLD = 0.75F
        private const val IOU_THRESHOLD = 0.5F
    }
}

Important note: This was implemented with an older version of YOLOv8 where the box coordinates were in the scale of the input size; in newer versions they are normalized between 0 and 1.

surendramaran commented 8 months ago

You can follow this issue if you want to interpret the instance segmentation output. I will post the solution once I find it.

araneta commented 8 months ago

(Quoting surendramaran's Detector code and notes from the comment above.)

Thanks for the sample code. Could you please give me an example of how to use this class? Thanks.

glenn-jocher commented 8 months ago

@araneta sure, to use this Detector class, you'll need to follow these steps:

  1. Initialize an instance of the Detector class by passing the context (this refers to the current activity or application context).

    val detector = Detector(context)

  2. Call the detector setup function to load the YOLOv8 model from your assets folder.

    detector.setup()

  3. Whenever you capture an image or frame that you want to pass through the model for detection, call the detect function like so:

    detector.detect(bitmap)

In this example, "bitmap" is a variable that represents the image or frame you captured. This function will run the model on your input bitmap and internally handle the detection and bounding box creation for the objects found within your image.

Remember to call the clear function when you're done to release resources.

detector.clear()

Also, keep in mind that you have to replace the "MODEL_PATH" placeholder in the setup function with the file path to your actual YOLOv8 model. The model file should be located in the assets folder of your Android project.

This class also assumes your YOLOv8 model is a TFLite file that was trained to perform object detection, and it's expecting the model to have a specific input shape. Make sure that matches with your actual model specification.
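Putting those steps together, a minimal hypothetical activity using the Detector class from the earlier comment might look like this; loadSampleBitmap() is a placeholder for however you obtain frames:

import android.graphics.Bitmap
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity

class MainActivity : AppCompatActivity() {
    private lateinit var detector: Detector

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        detector = Detector(applicationContext)
        detector.setup() // loads the .tflite model from the assets folder

        val frame: Bitmap = loadSampleBitmap() // placeholder: camera frame, gallery image, etc.
        detector.detect(frame)
    }

    override fun onDestroy() {
        detector.clear() // release the TFLite interpreter
        super.onDestroy()
    }

    private fun loadSampleBitmap(): Bitmap = TODO("supply a Bitmap from your camera or assets")
}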

manishp54 commented 7 months ago

(Quoting surendramaran's Detector code and notes from the comment above.)

How can we extract the class ID from it?

glenn-jocher commented 7 months ago

@manishp54 in the provided code there is no explicit class ID extraction step, but thankfully it's straightforward to add one.

The YOLOv8 model output is an array that not only includes the coordinates of the bounding box but also the object class confidence scores for each class it's trained on.

To extract the class ID, you would need to add an additional step after you run the interpreter on your image buffer. This step would involve iterating through the object class confidence scores and identifying the index of the highest confidence score. The index corresponds to the class ID.

However, I would like to caution that how exactly you extract class ID might vary depending on your training and if any adjustments were made to the output layer.

Regardless, this addition should give you the class IDs for any detected objects, assuming the model was properly trained for classification in addition to bounding box detection.
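A small sketch of that extra step, assuming the flattened channel-major output layout used in the code above (4 box channels followed by the class-score channels, each NUM_ELEMENTS values long):

// For candidate c, scan the class-score channels and keep the best one.
fun bestClassFor(array: FloatArray, c: Int, numElements: Int, numClasses: Int): Pair<Int, Float> {
    var bestClass = -1
    var bestScore = 0f
    for (k in 0 until numClasses) {
        val score = array[c + numElements * (4 + k)] // channels 4..(4 + numClasses - 1)
        if (score > bestScore) {
            bestScore = score
            bestClass = k
        }
    }
    return bestClass to bestScore // class ID and its confidence
}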

sceddd commented 7 months ago

I have tried this code with an output shape of [1, 7, 8400], but it doesn't seem to work well. The confidence scores become extremely low, around 2.3e-7.

class ObjectDetection(private val context :Context){
    private var interpreter: Interpreter?=null
    private val imageProcessor = ImageProcessor.Builder()
        .add(NormalizeOp(INPUT_MEAN, INPUT_STANDARD_DEVIATION))
        .add(CastOp(INPUT_IMAGE_TYPE))
        .build()
    private var squareSize = 0
    private var left = 0
    private var top = 0
    fun setup() {
        val model = FileUtil.loadMappedFile(context, MODEL_PATH)
        val options = Interpreter.Options()
        options.numThreads = 4
        interpreter = Interpreter(model, options)
    }

    fun clear() {
        interpreter?.close()
        interpreter = null
    }
    fun detect(frame:Bitmap):List<BoundingBox>?{
        interpreter?: return null

        if (squareSize == 0) {
            squareSize = min(frame.width, frame.height)
            left = (frame.width - squareSize) / 2
            top = (frame.height - squareSize) / 2
        }

        val croppedBitmap = Bitmap.createBitmap(frame, left, top, squareSize, squareSize)
        val resizedBitmap = Bitmap.createScaledBitmap(croppedBitmap, TENSOR_WIDTH, TENSOR_HEIGHT, false)

        val tensorImage = TensorImage(DataType.FLOAT32)
        tensorImage.load(resizedBitmap)
        val processedImage = imageProcessor.process(tensorImage)
        val imageBuffer = processedImage.buffer

        val output = TensorBuffer.createFixedSize(intArrayOf(1 , 7, NUM_ELEMENTS), OUTPUT_IMAGE_TYPE)
        interpreter?.run(imageBuffer, output.buffer)

        val bestBoxes = bestBox(output.floatArray)
        return bestBoxes
    }

    private fun bestBox(array: FloatArray) : List<BoundingBox>? {

        val boundingBoxes = mutableListOf<BoundingBox>()
        for (c in 0 until NUM_ELEMENTS) {
            val cnf = array[c + NUM_ELEMENTS * 4]

            if (cnf > CONFIDENCE_THRESHOLD) {
                val cx = array[c]
                val cy = array[c + NUM_ELEMENTS]
                val w = array[c + NUM_ELEMENTS * 2]
                val h = array[c + NUM_ELEMENTS * 3]
                val x1 = cx - (w/2F)
                val y1 = cy - (h/2F)
                val x2 = cx + (w/2F)
                val y2 = cy + (h/2F)
                if (x1 <= 0F || x1 >= TENSOR_WIDTH_FLOAT) continue
                if (y1 <= 0F || y1 >= TENSOR_HEIGHT_FLOAT) continue
                if (x2 <= 0F || x2 >= TENSOR_WIDTH_FLOAT) continue
                if (y2 <= 0F || y2 >= TENSOR_HEIGHT_FLOAT) continue
                boundingBoxes.add(
                    BoundingBox(
                        x1 = x1, y1 = y1, x2 = x2, y2 = y2,
                        cx = cx, cy = cy, w = w, h = h, cnf = cnf
                    )
                )
            }
        }

        if (boundingBoxes.isEmpty()) return null

        return applyNMS(boundingBoxes)
    }
    private fun calculateIoU(box1: BoundingBox, box2: BoundingBox): Float {
        val x1 = maxOf(box1.x1, box2.x1)
        val y1 = maxOf(box1.y1, box2.y1)
        val x2 = minOf(box1.x2, box2.x2)
        val y2 = minOf(box1.y2, box2.y2)
        val intersectionArea = maxOf(0F, x2 - x1) * maxOf(0F, y2 - y1)
        val box1Area = box1.w * box1.h
        val box2Area = box2.w * box2.h
        return intersectionArea / (box1Area + box2Area - intersectionArea)
    }
    private fun applyNMS(boxes: List<BoundingBox>) : MutableList<BoundingBox> {
        val sortedBoxes = boxes.sortedByDescending { it.w * it.h }.toMutableList()
        val selectedBoxes = mutableListOf<BoundingBox>()

        while(sortedBoxes.isNotEmpty()) {
            val first = sortedBoxes.first()
            selectedBoxes.add(first)
            sortedBoxes.remove(first)

            val iterator = sortedBoxes.iterator()
            while (iterator.hasNext()) {
                val nextBox = iterator.next()
                val iou = calculateIoU(first, nextBox)
                if (iou >= IOU_THRESHOLD) {
                    iterator.remove()
                }
            }
        }

        return selectedBoxes
    }
    companion object {
        private const val MODEL_PATH = "yolov8_float16.tflite"
        private const val TENSOR_WIDTH = 640
        private const val TENSOR_HEIGHT = 640
        private const val TENSOR_WIDTH_FLOAT = TENSOR_WIDTH.toFloat()
        private const val TENSOR_HEIGHT_FLOAT = TENSOR_HEIGHT.toFloat()

        const val INPUT_MEAN = 0f
        const val INPUT_STANDARD_DEVIATION = 255f

        val INPUT_IMAGE_TYPE = DataType.FLOAT32
        private val OUTPUT_IMAGE_TYPE = DataType.FLOAT32

        private const val NUM_ELEMENTS = 8400
        private const val CONFIDENCE_THRESHOLD = 0.5F
        private const val IOU_THRESHOLD = 0.5F
    }
}
data class BoundingBox(
    val x1:Float,
    val y1:Float,
    val x2:Float,
    val y2:Float,
    val cx:Float,
    val cy:Float,
    val w:Float,
    val h:Float,
    val cnf:Float,
)

glenn-jocher commented 7 months ago

@sceddd firstly, ensure the output buffer shape is correct. From your code, you are allocating a [1, 7, 8400] output array; make sure this matches the actual output shape of your specific YOLOv8 model.

The low confidence scores could have several causes; the most common is incorrect input preprocessing, for example feeding unnormalized pixel values when the model expects inputs in the 0 to 1 range.

Also, check your CONFIDENCE_THRESHOLD. The confidence threshold is a post-processing hyperparameter; if it's too high, you might be missing valid detections. Try lowering it to see whether the model detects anything at a lower confidence. A lower confidence doesn't necessarily mean the model is performing poorly; it might just be less certain about its predictions.

I hope these troubleshooting steps help you improve your model's performance.

sceddd commented 7 months ago

Well, my model was trained on the COCO dataset, so I don't think the training data is the problem. About the preprocessing: the input image is taken from the camera bitmap, so do I have to normalize it? Thanks for replying.

surendramaran commented 7 months ago

As @glenn-jocher said, this could be the result of not properly preprocessing the input image.

YOLOv8 expects an RGB input image, normalized between 0 and 1 and resized to the input size the model was trained on.

At least that is true in my case; I'm not 100% sure it applies universally.

sceddd commented 7 months ago

Hello @surendramaran, regarding the imageProcessor defined at the top of your code, ImageProcessor.Builder().add(NormalizeOp(INPUT_MEAN, INPUT_STANDARD_DEVIATION)).add(CastOp(INPUT_IMAGE_TYPE)).build(): isn't that what normalizes the image, or am I missing something?

surendramaran commented 7 months ago

Yes, that helper is provided by the tensorflow-lite-support library, and it is an important first step; I'm fairly sure this is what you are missing.

Normalization converts every red, green and blue pixel value, which ranges from 0 to 255, into the range the model was trained on.

With the mean 0F and standard deviation 255F used in my code, each pixel value effectively becomes:

(pixelValue - 0) / 255

This produces a normalized image with values between 0 and 1.

YOLOv8 expects the input image to be normalized to this range; other ML models may expect values between -1 and +1.

It is an important preprocessing step that every input image should go through. If you don't want to use that library, you can do it manually as well: just loop over every pixel's RGB values and do the math.
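For reference, a rough sketch of doing that normalization by hand (assuming a FLOAT32 model input in RGB order and a bitmap already resized to the model's input size):

import android.graphics.Bitmap
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Convert a resized Bitmap into a normalized (0..1) float buffer in RGB order.
fun bitmapToInputBuffer(bitmap: Bitmap, width: Int, height: Int): ByteBuffer {
    val buffer = ByteBuffer.allocateDirect(4 * width * height * 3).order(ByteOrder.nativeOrder())
    val pixels = IntArray(width * height)
    bitmap.getPixels(pixels, 0, width, 0, 0, width, height)
    for (pixel in pixels) {
        buffer.putFloat(((pixel shr 16) and 0xFF) / 255f) // R
        buffer.putFloat(((pixel shr 8) and 0xFF) / 255f)  // G
        buffer.putFloat((pixel and 0xFF) / 255f)          // B
    }
    buffer.rewind()
    return buffer
}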

If you use the library helper, make sure to apply it like this:

val tensorImage = TensorImage(DataType.FLOAT32)
tensorImage.load(resizedBitmap)
val processedImage = imageProcessor.process(tensorImage)
val imageBuffer = processedImage.buffer

sceddd commented 7 months ago

Thanks for replying, @surendramaran. I had already done that, and the problem is still there; I didn't change much except the output shape. Any other ideas? By the way, can you explain a bit about how you take the cnf values? From the code I understand that you take the predicted boxes from the output array and keep the best ones.

surendramaran commented 7 months ago

Honestly, my Android project only involved a single class, and I haven't dealt with more than one class yet. In my case the number of channels was 5 because my output shape was [1, 5, 1344], so I really can't say much about a different output shape that I haven't tried.

In your case, just remember one rule: if the number of elements is 8400, it means your model outputs 8400 predicted candidates.

When we use the converted TFLite model in Android, the output is a float array, and the size of the array will be 1 x 7 x 8400 = 58800.

Here is how you can extract meaningful information from this array.

The first 8400 values are the centerX of all predicted candidates, the second 8400 values are centerY, the third 8400 are the width, and the fourth 8400 are the height.

You have to follow this pattern to get meaningful information; it looks like your model can detect 3 different classes, but I could be wrong.

Just one more question: does your model have an output shape of [1, 7, 8400] or [1, 8400, 7]?

In my case I took the confidence scores from the fifth block of 1344 values.
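To make the two layouts concrete, here is a small sketch of the indexing difference for a 7-channel output flattened to a plain FloatArray (batch dimension ignored):

// [1, 7, 8400] (channel-major): channel k of candidate c
fun channelMajor(array: FloatArray, c: Int, k: Int, numElements: Int = 8400): Float =
    array[c + numElements * k]

// [1, 8400, 7] (candidate-major): the 7 values of one candidate are contiguous
fun candidateMajor(array: FloatArray, c: Int, k: Int, numChannels: Int = 7): Float =
    array[c * numChannels + k]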

sceddd commented 7 months ago

Well, the first one in my case. After testing a little, I tried multiplying the cnf by 10^4 and using min() to cap the confidence at 0.99, and that seems to work in my case. It looks like your code is correct, but somehow with a bigger output shape the confidence values shrink. It is not the solution I'm looking for, so I will try to figure out what's wrong and let you know. Thank you again, I really appreciate your help.

glenn-jocher commented 7 months ago

@sceddd i'm glad to hear that you've made some progress with your model. However, I'd advise against artificially inflating the confidence scores by a factor, as it doesn't address the underlying issue. The fact that confidence scores improve could simply be a side effect.

The confidence scores being reduced might be an expected behavior due to the manner of working of your specific model. The model could be providing lower confidence scores due to the use of multiple bounding boxes per grid cell, as is common in YOLO architectures.

Take the time to debug your preprocessing steps and make sure the image data are being input correctly as this is often the cause of unexpected results. Also, check your model output interpretation logic to ensure no steps are misunderstood or missed.

Finally, as a kind reminder, the speed and accuracy of your model are intrinsically linked with the quality and diversity of data it's been trained with. It could help to review the training data and process. Keep experimenting and I'm sure you'll find the solution. Good luck!

sceddd commented 7 months ago

@glenn-jocher Thanks for your advice. After reviewing @surendramaran's explanation of his output, I think I know where the problem is: it's the index in the array from which I read the confidence for my detections. With multiple classes I need to change things a little. In my case the labels are nc: 3 (names: ['motorbike', 'person', 'bicycle']), and the 'person' class sits in the second score row, so I need to change this code, val cnf = array[c + NUM_ELEMENTS * 4] (which works in your single-class case), to this:

val indices = (4..(3 + LABEL_SIZE)).map { c + NUM_ELEMENTS * it } // I have 3 classes so LABEL_SIZE is 3
val cnf_array = indices.map { array[it] }.toFloatArray()
val cnf = cnf_array.maxOrNull() ?: continue

I'm not sure this is the best way to do it, but it works fine in my case.

glenn-jocher commented 7 months ago

@sceddd it's encouraging to hear that you're making progress on your implementation. Looking at your explanation, it seems like you are on the right track.

As your model has been trained to detect several different labels, it indeed makes sense to cycle through all indices corresponding to these labels to identify the maximum confidence score when performing multi-label detection.

In essence, your preliminary observations correspond with how YOLOv8 is expected to handle multiple classes. The confidence score for each class is predicted separately, and by taking the maximum of these predictions, you are essentially selecting the object class that your model has the highest confidence in.

Your approach seems viable based upon your mentioned use-case. However, it's all a matter of use-case and whether the specific approach is best for it. You might want to consider the implications of this approach, such as scenarios where multiple classes have high confidence scores. In a different case, you might want to take a different approach.

Feel free to experiment and iterate until you have a solution that best serves your requirements. It's great to see you dive so deeply into your code! We appreciate your keen focus on problem-solving and hope your code keeps working as expected!

sceddd commented 7 months ago

Hello @glenn-jocher, one more question. After running YOLOv8 in my app, it lags a lot; the FPS is only around 4 or 5. Do you have any idea why? I'm not sure whether it's the model itself or the fact that the output is so large that looping through it causes the lag.

surendramaran commented 7 months ago

  1. Use a quantized TFLite model (int8)
  2. Train the model at a lower resolution
  3. Try GPU acceleration

glenn-jocher commented 7 months ago

Hello @sceddd,

Performance issues like these may arise due to a number of reasons, some of which could be related to the model and some related to processing techniques.

  1. Model complexity: The YOLOv8 model is complex, and handling it on devices with limited computational power can be challenging. Using a smaller model or training at a lower resolution can help.

  2. Use a quantized TFLite model: Quantization reduces the model's size and can increase inference speed. Consider converting your model to a quantized int8 version, which accelerates inference on some devices.

  3. GPU acceleration: You could also leverage hardware acceleration by running the model with the GPU delegate provided by TensorFlow Lite (see the sketch after this comment).

  4. Optimizing code: While handling outputs, optimize your code to avoid unnecessary computations. Looping through large output arrays is expensive, so consider ways to streamline this step.

Remember, finding the right balance between accuracy and speed, considering the capabilities of the device at hand, is key.
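For the GPU acceleration point above, a minimal sketch, assuming the org.tensorflow:tensorflow-lite-gpu dependency is added alongside the core library:

import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate

// Build Interpreter options that use the GPU delegate when the device supports it.
fun buildInterpreterOptions(): Interpreter.Options {
    val options = Interpreter.Options()
    val compatList = CompatibilityList()
    if (compatList.isDelegateSupportedOnThisDevice) {
        options.addDelegate(GpuDelegate(compatList.bestOptionsForThisDevice))
    } else {
        options.setNumThreads(4) // fall back to multi-threaded CPU inference
    }
    return options
}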

AkoAyMatsu commented 6 months ago

How about an output shape of [1, 38, 8400]?

glenn-jocher commented 6 months ago

@AkoAyMatsu hello there,

Thank you for bringing up this question about the YOLOv8 model output shape.

If your YOLOv8 model is resulting in an output shape of [1, 38, 8400], the first dimension 1 represents a single inference batch, as models typically output a batch even if you are inferring one image at a time.

The second dimension, 38, corresponds to the values predicted for each candidate: the 4 bounding box coordinates (x, y, width, height) followed by the per-class scores. YOLOv8 does not output a separate objectness score, so a plain detection model with 34 classes would give 4 + 34 = 38 values; a segmentation variant could instead give 4 box values plus the class scores plus 32 mask coefficients. Check your model configuration and training setup to confirm which layout you should expect.

Lastly, the number 8400 indicates a large number of candidate bounding boxes being predicted. This is typical for a grid-based approach where multiple bounding box candidates are generated across the input image.

To process these outputs in your application, you would need to iterate through the candidates, decode the bounding box information, take the highest class score as the confidence, assign the corresponding class, and filter out low-confidence detections.

Bear in mind that performance may vary depending on how you handle this large output tensor, particularly on resource-constrained devices such as smartphones. It's essential to optimize the post-processing step to manage the computations effectively.

I hope this explanation provides clarity on your YOLOv8 model's output shape. Should you have any further questions or details to discuss, feel free to share!

Best regards.

sceddd commented 6 months ago

@AkoAyMatsu if you're asking about the Android side, take @surendramaran's code and adjust it a bit; in your case it would look like this:

class ObjectDetection(private val context :Context){
    private var interpreter: Interpreter?=null
    private val imageProcessor = ImageProcessor.Builder()
        .add(NormalizeOp(INPUT_MEAN, INPUT_STANDARD_DEVIATION))
        .add(CastOp(INPUT_IMAGE_TYPE))
        .build()
    private var squareSize = 0
    private var left = 0
    private var top = 0
    fun setup() {
        val model = FileUtil.loadMappedFile(context, MODEL_PATH)
        val options = Interpreter.Options()
        options.numThreads = 4
        interpreter = Interpreter(model, options)
    }

    fun clear() {
        interpreter?.close()
        interpreter = null
    }
    fun detect(frame:Bitmap):List<BoundingBox>?{
        interpreter?: return null

        if (squareSize == 0) {
            squareSize = min(frame.width, frame.height)
            left = (frame.width - squareSize) / 2
            top = (frame.height - squareSize) / 2
        }

        val croppedBitmap = Bitmap.createBitmap(frame, left, top, squareSize, squareSize)
        val resizedBitmap = Bitmap.createScaledBitmap(croppedBitmap, TENSOR_WIDTH, TENSOR_HEIGHT, false)

        val tensorImage = TensorImage(DataType.FLOAT32)
        tensorImage.load(resizedBitmap)
        val processedImage = imageProcessor.process(tensorImage)
        val imageBuffer = processedImage.buffer

        val output = TensorBuffer.createFixedSize(intArrayOf(1 , 38, NUM_ELEMENTS), OUTPUT_IMAGE_TYPE) //  output size [1, 38, 8400] 
        interpreter?.run(imageBuffer, output.buffer)

        val bestBoxes = bestBox(output.floatArray)
        return bestBoxes
    }

    private fun bestBox(array: FloatArray) : List<BoundingBox>? {

        val boundingBoxes = mutableListOf<BoundingBox>()
        for (c in 0 until NUM_ELEMENTS) {
            val indices = (4..(3 + LABEL_SIZE)).map { c + NUM_ELEMENTS * it } // set LABEL_SIZE to your number of classes (a plain 38-channel detection output would imply 34)
            val cnf_array = indices.map { array[it] }.toFloatArray()
            val cnf = cnf_array.maxOrNull() ?: continue

            if (cnf > CONFIDENCE_THRESHOLD) {
                val cx = array[c]
                val cy = array[c + NUM_ELEMENTS]
                val w = array[c + NUM_ELEMENTS * 2]
                val h = array[c + NUM_ELEMENTS * 3]
                val x1 = cx - (w/2F)
                val y1 = cy - (h/2F)
                val x2 = cx + (w/2F)
                val y2 = cy + (h/2F)
                if (x1 <= 0F || x1 >= TENSOR_WIDTH_FLOAT) continue
                if (y1 <= 0F || y1 >= TENSOR_HEIGHT_FLOAT) continue
                if (x2 <= 0F || x2 >= TENSOR_WIDTH_FLOAT) continue
                if (y2 <= 0F || y2 >= TENSOR_HEIGHT_FLOAT) continue
                boundingBoxes.add(
                    BoundingBox(
                        x1 = x1, y1 = y1, x2 = x2, y2 = y2,
                        cx = cx, cy = cy, w = w, h = h, cnf = cnf
                    )
                )
            }
        }

        if (boundingBoxes.isEmpty()) return null

        return applyNMS(boundingBoxes)
    }
    private fun calculateIoU(box1: BoundingBox, box2: BoundingBox): Float {
        val x1 = maxOf(box1.x1, box2.x1)
        val y1 = maxOf(box1.y1, box2.y1)
        val x2 = minOf(box1.x2, box2.x2)
        val y2 = minOf(box1.y2, box2.y2)
        val intersectionArea = maxOf(0F, x2 - x1) * maxOf(0F, y2 - y1)
        val box1Area = box1.w * box1.h
        val box2Area = box2.w * box2.h
        return intersectionArea / (box1Area + box2Area - intersectionArea)
    }
    private fun applyNMS(boxes: List<BoundingBox>) : MutableList<BoundingBox> {
        val sortedBoxes = boxes.sortedByDescending { it.w * it.h }.toMutableList()
        val selectedBoxes = mutableListOf<BoundingBox>()

        while(sortedBoxes.isNotEmpty()) {
            val first = sortedBoxes.first()
            selectedBoxes.add(first)
            sortedBoxes.remove(first)

            val iterator = sortedBoxes.iterator()
            while (iterator.hasNext()) {
                val nextBox = iterator.next()
                val iou = calculateIoU(first, nextBox)
                if (iou >= IOU_THRESHOLD) {
                    iterator.remove()
                }
            }
        }

        return selectedBoxes
    }
    companion object {
        private const val MODEL_PATH = "yolov8_float16.tflite"
        private const val TENSOR_WIDTH = 640
        private const val TENSOR_HEIGHT = 640
        private const val LABEL_SIZE = 3 // adjust to your model's class count
        private const val TENSOR_WIDTH_FLOAT = TENSOR_WIDTH.toFloat()
        private const val TENSOR_HEIGHT_FLOAT = TENSOR_HEIGHT.toFloat()

        const val INPUT_MEAN = 0f
        const val INPUT_STANDARD_DEVIATION = 255f

        val INPUT_IMAGE_TYPE = DataType.FLOAT32
        private val OUTPUT_IMAGE_TYPE = DataType.FLOAT32

        private const val NUM_ELEMENTS = 8400
        private const val CONFIDENCE_THRESHOLD = 0.5F
        private const val IOU_THRESHOLD = 0.5F
    }
}
data class BoundingBox(
    val x1:Float,
    val y1:Float,
    val x2:Float,
    val y2:Float,
    val cx:Float,
    val cy:Float,
    val w:Float,
    val h:Float,
    val cnf:Float,
)


glenn-jocher commented 6 months ago

It appears that a community member has tried to assist with the output shape [1,38,8400] issue on Android. From the code snippet provided, that person suggested an implementation that takes the output from the TFLite interpreter and processes it to extract potential bounding boxes and their associated confidence levels.

If you are experiencing performance issues with the YOLOv8 model on your Android device, here are some potential considerations that could explain the lag:

  1. Hardware limitations: Mobile devices generally have limited computational power compared to desktop environments where models are typically developed and trained. If the device is unable to process the frames quickly enough, it can lead to lower FPS.

  2. Model complexity: As the dimension of your output tensor increases, so does the computational complexity of the post-processing code. With an output shape of [1,38,8400], your post-processing code needs to handle 38 x 8400 float values per frame, which is substantial.

  3. Optimization needs: It’s crucial to optimize both the model and the application code for the mobile environment. This could involve:

    • Revisiting your preprocessing and post-processing steps to reduce computation.
    • Using the GPU or NNAPI acceleration features of TFLite to speed up inference.
    • Simplifying the model structure or using a smaller model if the accuracy trade-off is acceptable.
  4. Real-time requirements: Real-time detection often requires a delicate balance between speed and accuracy. Using quantization or a lower resolution model, as noted by community members ( @sceddd and @surendramaran), can be a practical approach. You might have to retrain your model with these settings applied.

Lastly, continuously processing each frame can be quite intensive. You should, where possible, also consider frame skipping techniques or performing inference only when there’s a high likelihood of change in the scene to reduce the workload on the device.

Since optimizing ML models for mobile devices is an iterative process, it involves profiling, identifying bottlenecks, and addressing them accordingly. If performance is still an issue after trying these suggestions, you may have to profile your application to see exactly where the bottlenecks are and address them specifically.

surendramaran commented 6 months ago

Thanks @glenn-jocher for pointing out the frame-skipping technique.

In Android there are two major ways to implement the camera: Camera2 and CameraX.

CameraX: this is straightforward; setting setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) on the image analyzer is enough.

imageAnalyzer = ImageAnalysis.Builder()
            .setTargetResolution(size)
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
            .setTargetRotation(binding.viewFinder.display.rotation)
            .build()

Camera2: Use Executor service

private lateinit var frameExecutor: ExecutorService

private fun startThread() {
    frameExecutor = ThreadPoolExecutor(
        1, 1,
        0L, TimeUnit.MILLISECONDS,
        LinkedBlockingQueue(1),
        ThreadPoolExecutor.DiscardOldestPolicy()
    )
}

private fun stopThread() {
    frameExecutor.shutdownNow()
}

// Here is how to use it
private fun processFrame(frame: Bitmap) {
    frameExecutor.submit {
        fingertipDetector.detect(frame)
    }
}

glenn-jocher commented 6 months ago

@surendramaran thanks for adding to the discussion with practical advice on how to handle camera frame processing for object detection with YOLOv8 on Android using CameraX and Camera2 APIs.

For CameraX, setting the BackpressureStrategy to STRATEGY_KEEP_ONLY_LATEST indeed ensures that your analysis receives only the latest available frame and skips any frames that it can't process in time, which can greatly help to maintain real-time performance without backlog.

For Camera2, utilizing a single-threaded ExecutorService with a DiscardOldestPolicy is a good strategy. It helps by running detection in a separate thread and discarding any frame that comes in while the currently processed frame is still being processed. This throttling mechanism prevents the application from becoming overwhelmed with frames waiting to be processed, which can lead to lag.

Both of these strategies serve to reduce the load on the system by effectively managing the rate at which frames are processed, thereby potentially increasing the FPS and improving the smoothness of the application's performance.

Implementing such strategies, alongside model optimization and device-specific enhancements, can make a marked difference in the responsiveness and usability of real-time object detection applications on Android.

ENNURSILA commented 5 months ago

Hello there, this is my Yolov5Classifier.java file; it works with the YOLOv5 model. The YOLOv5 model has input shape (1, 3, 416, 416) and output shape (1, 10647, 11); the YOLOv8 model has input shape (1, 3, 416, 416) and output shape (1, 6, 3549). What do I need to change in the Yolov5Classifier.java file for YOLOv8? @surendramaran

Yolov5Classifier.java file

public class YoloV5Classifier implements Classifier {

    /**
     * Initializes a native TensorFlow session for classifying images.
     *
     * @param assetManager  The asset manager to be used to load assets.
     * @param modelFilename The filepath of the model GraphDef protocol buffer.
     * @param labelFilename The filepath of label file for classes.
     * @param isQuantized   Boolean representing model is quantized or not
     */
    public static YoloV5Classifier create(
            final AssetManager assetManager,
            final String modelFilename,
            final String labelFilename,
            final boolean isQuantized,
            final int inputSize
            /*final int[] output_width,
            final int[][] masks,
            final int[] anchors*/)
            throws IOException {
        final YoloV5Classifier d = new YoloV5Classifier();
        String actualFilename = labelFilename.split("file:///android_asset/")[0];
        //String actualFilename = labelFilename.split("customclasses.txt")[1];

        InputStream labelsInput = assetManager.open(actualFilename);
        BufferedReader br = new BufferedReader(new InputStreamReader(labelsInput));
        String line;
        while ((line = br.readLine()) != null) {
            LOGGER.w(line);
            d.labels.add(line);
        }
        br.close();

        try {
            Interpreter.Options options = (new Interpreter.Options());
            options.setNumThreads(NUM_THREADS);
            if (isNNAPI) {
                d.nnapiDelegate = null;
                // Initialize interpreter with NNAPI delegate for Android Pie or above
                if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
                    d.nnapiDelegate = new NnApiDelegate();
                    options.addDelegate(d.nnapiDelegate);
                    options.setNumThreads(NUM_THREADS);
//                    options.setUseNNAPI(false);
//                    options.setAllowFp16PrecisionForFp32(true);
//                    options.setAllowBufferHandleOutput(true);
                    options.setUseNNAPI(true);
                }
            }
            if (isGPU) {
                GpuDelegate.Options gpu_options = new GpuDelegate.Options();
                gpu_options.setPrecisionLossAllowed(true); // It seems that the default is true
                gpu_options.setInferencePreference(GpuDelegate.Options.INFERENCE_PREFERENCE_SUSTAINED_SPEED);
                d.gpuDelegate = new GpuDelegate(gpu_options);
                options.addDelegate(d.gpuDelegate);
            }
            d.tfliteModel = Utils.loadModelFile(assetManager, modelFilename);
            d.tfLite = new Interpreter(d.tfliteModel, options);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }

        d.isModelQuantized = isQuantized;
        // Pre-allocate buffers.
        int numBytesPerChannel;
        if (isQuantized) {
            numBytesPerChannel = 1; // Quantized
        } else {
            numBytesPerChannel = 4; // Floating point
        }
        d.INPUT_SIZE = inputSize;
        d.imgData = ByteBuffer.allocateDirect(1 * d.INPUT_SIZE * d.INPUT_SIZE * 3 * numBytesPerChannel);
        d.imgData.order(ByteOrder.nativeOrder());
        d.intValues = new int[d.INPUT_SIZE * d.INPUT_SIZE];

        d.output_box = (int) ((Math.pow((inputSize / 32), 2) + Math.pow((inputSize / 16), 2) + Math.pow((inputSize / 8), 2)) * 3);
//        d.OUTPUT_WIDTH = output_width;
//        d.MASKS = masks;
//        d.ANCHORS = anchors;
        if (d.isModelQuantized){
            Tensor inpten = d.tfLite.getInputTensor(0);
            d.inp_scale = inpten.quantizationParams().getScale();
            d.inp_zero_point = inpten.quantizationParams().getZeroPoint();
            Tensor oupten = d.tfLite.getOutputTensor(0);
            d.oup_scale = oupten.quantizationParams().getScale();
            d.oup_zero_point = oupten.quantizationParams().getZeroPoint();
        }

        int[] shape = d.tfLite.getOutputTensor(0).shape();
        int numClass = shape[shape.length - 1] - 5;
        d.numClass = numClass;
        d.outData = ByteBuffer.allocateDirect(d.output_box * (numClass + 5) * numBytesPerChannel);
        d.outData.order(ByteOrder.nativeOrder());
        return d;
    }

    public int getInputSize() {
        return INPUT_SIZE;
    }
    @Override
    public void enableStatLogging(final boolean logStats) {
    }

    @Override
    public String getStatString() {
        return "";
    }

    @Override
    public void close() {
        tfLite.close();
        tfLite = null;
        if (gpuDelegate != null) {
            gpuDelegate.close();
            gpuDelegate = null;
        }
        if (nnapiDelegate != null) {
            nnapiDelegate.close();
            nnapiDelegate = null;
        }
        tfliteModel = null;
    }

    public void setNumThreads(int num_threads) {
        if (tfLite != null) tfLite.setNumThreads(num_threads);
    }

    @Override
    public void setUseNNAPI(boolean isChecked) {
//        if (tfLite != null) tfLite.setUseNNAPI(isChecked);
    }

    private void recreateInterpreter() {
        if (tfLite != null) {
            tfLite.close();
            tfLite = new Interpreter(tfliteModel, tfliteOptions);
        }
    }

    public void useGpu() {
        if (gpuDelegate == null) {
            gpuDelegate = new GpuDelegate();
            tfliteOptions.addDelegate(gpuDelegate);
            recreateInterpreter();
        }
    }

    public void useCPU() {
        recreateInterpreter();
    }

    public void useNNAPI() {
        nnapiDelegate = new NnApiDelegate();
        tfliteOptions.addDelegate(nnapiDelegate);
        recreateInterpreter();
    }

    @Override
    public float getObjThresh() {
        return MainActivity.MINIMUM_CONFIDENCE_TF_OD_API;
    }

    private static final Logger LOGGER = new Logger();

    // Float model
    private final float IMAGE_MEAN = 0;

    private final float IMAGE_STD = 255.0f;

    //config yolo
    private int INPUT_SIZE = -1;

//    private int[] OUTPUT_WIDTH;
//    private int[][] MASKS;
//    private int[] ANCHORS;
    private  int output_box;

    private static final float[] XYSCALE = new float[]{1.2f, 1.1f, 1.05f};

    private static final int NUM_BOXES_PER_BLOCK = 3;

    // Number of threads in the java app
    private static final int NUM_THREADS = 1;
    private static boolean isNNAPI = false;
    private static boolean isGPU = false;

    private boolean isModelQuantized;

    /** holds a gpu delegate */
    GpuDelegate gpuDelegate = null;
    /** holds an nnapi delegate */
    NnApiDelegate nnapiDelegate = null;

    /** The loaded TensorFlow Lite model. */
    private MappedByteBuffer tfliteModel;

    /** Options for configuring the Interpreter. */
    private final Interpreter.Options tfliteOptions = new Interpreter.Options();

    // Config values.

    // Pre-allocated buffers.
    private Vector<String> labels = new Vector<String>();
    private int[] intValues;

    private ByteBuffer imgData;
    private ByteBuffer outData;

    private Interpreter tfLite;
    private float inp_scale;
    private int inp_zero_point;
    private float oup_scale;
    private int oup_zero_point;
    private int numClass;
    private YoloV5Classifier() {

    }

    //non maximum suppression
    protected ArrayList<Recognition> nms(ArrayList<Recognition> list) {
        ArrayList<Recognition> nmsList = new ArrayList<Recognition>();

        for (int k = 0; k < labels.size(); k++) {
            //1.find max confidence per class
            PriorityQueue<Recognition> pq =
                    new PriorityQueue<Recognition>(
                            50,
                            new Comparator<Recognition>() {
                                @Override
                                public int compare(final Recognition lhs, final Recognition rhs) {
                                    // Intentionally reversed to put high confidence at the head of the queue.
                                    return Float.compare(rhs.getConfidence(), lhs.getConfidence());
                                }
                            });

            for (int i = 0; i < list.size(); ++i) {
                if (list.get(i).getDetectedClass() == k) {
                    pq.add(list.get(i));
                }
            }

            //2.do non maximum suppression
            while (pq.size() > 0) {
                //insert detection with max confidence
                Recognition[] a = new Recognition[pq.size()];
                Recognition[] detections = pq.toArray(a);
                Recognition max = detections[0];
                nmsList.add(max);
                pq.clear();

                for (int j = 1; j < detections.length; j++) {
                    Recognition detection = detections[j];
                    RectF b = detection.getLocation();
                    if (box_iou(max.getLocation(), b) < mNmsThresh) {
                        pq.add(detection);
                    }
                }
            }
        }
        return nmsList;
    }

    protected float mNmsThresh = 0.6f;

    protected float box_iou(RectF a, RectF b) {
        return box_intersection(a, b) / box_union(a, b);
    }

    protected float box_intersection(RectF a, RectF b) {
        float w = overlap((a.left + a.right) / 2, a.right - a.left,
                (b.left + b.right) / 2, b.right - b.left);
        float h = overlap((a.top + a.bottom) / 2, a.bottom - a.top,
                (b.top + b.bottom) / 2, b.bottom - b.top);
        if (w < 0 || h < 0) return 0;
        float area = w * h;
        return area;
    }

    protected float box_union(RectF a, RectF b) {
        float i = box_intersection(a, b);
        float u = (a.right - a.left) * (a.bottom - a.top) + (b.right - b.left) * (b.bottom - b.top) - i;
        return u;
    }

    protected float overlap(float x1, float w1, float x2, float w2) {
        float l1 = x1 - w1 / 2;
        float l2 = x2 - w2 / 2;
        float left = l1 > l2 ? l1 : l2;
        float r1 = x1 + w1 / 2;
        float r2 = x2 + w2 / 2;
        float right = r1 < r2 ? r1 : r2;
        return right - left;
    }

    protected static final int BATCH_SIZE = 1;
    protected static final int PIXEL_SIZE = 3;

    /**
     * Writes Image data into a {@code ByteBuffer}.
     */
    protected ByteBuffer convertBitmapToByteBuffer(Bitmap bitmap) {
//        ByteBuffer byteBuffer = ByteBuffer.allocateDirect(4 * BATCH_SIZE * INPUT_SIZE * INPUT_SIZE * PIXEL_SIZE);
//        byteBuffer.order(ByteOrder.nativeOrder());
//        int[] intValues = new int[INPUT_SIZE * INPUT_SIZE];
        bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
        int pixel = 0;

        imgData.rewind();
        for (int i = 0; i < INPUT_SIZE; ++i) {
            for (int j = 0; j < INPUT_SIZE; ++j) {
                int pixelValue = intValues[i * INPUT_SIZE + j];
                if (isModelQuantized) {
                    // Quantized model
                    imgData.put((byte) ((((pixelValue >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD / inp_scale + inp_zero_point));
                    imgData.put((byte) ((((pixelValue >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD / inp_scale + inp_zero_point));
                    imgData.put((byte) (((pixelValue & 0xFF) - IMAGE_MEAN) / IMAGE_STD / inp_scale + inp_zero_point));
                } else { // Float model
                    imgData.putFloat((((pixelValue >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
                    imgData.putFloat((((pixelValue >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
                    imgData.putFloat(((pixelValue & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
                }
            }
        }
        return imgData;
    }

    public ArrayList<Recognition> recognizeImage(Bitmap bitmap) {
        ByteBuffer byteBuffer_ = convertBitmapToByteBuffer(bitmap);

        Map<Integer, Object> outputMap = new HashMap<>();

//        float[][][] outbuf = new float[1][output_box][labels.size() + 5];
        outData.rewind();
        outputMap.put(0, outData);
        Log.d("YoloV5Classifier", "mObjThresh: " + getObjThresh());

        Object[] inputArray = {imgData};
        tfLite.runForMultipleInputsOutputs(inputArray, outputMap);

        ByteBuffer byteBuffer = (ByteBuffer) outputMap.get(0);
        byteBuffer.rewind();

        ArrayList<Recognition> detections = new ArrayList<Recognition>();

        float[][][] out = new float[1][output_box][numClass + 5];
        Log.d("YoloV5Classifier", "out[0] detect start");
        for (int i = 0; i < output_box; ++i) {
            for (int j = 0; j < numClass + 5; ++j) {
                if (isModelQuantized){
                    out[0][i][j] = oup_scale * (((int) byteBuffer.get() & 0xFF) - oup_zero_point);
                }
                else {
                    out[0][i][j] = byteBuffer.getFloat();
                }
            }
            // Denormalize xywh
            for (int j = 0; j < 4; ++j) {
                out[0][i][j] *= getInputSize();
            }
        }
        for (int i = 0; i < output_box; ++i){
            final int offset = 0;
            final float confidence = out[0][i][4];
            int detectedClass = -1;
            float maxClass = 0;

            final float[] classes = new float[labels.size()];
            for (int c = 0; c < labels.size(); ++c) {
                classes[c] = out[0][i][5 + c];
            }

            for (int c = 0; c < labels.size(); ++c) {
                if (classes[c] > maxClass) {
                    detectedClass = c;
                    maxClass = classes[c];
                }
            }

            final float confidenceInClass = maxClass * confidence;
            if (confidenceInClass > getObjThresh()) {
                final float xPos = out[0][i][0];
                final float yPos = out[0][i][1];

                final float w = out[0][i][2];
                final float h = out[0][i][3];
                Log.d("YoloV5Classifier",
                        Float.toString(xPos) + ',' + yPos + ',' + w + ',' + h);

                final RectF rect =
                        new RectF(
                                Math.max(0, xPos - w / 2),
                                Math.max(0, yPos - h / 2),
                                Math.min(bitmap.getWidth() - 1, xPos + w / 2),
                                Math.min(bitmap.getHeight() - 1, yPos + h / 2));
                detections.add(new Recognition("" + offset, labels.get(detectedClass),
                        confidenceInClass, rect, detectedClass));
                Log.d("other classification", "other classification");

            }
        }

        Log.d("YoloV5Classifier", "detect end");
        final ArrayList<Recognition> recognitions = nms(detections);
//        final ArrayList<Recognition> recognitions = detections;
        return recognitions;
    }

    public boolean checkInvalidateBox(float x, float y, float width, float height, float oriW, float oriH, int inputSize) {
        // (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)
        float halfHeight = height / 2.0f;
        float halfWidth = width / 2.0f;

        float[] pred_coor = new float[]{x - halfWidth, y - halfHeight, x + halfWidth, y + halfHeight};

        // (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)
        float resize_ratioW = 1.0f * inputSize / oriW;
        float resize_ratioH = 1.0f * inputSize / oriH;

        float resize_ratio = resize_ratioW > resize_ratioH ? resize_ratioH : resize_ratioW; //min

        float dw = (inputSize - resize_ratio * oriW) / 2;
        float dh = (inputSize - resize_ratio * oriH) / 2;

        pred_coor[0] = 1.0f * (pred_coor[0] - dw) / resize_ratio;
        pred_coor[2] = 1.0f * (pred_coor[2] - dw) / resize_ratio;

        pred_coor[1] = 1.0f * (pred_coor[1] - dh) / resize_ratio;
        pred_coor[3] = 1.0f * (pred_coor[3] - dh) / resize_ratio;

        // (3) clip some boxes those are out of range
        pred_coor[0] = pred_coor[0] > 0 ? pred_coor[0] : 0;
        pred_coor[1] = pred_coor[1] > 0 ? pred_coor[1] : 0;

        pred_coor[2] = pred_coor[2] < (oriW - 1) ? pred_coor[2] : (oriW - 1);
        pred_coor[3] = pred_coor[3] < (oriH - 1) ? pred_coor[3] : (oriH - 1);

        if ((pred_coor[0] > pred_coor[2]) || (pred_coor[1] > pred_coor[3])) {
            pred_coor[0] = 0;
            pred_coor[1] = 0;
            pred_coor[2] = 0;
            pred_coor[3] = 0;
        }

        // (4) discard some invalid boxes
        float temp1 = pred_coor[2] - pred_coor[0];
        float temp2 = pred_coor[3] - pred_coor[1];
        float temp = temp1 * temp2;
        if (temp < 0) {
            Log.e("checkInvalidateBox", "temp < 0");
            return false;
        }
        if (Math.sqrt(temp) > Float.MAX_VALUE) {
            Log.e("checkInvalidateBox", "temp max");
            return false;
        }

        return true;
    }

}
OdedHellman commented 5 months ago

@safyyy, the YOLOv8 model conversion to TFLite is done outside of Android, usually on a more powerful machine. Once you've successfully converted the model, you can include the resulting .tflite file in your Android project to carry out object detection. The conversion pipeline exports the YOLOv8 weights to ONNX and then converts through TensorFlow to produce the TFLite model; the Ultralytics export command (for example, yolo export model=your_model.pt format=tflite) handles this whole pipeline for you.

The steps to implement the TFLite model on Android would typically involve loading the TFLite model using the TFLite interpreter, preparing your image data as an input tensor, running inference on the model using the input tensor, and then extracting and interpreting the output tensor.
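As a minimal sketch of those steps in Kotlin (the file name, the 640x640 input size, the float non-quantized export and the [1, 9, 8400] output shape are assumptions taken from this thread, not an official Ultralytics example):

import android.content.Context
import org.tensorflow.lite.Interpreter
import java.io.FileInputStream
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Load a .tflite file bundled in the app's assets into a TFLite Interpreter.
fun loadInterpreter(context: Context, assetName: String = "yolov8.tflite"): Interpreter {
    val fd = context.assets.openFd(assetName)
    val channel = FileInputStream(fd.fileDescriptor).channel
    val model = channel.map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
    return Interpreter(model, Interpreter.Options().apply { setNumThreads(4) })
}

// Run inference. "input" is assumed to be the preprocessed image as a float32
// ByteBuffer in [1, 640, 640, 3] NHWC layout, normalized to 0..1; the output
// buffer matches the [1, 9, 8400] shape discussed here.
fun runInference(interpreter: Interpreter, input: ByteBuffer): Array<Array<FloatArray>> {
    val output = Array(1) { Array(9) { FloatArray(8400) } }
    interpreter.run(input, output)
    return output
}

Verify the exact input layout and normalization against interpreter.getInputTensor(0), since they depend on how the model was exported.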

For the output tensor with the shape [1, 9, 8400] from YOLOv8, the 8400 entries are candidate detections and the 9 values per candidate are the 4 bounding box values (center x, center y, width, height) followed by one score per class (5 classes in this case); unlike YOLOv5 there is no separate objectness score. Processing these outputs will require you to decode the bounding box predictions, apply a confidence threshold, and suppress extraneous detections with Non-Maximum Suppression (NMS) to finalize the detected objects.
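Here is a sketch of that decoding step (class count, threshold and names are illustrative; NMS, as discussed earlier in this thread, still has to be applied to whatever survives the threshold):

data class Candidate(
    val cx: Float, val cy: Float, val w: Float, val h: Float,
    val score: Float, val classIndex: Int
)

// Decode a float [1, 4 + numClasses, 8400] YOLOv8 output: for each of the 8400
// candidates, pick the best class score and keep the box if it passes the threshold.
fun decode(
    output: Array<Array<FloatArray>>, // e.g. the array returned by runInference() above
    numClasses: Int = 5,
    confThreshold: Float = 0.25f
): List<Candidate> {
    val preds = output[0]             // shape [4 + numClasses][8400]
    val results = mutableListOf<Candidate>()
    for (i in preds[0].indices) {
        var best = 0f
        var bestClass = -1
        for (c in 0 until numClasses) {
            val s = preds[4 + c][i]
            if (s > best) { best = s; bestClass = c }
        }
        if (best > confThreshold) {
            // cx, cy, w, h: in the TFLite exports discussed in this thread these appear
            // to be normalized (0..1) center coordinates and sizes; scale them to
            // pixels before drawing.
            results.add(Candidate(preds[0][i], preds[1][i], preds[2][i], preds[3][i], best, bestClass))
        }
    }
    return results
}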

Please keep in mind, while the concepts might be similar, directly following the steps from YOLOv5's Android implementation wouldn't work for YOLOv8 due to differences in the architectural design and output of the models.

Remember, the specifics for how you implement these steps can depend on how your Android app is structured and the specific requirements of your app. I hope this information is helpful and I wish you luck with your project!

hey @glenn-jocher, how are you? I think it would be very helpful (for all developers) if you shared an example Android app that handles the simple task of loading, running and post-processing a TFLite version of YOLOv8. Based on the Ultralytics HUB application, I believe this work has already been done and you have handled a lot of the issues we run into; just share the outcome with us. Could you share a simple app with us?

glenn-jocher commented 5 months ago

Hello @OdedHellman,

I'm doing well, thank you for asking. I appreciate your interest in a YOLOv8 example for Android.

However, as the authors and maintainers of the Ultralytics YOLOv8 repo, our focus is on providing state-of-the-art object detection models and tools for the community. While we may have internal applications and examples, our main repository and documentation are where we share general guidelines, tutorials, and updates relevant to the broader community.

Currently, we do not have an official simple example app specifically for YOLOv8 that we can share publicly. Developers often tailor their mobile applications based on the specific requirements of their projects and as such, we encourage the community to adapt the general principles outlined in our documentation to their use case.

We're pleased to hear that our work could help with your Android projects, and your suggestion is noted. It's feedback like yours that helps us understand the needs of the community and could inspire future content or public examples.

For direct assistance or specific feature requests like example Android apps, we typically recommend exploring our Enterprise services, where the Ultralytics team can engage in more directed support or development efforts.

The Ultralytics community is active and resourceful, and we often see community-driven initiatives. You might find it useful to engage with the community on platforms like GitHub or other forums to see if someone has shared a similar implementation which could serve as a reference for your work.

Thanks again for your support, and best of luck with your project!

ENNURSILA commented 5 months ago

https://medium.com/@gary.tsai.advantest/top-tutorials-for-deploying-custom-yolov8-on-android-%EF%B8%8F-dd6746afc1e6
This is the only source I could find, but its source code is not structured like the YOLOv5 Android app.

surendramaran commented 5 months ago

Check out this repository for running live object detection in android https://github.com/surendramaran/YOLOv8-TfLite-Object-Detector . Use any custom YOLOv8 object detection model.

glenn-jocher commented 5 months ago

@surendramaran thank you for sharing the repository link. While I can't endorse any external code, community resources like these can be incredibly valuable for developers looking to implement YOLOv8 in specific environments such as Android. Those interested in integrating a custom YOLOv8 model into an Android application may find such examples helpful as a reference or starting point. Remember to review the code and adapt it as necessary to fit your application's requirements and the specifics of your custom YOLOv8 model. Best of luck with your project! 🚀✨

LachhabMeriem commented 2 months ago

@glenn-jocher Thanks for your advice. After checking @surendramaran's explanation of his output, I think I know where the problem is: it is the index of the array where I get the labels for my detection. For multiple classes I need to change it a little. For example, in my case the labels are nc: 3 (number of classes), names: ['motorbike', 'person', 'bicycle']; the 'person' class is in the second row, so I need to change this code val cnf = array[c + NUM_ELEMENTS * 4] (which in your case is for a single class) to this:

val indices = (4..(3 + LABEL_SIZE)).map { c + NUM_ELEMENTS * it } // I have 3 classes, so LABEL_SIZE is 3
val cnf_array = indices.map { array[it] }.toFloatArray()
val cnf = cnf_array.maxOrNull() ?: continue

I'm not sure if this is the best way to do it, but it works fine in my case. Also, what does the variable it represent?

glenn-jocher commented 2 months ago

@LachhabMeriem hey there! 🌟 It's great to hear you've found a solution that works for your case. Your approach of dynamically adjusting the indices based on the number of classes and then taking the maximum confidence score is practical for handling multiple classes. The it in your lambda function represents the current element in the iteration, which in your context is each value from 4 to 3 + LABEL_SIZE. This is a common Kotlin idiom for iterating over a range or collection.

Your method of extracting confidence scores for each class and then selecting the highest one is a solid strategy, especially when dealing with multiple classes. If it's working well for you, it sounds like you're on the right track! Just ensure to keep an eye on performance implications if your number of classes grows. Happy coding! 🚀

Ashish2000L commented 2 months ago

@surendramaran Thanks for the demo implementation for Android. I have implemented your code in my current project, which is in Java; it works pretty well and predicts values with decent confidence. I noticed that the bounding box values from the model are floats below 1f. Could you please enlighten me: if I want to draw bounding boxes on the bitmap image using a Canvas, how can I convert those decimal values to actual pixel coordinates in the image?

Below are the bounding box values: cnf=0.6146, cx=0.51159793, cy=0.5604745, h=0.033230007, w=0.1646854, x1=0.42925525, x2=0.5939406, y1=0.5438595, y2=0.57708955
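Is the idea simply to scale those normalized values by the bitmap size before drawing, something like the sketch below? (The names are from my own code, and I'm assuming there is no letterbox padding to compensate for.)

import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF

// Scale normalized corner coordinates (0..1) to pixel coordinates of the bitmap
// being drawn on, then draw the rectangle. Assumes no letterbox padding.
fun drawBox(canvas: Canvas, bitmapWidth: Int, bitmapHeight: Int,
            x1: Float, y1: Float, x2: Float, y2: Float) {
    val boxPaint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.RED
    }
    val rect = RectF(
        x1 * bitmapWidth,
        y1 * bitmapHeight,
        x2 * bitmapWidth,
        y2 * bitmapHeight
    )
    canvas.drawRect(rect, boxPaint)
}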

sila533 commented 2 days ago

Hello,

Has anyone tried this with YOLOv10 on Android or cross-platform?