tucan9389 / PoseEstimation-CoreML

The example project of inferencing Pose Estimation using Core ML
https://github.com/motlabs/awesome-ml-demos-with-ios
MIT License

Understanding performance test #41

Open yonatanbitton opened 3 years ago

yonatanbitton commented 3 years ago

Hello. Thanks for the amazing repository!

I built the project and it works great.

I am trying to understand the performance test. I would be glad if you could elaborate on what was measured.

Looking at Measure.swift:

    func 🎬🤚() {
        // Record the "end" timestamp for the current frame.
        🏷(for: index, with: "end")

        let beforeMeasurement = getBeforeMeasurment(for: index)  // previous frame's timestamps
        let currentMeasurement = measurements[index]
        if let startTime = currentMeasurement["start"],                // frame received
            let endInferenceTime = currentMeasurement["endInference"], // heatmap ready
            let endTime = currentMeasurement["end"],                   // post-processing done
            let beforeStartTime = beforeMeasurement["start"] {         // previous frame received
            delegate?.updateMeasure(inferenceTime: endInferenceTime - startTime,
                                    executionTime: endTime - startTime,
                                    fps: Int(1/(startTime - beforeStartTime)))
        }
    }

startTime is the moment the image is received as a (640x480) pixelBuffer. endInferenceTime is the moment inference finishes, yielding a (14, 96, 96) heatmap from the CPM model. beforeStartTime is the start time of the previous frame. Why do we measure it?

If so, I understand what inferenceTime and executionTime are. But I don't understand fps: Int(1/(startTime - beforeStartTime)) - why is beforeStartTime relevant?

To summarize, my questions are:

  1. Did I understand inferenceTime and executionTime correctly?
  2. Why is the fps calculated with beforeStartTime?
  3. Why do you have a moving average filter in the post-processing code? Is it a replacement for the gaussian_filter in the original CPM code?

Thank you very much.

tucan9389 commented 3 years ago

@yonatanbitton

startTime is the moment the image is received as a (640x480) pixelBuffer. endInferenceTime is the moment inference finishes, yielding a (14, 96, 96) heatmap from the CPM model. beforeStartTime is the start time of the previous frame. Why do we measure it?

Exactly right.

  • inferenceTime: endInferenceTime - startTime - from the (640x480) pixelBuffer to the (14, 96, 96) heatmap.

Actually, our inferenceTime includes not only the inference step but also the pre-processing step, because that is how Apple's Vision framework works; a sketch of where the timestamps fall is below.
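The following is a minimal sketch (assumed names, not the repo's actual code) of how the timestamps bracket a Vision request. Since Vision resizes the 640x480 pixel buffer to the model's input size inside perform, that pre-processing lands inside the measured inferenceTime:

    import CoreML
    import Vision
    import QuartzCore

    // Sketch only: `poseModel` stands in for the repo's Core ML model.
    func runInference(on pixelBuffer: CVPixelBuffer, with poseModel: MLModel) throws {
        let start = CACurrentMediaTime()                      // 🏷 "start"
        let request = VNCoreMLRequest(model: try VNCoreMLModel(for: poseModel)) { _, _ in
            let endInference = CACurrentMediaTime()           // 🏷 "endInference"
            print("inferenceTime:", endInference - start)     // includes Vision's resize
        }
        // Vision scales/crops the buffer to the model's input size in here,
        // so the pre-processing happens between the two timestamps above.
        try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
            .perform([request])
    }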

  • executionTime: endTime - startTime - from the (640x480) pixelBuffer through the post-processing (convertToPredictedPoints, and the moving average filter. BTW - what is this filter?)

The output of Core ML inference is heatmaps (not points), so we have to convert them into points. executionTime is inferenceTime plus this conversion step, so I think your explanation is right.
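For illustration, here is a rough sketch of that conversion (not the repo's actual convertToPredictedPoints; it assumes the heatmap arrives as a flat [Float] in channel-row-column order): take the argmax cell of each of the 14 channels and map it to a normalized point:

    import CoreGraphics

    // Sketch: argmax over each (96x96) channel of a (14, 96, 96) heatmap.
    func predictedPoints(from heatmap: [Float],
                         channels: Int = 14, height: Int = 96, width: Int = 96)
        -> [(point: CGPoint, confidence: Float)] {
        var results: [(point: CGPoint, confidence: Float)] = []
        for c in 0..<channels {
            var best: Float = -.greatestFiniteMagnitude
            var bestIndex = 0
            for i in 0..<(height * width) {
                let v = heatmap[c * height * width + i]
                if v > best { best = v; bestIndex = i }
            }
            // Normalize to [0, 1] so the point is resolution-independent.
            let x = CGFloat(bestIndex % width) / CGFloat(width)
            let y = CGFloat(bestIndex / width) / CGFloat(height)
            results.append((point: CGPoint(x: x, y: y), confidence: best))
        }
        return results
    }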

moving average filter. BTW - what is this filter?

This is a kind of smoothing method from signal processing. You can check the Wikipedia article (https://en.wikipedia.org/wiki/Moving_average) if you want more detail!
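For intuition, here is a minimal sketch of the idea applied to one keypoint (illustrative names, not necessarily the repo's implementation): each reported position is the mean of the last few raw positions, which damps frame-to-frame jitter:

    import CoreGraphics

    // Sketch: average each keypoint over the last `windowSize` frames.
    struct MovingAverageFilter {
        private var window: [CGPoint] = []
        let windowSize: Int

        init(windowSize: Int = 5) { self.windowSize = windowSize }

        mutating func add(_ point: CGPoint) -> CGPoint {
            window.append(point)
            if window.count > windowSize { window.removeFirst() }
            let n = CGFloat(window.count)
            let sumX = window.reduce(0) { $0 + $1.x }
            let sumY = window.reduce(0) { $0 + $1.y }
            return CGPoint(x: sumX / n, y: sumY / n)
        }
    }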

So..

  1. Did I understand inferenceTime and executionTime correctly?

Yes

  2. Why is the fps calculated with beforeStartTime?

To take the camera video's fps into account. I set the camera's fps under 60, so the real fps can't go over 60, and I wanted to check that real fps (the sketch below illustrates the distinction).
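In other words, the fps reported here measures how often frames actually arrive (the start-to-start gap), not how fast a single frame is processed. A minimal sketch of that distinction, with illustrative names:

    import QuartzCore

    // Sketch: real fps comes from the gap between consecutive frame starts.
    final class FPSCounter {
        private var previousStart: CFTimeInterval?

        /// Call once per frame, at the moment the pixel buffer arrives.
        func frameArrived() -> Double? {
            let now = CACurrentMediaTime()
            defer { previousStart = now }
            guard let before = previousStart else { return nil }
            return 1.0 / (now - before)   // capped by the camera's fps setting
        }
    }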

  3. Why do you have a moving average filter in the post-processing code? Is it a replacement for the gaussian_filter in the original CPM code?

Oh, you also looked at the original code! At that time, I had three thoughts:

  1. The Gaussian filter needs a lot of computation on a mobile device.
  2. I wanted to implement the Gaussian filter without OpenCV, using the Accelerate or Metal framework, but I was a bit busy (see the sketch after this list).
  3. There is a low-resolution issue in the model.
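To make the trade-off in point 1 concrete, here is a rough plain-Swift sketch of the separable Gaussian blur idea (the Accelerate/Metal version mentioned above would vectorize the inner loops); even in separable form it is two full passes over every heatmap cell per channel, which is why a cheap moving average was attractive:

    import Foundation

    // Sketch: smooth one (height x width) heatmap channel in place.
    func gaussianSmooth(_ channel: inout [Float], height: Int, width: Int, sigma: Float = 1.0) {
        // 1-D Gaussian kernel with radius ~3 sigma, normalized to sum to 1.
        let radius = max(1, Int(3 * sigma))
        var kernel = (-radius...radius).map { expf(-Float($0 * $0) / (2 * sigma * sigma)) }
        let sum = kernel.reduce(0, +)
        kernel = kernel.map { $0 / sum }

        // Horizontal pass, then vertical pass (separability keeps it O(n·k)).
        var temp = channel
        for y in 0..<height {
            for x in 0..<width {
                var acc: Float = 0
                for k in -radius...radius {
                    let xx = min(max(x + k, 0), width - 1)   // clamp at the borders
                    acc += channel[y * width + xx] * kernel[k + radius]
                }
                temp[y * width + x] = acc
            }
        }
        for y in 0..<height {
            for x in 0..<width {
                var acc: Float = 0
                for k in -radius...radius {
                    let yy = min(max(y + k, 0), height - 1)
                    acc += temp[yy * width + x] * kernel[k + radius]
                }
                channel[y * width + x] = acc
            }
        }
    }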

I hope this helps.

yonatanbitton commented 3 years ago

Thank you for the answer :-)

Regarding the FPS:

I’d like to understand the relation between ‘Total Time’ and ‘FPS’ described in the tables here. Taking cpm/11-pro as an example, the total runtime is 23 msec (~43 fps), but the FPS for that setting is 15. You mentioned that you limited the camera to 60 FPS, so why don’t you get an FPS of 43 = min(43, 60) in this case? Is it because of background tasks running on the device, or a different reason?