mdenna-synaptics / codalab2022


About Final Inference Time #12


touristourist commented 2 years ago

Hi @mdenna-synaptics, what I want to determine is: is the time in the JSON file equal to the final inference time for the same two models? Thanks!

mdenna-synaptics commented 2 years ago

The time in the JSON file is measured by compiling model.tflite for our HW and running inference there. The same model will be used to compute accuracy. Which are the two models you mention in your question?

touristourist commented 2 years ago

> The time in the JSON file is measured by compiling model.tflite for our HW and running inference there. The same model will be used to compute accuracy. Which are the two models you mention in your question?

Sorry, maybe I didn't express it clearly. I mean: is the model.tflite's latency that we get from the JSON file equal to our model's final inference time in the testing phase? Just want to make sure :)

mdenna-synaptics commented 2 years ago

Yes yes!

deepernewbie commented 2 years ago

Hello, if this is so, last year's method ABPN achieves 43-44 ms on the current device, though it was reported as 37 ms in last year's report. Were there any hardware changes compared to last year? If not, why is there a difference?

Regards

CCjiahao commented 2 years ago

I hope there will be no changes, because all our optimization was done based on this feedback.

deepernewbie commented 2 years ago

Then the results are not comparable. I hope they will make a clear statement on this issue.

mdenna-synaptics commented 2 years ago

Hi, your remark is correct. The difference from last year's timing is not due to different HW or to a regression in the driver, but to the fact that we compiled the networks in a different way. For last year's competition we compiled the networks in NCHW layout. This layout is more "HW-friendly", but it means that the conversion of the input image from NHWC (that is RGB, RGB, RGB...) to NCHW (that is RRRRRR...GGGGG...BBBBB...) and the conversion of the output image from NCHW back to NHWC have to be done in software. By compiling the networks in NHWC we basically move these conversions inside the network inference, hence the increase in inference time that you observe. Since this additional overhead is present for all participants, it should not influence the outcome of the competition.
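The layout difference described above can be illustrated with a small NumPy sketch (a toy example, not the actual compiler pipeline): transposing an NHWC tensor to NCHW regroups the interleaved RGB values into contiguous per-channel planes.

```python
import numpy as np

# Toy RGB image in NHWC layout (batch, height, width, channels):
# pixel values are stored interleaved as RGB, RGB, RGB, ...
nhwc = np.arange(1 * 2 * 2 * 3, dtype=np.uint8).reshape(1, 2, 2, 3)

# Converting to NCHW (batch, channels, height, width) groups each
# channel plane together in memory: RRRR... GGGG... BBBB...
nchw = np.ascontiguousarray(nhwc.transpose(0, 3, 1, 2))

print(nhwc.ravel())  # [ 0  1  2  3  4  5  6  7  8  9 10 11] (interleaved)
print(nchw.ravel())  # [ 0  3  6  9  1  4  7 10  2  5  8 11] (channel planes)
```

Either the host software or the compiled network itself has to perform this transpose; the point above is simply about which side of the timing measurement the cost lands on.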

deepernewbie commented 2 years ago

Thanks for the clarification. I wish this had been mentioned much earlier, so that we could have compiled our tflite files channel-first and gotten rid of all these tensor transposes and the increased time of the reorganization layer. Figuring this out is IMHO not science but engineering; I hope this engineering side of the challenge can be well adjusted among methods by the organizers, and that novelty is also rewarded. Thank you for your efforts.

Regards

mdenna-synaptics commented 2 years ago

@deepernewbie I see your point, but the additional tensor transposes don't depend on whether you used NHWC or NCHW in the .tflite model, but on how we compiled it. Doing the layout conversion in the model slows down inference, of course, but at the same time it removes the need to do the same conversion in SW when the model is used, providing an overall improvement. As said, this overhead is basically the same for all submissions, so it will not influence the outcome of the competition. In any case, to avoid doubts, I will recompile all the models in NCHW and provide the new top-10 inference times in the README. I hope this makes things clearer. Best regards

deepernewbie commented 2 years ago

Thank you for clearing this up. As I pointed out, my model takes 30 ms in NCHW but only 8 ms in NHWC. The improvement is real for most of the models, but the delay/gain affecting each model is not the same.

Anyway, everything is clearer now. Keep up the quality work, and thank you for your efforts.

Best

touristourist commented 2 years ago

@mdenna-synaptics So, which layout will be used for the final inference score?

deepernewbie commented 2 years ago

IMHO, to be fair to all, the challenge should be separated into two tracks, with NCHW and NHWC models evaluated separately. This is only the addition of a table; the best of the two times could also be used. But I have confidence in the organizers, since they have been fairly organizing many challenges, maybe for over 5 years. This is why I attend these challenges and put in my professional working hours.

Good luck

touristourist commented 2 years ago

IMHO, it's unreasonable to use any layout different from the one used in the development and testing phases, because all the optimization methods were designed based on these two phases. Optimizing in one layout while finally measuring in another sounds unfair and makes no sense.

Best regards

mdenna-synaptics commented 2 years ago

The final inference score will be based on the NHWC timings. All the models have been tested and optimized based on this metric.

In any case, we will also recompute the ranking using the NCHW timings and see if some models achieve a substantially better score. In that case we will revise our optimization algorithms and see how to acknowledge that.