pytorch / android-demo-app

PyTorch android examples of usage in applications

Questions about running the ImageClassification example #43

Open linfeng886 opened 4 years ago

linfeng886 commented 4 years ago
When I run the fully quantized MobileNet model, the CPU platform is an MTK8163, and I see a very strange phenomenon. When I limit execution to 2 CPU cores and run image classification, getting the result takes about 1.3 s; but if I don't limit the number of cores (all 4 cores), running the same example takes less than 0.1 s. Has the PyTorch Mobile MobileNet been specially optimized for 4 cores?
We are confused because we currently plan to migrate our PSE network from PyTorch to PyTorch Mobile. Because of power-consumption constraints, we can only enable two cores at the moment. I hope to get your answer.
IvanKobzarev commented 4 years ago

Hello @linfeng886, we also noticed problems with our multithreading mode on some devices. At the moment the thread count is fixed per device to the number of 'big' cores. On some devices (I tested on a Nexus 5) I saw a lot of time spent in thread contention, and single-thread mode was more efficient for those devices.

I don't think our MobileNet was deliberately optimized for a specific number of threads/cores (I will try to loop in the people who worked on it to provide more details).

Did I get it right that in your case limiting to 2 cores is more optimal, because it gives you wins in battery consumption without big losses in inference time? Is this difference in power consumption specific to running PyTorch Mobile, or just a general difference between 2 cores and 4 cores?

PS: We plan to improve our threading model to choose the number of threads more efficiently. What we are thinking of adding soon is exposing control over the thread count at the Java level, to be able to set, for example, a "singleThread" mode.

linfeng886 commented 4 years ago

Dear Ivan Kobzarev,

Thank you for your quick reply.

> Did I get it right that in your case limiting to 2 cores is more optimal, because it gives you wins in battery consumption without big losses in inference time? Is this difference in power consumption specific to running PyTorch Mobile, or just a general difference between 2 cores and 4 cores?

On the MTK8163 platform the CPU has four A53 cores. When we ran our inference algorithm on the TFLite framework, we found that two cores take only about 30% more time than four cores, while the power consumption is less than half that of four cores. Our device is an offline AI wearable, so we need to strictly control power consumption; in the end we chose dual-core.

Can you or your colleagues tell me more about the multithreading model of this framework? Also, when will the new version be released? We would like to do some research on the new framework.


IvanKobzarev commented 4 years ago

At the moment the default number of threads is determined per device and is approximately the number of big cores on the device. You can find more details on the per-device thread count in the code of caffe2::ThreadPool::defaultThreadPool(): https://github.com/pytorch/pytorch/blob/master/caffe2/utils/threadpool/ThreadPool.cc#L24

We just exposed control over the global number of threads used by PyTorch Android:

org.pytorch.PyTorchAndroid#setNumThreads(int numThreads)

https://github.com/pytorch/pytorch/blob/master/android/pytorch_android/src/main/java/org/pytorch/PyTorchAndroid.java#L33
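A minimal usage sketch of that global setting. The model path, input shape, and class name below are placeholders, this assumes a recent nightly build of pytorch_android, and it only runs on-device with that library available:

```java
import org.pytorch.IValue;
import org.pytorch.Module;
import org.pytorch.PyTorchAndroid;
import org.pytorch.Tensor;

// Sketch only: requires the pytorch_android nightly and a model file on the device.
public class SingleThreadInference {
    public static void main(String[] args) {
        // Set the global thread count before loading or running any module.
        PyTorchAndroid.setNumThreads(1);

        // Placeholder path and a MobileNet-style 1x3x224x224 input.
        Module module = Module.load("/data/local/tmp/mobilenet_quantized.pt");
        Tensor input = Tensor.fromBlob(new float[1 * 3 * 224 * 224],
                new long[]{1, 3, 224, 224});
        float[] scores = module.forward(IValue.from(input)).toTensor()
                .getDataAsFloatArray();
        System.out.println("Top score: " + scores[0]);
    }
}
```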

linfeng886 commented 4 years ago

Hi Ivan, at present we have implemented the corresponding PSE network on PyTorch Mobile. At the same time we have encountered some issues; please see the table below.

Our platform is the MTK MT8163, which has 4 A53 CPU cores, and we have run some tests on it. In the 2-core and 3-core cases we applied quantization and the effect is obvious: the quantized model is clearly faster than the float32 model. But when we run on 4 cores, we find that the quantized model is slower than the non-quantized one. Could this be related to PyTorch Mobile's current multithreading framework?

Thank you very much for your reply!
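To compare latency across thread counts, a small timing harness like the following can help. This is a generic sketch: on-device the workload would be `module.forward(IValue.from(input))` after calling `setNumThreads(n)`, but here it is replaced by a stand-in busy loop so the harness is self-contained:

```java
public class ThreadBenchmark {
    // Times an arbitrary workload: warmup runs first, then the average over iters.
    static long avgMillis(Runnable work, int warmup, int iters) {
        for (int i = 0; i < warmup; i++) work.run();
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) work.run();
        return (System.nanoTime() - start) / iters / 1_000_000;
    }

    public static void main(String[] args) {
        // On device this would be module.forward(...) with setNumThreads(n) applied;
        // here a busy loop stands in for the inference call.
        Runnable work = () -> {
            double acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
            if (acc < 0) System.out.println(acc); // prevent dead-code elimination
        };
        System.out.println("avg ms: " + avgMillis(work, 3, 10));
    }
}
```

Running this once per thread-count setting (1, 2, 3, 4) gives comparable averages, which makes it easier to see where the quantized model stops scaling.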



> We just exposed control over the global number of threads used by PyTorch Android; it landed in master as org.pytorch.Module#setNumThreads(int numThreads) (https://github.com/pytorch/pytorch/blob/master/android/pytorch_android/src/main/java/org/pytorch/Module.java#L57). The latest Android nightlies already include it: https://github.com/pytorch/pytorch/tree/master/android#nightly
>
> ```java
> Module module = Module.load(moduleFileAbsoluteFilePath);
> module.setNumThreads(1);
> ```