Is there any research on video?

Siq1982 commented 3 years ago

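I have only recently started working with neural networks, so there is a lot I do not fully understand. I have tried many segmentation models, and U2Net seems to give the best results. What I cannot determine is how fast it should actually run. I am using a MacBook Pro without CUDA support, and inference takes about 0.6 seconds per image. I then ported the model to CoreML. According to Apple it is GPU-optimized, but in my tests it still takes about 0.4 to 0.6 seconds per image, with no obvious change, so I do not know whether CoreML is actually using GPU acceleration. The model and the pth weights I exported were exported on the CPU, and I do not really understand how this works internally. As a result, I cannot tell whether the model is suitable for video processing, for example at around 30 fps. My question is: if the pth I exported is CPU-based, can the system that loads it (for example CoreML) automatically convert it into GPU code and data?

I have also read that most video segmentation methods feed the previous frame's mask as one of the inputs for the current frame in order to use temporal information. I am not sure whether this mask is meant to improve segmentation speed or segmentation accuracy. Finally, has there been any work on video segmentation with U2net itself?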
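
For comparing the timings above across backends, a minimal measurement sketch might look like the following. It assumes the model call can be wrapped in a plain Python function; `fake_infer`, `benchmark`, and `meets_budget` are hypothetical names used only for illustration, and `fake_infer` is a stand-in for a real U2Net or CoreML prediction call.

```python
import time
from statistics import mean


def benchmark(infer, frame, warmup=3, runs=10):
    """Return (mean seconds per frame, frames per second) for infer(frame)."""
    # Warm-up calls are discarded: the first calls often include one-off
    # costs such as model loading or memory allocation.
    for _ in range(warmup):
        infer(frame)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(frame)
        timings.append(time.perf_counter() - start)
    latency = mean(timings)
    return latency, 1.0 / latency


def meets_budget(latency_s, target_fps=30.0):
    """True if one frame fits in the per-frame budget of 1 / target_fps seconds."""
    return latency_s <= 1.0 / target_fps


def fake_infer(frame):
    # Hypothetical stand-in for a model call; replace with the real prediction.
    time.sleep(0.005)
    return frame


if __name__ == "__main__":
    latency, fps = benchmark(fake_infer, frame=None)
    print(f"{latency * 1000:.1f} ms per frame, {fps:.1f} fps")
    print("meets 30 fps budget:", meets_budget(latency))
```

At 30 fps the budget is about 33 ms per frame, so a measured 0.4 to 0.6 seconds per image is more than ten times over it.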

Siq1982 commented 3 years ago

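I tested with CUDA on Colab: on a Colab GPU it runs at about 17 fps. I then exported a CUDA-based pth on Colab and converted it to CoreML with the coreml conversion tool. It still takes about 0.5 seconds per frame, which looks no different from the result with the CPU-exported model.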

Siq1982 commented 3 years ago

Also, U2net does not use proposal boxes. Does that make it hard to use the previous mask to speed it up?

xuebinqin commented 3 years ago

You can crop the image based on the predicted bounding boxes and then feed the cropped images to U-2-Net. It should be faster than feeding the whole image into the network. As for the speed on CPU, I haven't tested it systematically. But it seems some other people are running it in real time: https://www.linkedin.com/posts/andreascuderi_machinelearning-iphone-swift-activity-6752303661705170944-aI-T. You can search on Google or get in touch with them.
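
A rough sketch of the cropping idea, under one assumption that is not stated in the reply: the bounding box is taken from the previous frame's mask, as asked earlier in the thread. The helper names (`mask_bbox`, `expand_bbox`, `crop`) are hypothetical, and images are plain nested lists so the example has no dependencies.

```python
def mask_bbox(mask, threshold=0.5):
    """Bounding box (top, left, bottom, right) of pixels above threshold.

    bottom and right are exclusive. Returns None if the mask is empty.
    """
    rows = [r for r, row in enumerate(mask) if any(v > threshold for v in row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0]))
            if any(row[c] > threshold for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def expand_bbox(bbox, margin, height, width):
    """Grow the box by margin pixels, clamped to the image, to allow for motion."""
    top, left, bottom, right = bbox
    return (max(0, top - margin), max(0, left - margin),
            min(height, bottom + margin), min(width, right + margin))


def crop(image, bbox):
    """Cut the region described by bbox out of a row-major nested-list image."""
    top, left, bottom, right = bbox
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    # Previous-frame mask: a 6x8 image with a 2x3 foreground blob.
    prev_mask = [[0] * 8 for _ in range(6)]
    for r in (2, 3):
        for c in (3, 4, 5):
            prev_mask[r][c] = 1
    frame = [[r * 8 + c for c in range(8)] for r in range(6)]

    bbox = expand_bbox(mask_bbox(prev_mask), margin=1, height=6, width=8)
    patch = crop(frame, bbox)
    print(bbox, len(patch), len(patch[0]))
```

The margin is there because the object may move between frames. If the previous mask is empty, `mask_bbox` returns `None` and the whole frame would have to be processed instead.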

-- Xuebin Qin PhD Department of Computing Science University of Alberta, Edmonton, AB, Canada Homepage:https://webdocs.cs.ualberta.ca/~xuebin/

xiexie123 commented 3 years ago

But I see that in u2net_human_seg_test.py the input image is rescaled to 320×320. So whether I feed the whole image or the cropped image, the size of the data fed into the network stays the same, and so does the amount of computation during inference. Am I right?

xuebinqin commented 3 years ago

Yes, that's true. But the resizing operations before and after inference also affect the total time cost, because resizing (e.g. bilinear) takes a lot of time for large original images.
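
A back-of-the-envelope illustration of this point, counting pixels only. The frame and crop sizes (1920×1080 and 640×480) are assumed for the example and do not come from the thread. Only the upsampling of the predicted mask back to the source size is counted, since the cost of the downscale step depends on the resize implementation.

```python
NET_SIZE = 320  # network input side length used by u2net_human_seg_test.py


def pixel_counts(height, width, net_size=NET_SIZE):
    """Pixels seen by the network vs. pixels written when the mask is upsampled."""
    return {
        "network_input": net_size * net_size,   # same for any source size
        "mask_upsample_output": height * width, # grows with the source size
    }


full = pixel_counts(1080, 1920)    # assumed full HD frame
cropped = pixel_counts(480, 640)   # assumed crop around the object

ratio = full["mask_upsample_output"] / cropped["mask_upsample_output"]
print(full, cropped, ratio)
```

The network input is 102,400 pixels in both cases, while the upsampled mask has 2,073,600 pixels for the full frame and 307,200 for the crop, a factor of 6.75. This is a pixel count, not a timing measurement.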

xuebinqin / U-2-Net

Is there any research on video? #163