hollance / YOLO-CoreML-MPSNNGraph

Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
MIT License

Route and Reorg layers with MPSNNGraph #20

Closed. CorgiSpectre closed this issue 6 years ago.

CorgiSpectre commented 6 years ago

Hi, thanks for the great writeup and tutorial. I was trying to implement the full V2 version of YOLO with 30 layers. With the previous approach, using MPSTemporaryImages, we would use the source offset and destinationFeatureChannelOffset properties for the reorg and route (concatenation) layers. But in MPSNNGraph, where each layer is an MPSCNNConvolutionNode or MPSCNNPoolingMaxNode, there is no way to specify these parameters on the nodes. So would the solution be to add MPSTemporaryImages in between these layers whenever needed, to create the same effect as the route and reorg layers?

hollance commented 6 years ago

With the graph API you can use MPSNNConcatenationNode to perform the concatenation. You don't have to set the destinationFeatureChannelOffset yourself.
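To illustrate the bookkeeping that a concatenation step takes off your hands, here is a minimal sketch in plain Python (not MPS code). The helper name `concat_channel_offsets` and the channel counts are hypothetical; it only shows how per-input channel offsets would otherwise have to be tracked by hand.

```python
def concat_channel_offsets(channel_counts):
    """Return (offsets, total) for inputs concatenated along the channel axis.

    offsets[i] is the first output channel occupied by input i.
    """
    offsets = []
    total = 0
    for count in channel_counts:
        offsets.append(total)  # this input starts where the previous one ended
        total += count
    return offsets, total


# Illustrative numbers only: a 256-channel and a 1024-channel feature map.
offsets, total = concat_channel_offsets([256, 1024])
print(offsets, total)
```

With manual kernels these offsets are what you would write into each kernel's destination channel offset; a concatenation node derives them from the order of its inputs.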

CorgiSpectre commented 6 years ago

Thanks for the quick reply. Ah yes, MPSNNConcatenationNode can be used for the route layer, but how can the reorg layer be implemented? Is it possible to introduce reorg layers by using MPSTemporaryImages in between the nodes of the graph API, like I described? For the reorg layer with stride 2, we essentially have to divide the activations from the previous conv layer into 4 parts and stack them end to end. I don't know how this can be accomplished with the graph API, or if it's even possible.

hollance commented 6 years ago

I'm not really sure what the reorg layer does. But if you cannot achieve this with graph nodes, then you cannot use MPSNNGraph and you'll have to use the MPSCNN kernels directly.

CorgiSpectre commented 6 years ago

Reorg basically crops and concatenates the activations, so if you have a 26x26x64 output from a conv layer, after reorg it will be flattened and rearranged as 13x13x256. After looking into this a bit more, it looks like MPSNNGraph won't work for this and I will have to use MPSCNN like you said. Let me know if you have any suggestions, thanks!
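The 26x26x64 to 13x13x256 rearrangement can be sketched in plain Python as a space-to-depth operation. This is an illustration under assumptions: the function name `reorg` is hypothetical, feature maps are nested lists indexed `[y][x][channel]`, and the channel ordering shown is one straightforward choice that may not match Darknet's exact ordering.

```python
def reorg(feature_map, stride=2):
    """Move each stride x stride spatial block into the channel axis."""
    height = len(feature_map)
    width = len(feature_map[0])
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by stride")

    out = []
    for y in range(height // stride):
        row = []
        for x in range(width // stride):
            stacked = []
            # Append the channels of every pixel in the block, row by row.
            for dy in range(stride):
                for dx in range(stride):
                    stacked.extend(feature_map[y * stride + dy][x * stride + dx])
            row.append(stacked)
        out.append(row)
    return out


def shape(feature_map):
    return (len(feature_map), len(feature_map[0]), len(feature_map[0][0]))


# Each activation records where it came from, so the rearrangement is visible.
activations = [[[(y, x, c) for c in range(64)] for x in range(26)]
               for y in range(26)]
reorganized = reorg(activations)
print(shape(activations), "->", shape(reorganized))
```

No values are dropped or combined: the spatial resolution halves in each direction and the channel count grows by a factor of 4.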

CorgiSpectre commented 6 years ago

In case anyone is stuck in the same position I was in: I solved it and implemented the reorg layer by setting a padding policy, just like for the pool6 layer, but on the MPSCNNConvolutionNodes. Through the padding policy you get access to the underlying MPSCNNConvolution object, and you can set its offset and clipRect properties. You can then use MPSNNConcatenationNode to concatenate the outputs.
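The comment above does not spell out the exact offsets used, so the following is only one possible reading, sketched in plain Python rather than MPS code: four stride-2 reads of the same 26x26x64 feature map, each starting at a different (y, x) offset, followed by a channel-wise concatenation. The helper names `strided_read` and `concat_channels` are hypothetical.

```python
def strided_read(feature_map, offset_y, offset_x, stride=2):
    """Take every stride-th pixel, starting at (offset_y, offset_x)."""
    return [row[offset_x::stride] for row in feature_map[offset_y::stride]]


def concat_channels(maps):
    """Concatenate equally sized feature maps along the channel axis."""
    height, width = len(maps[0]), len(maps[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for m in maps:
                pixel.extend(m[y][x])
            row.append(pixel)
        out.append(row)
    return out


# Each activation records its source position (y, x, channel).
activations = [[[(y, x, c) for c in range(64)] for x in range(26)]
               for y in range(26)]

# Four reads with offsets (0,0), (0,1), (1,0), (1,1), each 13x13x64.
parts = [strided_read(activations, dy, dx) for dy in (0, 1) for dx in (0, 1)]
stacked = concat_channels(parts)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))
```

In this reading, each read corresponds to one convolution node whose underlying kernel has a different offset, and the concatenation corresponds to the MPSNNConcatenationNode.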