hollance / Forge

A neural network toolkit for Metal
MIT License

Regarding offset for picking values for bounding box values #43

Closed Rathna21 closed 5 years ago

Rathna21 commented 5 years ago

Hello,

I am trying to run YOLO in Windows ML. I first converted the Darknet Tiny YOLO v2 model to Keras using the yad2k script, then used the keras2onnx converter to go from Keras to ONNX.

The model converted successfully to ONNX, with an output shape in NHWC order (13 x 13 x 125). Now I need to generate the bounding boxes. I tried to follow your code for the offset function, but I get an "Array Index: Out of bound exception". I think this is because you have 128 channels in Swift, while in Windows ML there are only 125.

So, how can I handle this?

Could you please help me with this?

hollance commented 5 years ago

Sorry but I don’t know anything about Windows ML.

Rathna21 commented 5 years ago

@hollance Thanks, but could you please explain your offset function?

Why do you have 128 channels? If Swift had 125 channels instead of 128, what would your offset function look like?

I am new to working with arrays, and I find this quite difficult to understand. The question is not about Windows ML, just about the indexing in general. I would be grateful if you could help with this.

hollance commented 5 years ago

You are talking about this function from YOLO.swift, am I correct?

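    func offset(_ channel: Int, _ x: Int, _ y: Int) -> Int {
      // Channels are grouped four at a time into slices.
      let slice = channel / 4
      // Position of this channel within its slice (0...3).
      let indexInSlice = channel - slice*4
      // Slice first, then row, then column, then channel within the slice.
      let offset = slice*gridHeight*gridWidth*4 + y*gridWidth*4 + x*4 + indexInSlice
      return offset
    }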

This is very specific to how Metal organizes things in memory. We have 128 channels because the number of channels (125) is rounded up to the next multiple of 4. That's just how Metal works.
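To make the rounding concrete, here is a minimal sketch of the arithmetic. It assumes, as the quoted function implies, that channels are stored in groups of four and that the grid is 13 x 13 with 125 real channels. The names `paddedChannels`, `metalOffset`, `paddedCount`, and `unpaddedCount` are made up for illustration and are not part of Forge.

```swift
// Illustrative sketch only; these names are assumptions, not Forge API.
let gridHeight = 13
let gridWidth = 13
let channels = 125

// Round the channel count up to the next multiple of 4.
let paddedChannels = (channels + 3) / 4 * 4   // 128
let sliceCount = paddedChannels / 4           // 32 groups of 4 channels

// Same arithmetic as the offset function quoted above.
func metalOffset(_ channel: Int, _ x: Int, _ y: Int) -> Int {
  let slice = channel / 4
  let indexInSlice = channel - slice*4
  return slice*gridHeight*gridWidth*4 + y*gridWidth*4 + x*4 + indexInSlice
}

// Number of elements in a padded buffer versus an unpadded one.
let paddedCount = paddedChannels * gridHeight * gridWidth   // 21632
let unpaddedCount = channels * gridHeight * gridWidth       // 21125

// Largest offset produced for a real channel (124) at the last grid cell.
let largest = metalOffset(channels - 1, gridWidth - 1, gridHeight - 1)   // 21628
```

Since 21628 is past the end of a 21125-element array, using this offset function on an unpadded 13 x 13 x 125 buffer would likely produce exactly the kind of out-of-bounds error described above.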

If your result tensor is organized in memory as [channels][height][width] for example, you could do the following:

    // height and width are the dimensions of your output grid.
    let channelStride = height * width
    let heightStride = width
    let widthStride = 1

    func offset(_ channel: Int, _ x: Int, _ y: Int) -> Int {
      return channel*channelStride + y*heightStride + x*widthStride
    }

I hope that makes sense.
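As a quick sanity check of the stride idea, the following sketch fills a tiny flat buffer in [channels][height][width] order and reads one element back through the offset function. The 3 x 2 x 4 dimensions and all of the `chw...` names are hypothetical, chosen only to keep the numbers small.

```swift
// Hypothetical small tensor: 3 channels, 2 rows, 4 columns.
let chwChannels = 3
let chwHeight = 2
let chwWidth = 4

// Strides for [channels][height][width]: width varies fastest.
let chwChannelStride = chwHeight * chwWidth   // 8
let chwHeightStride = chwWidth                // 4
let chwWidthStride = 1

func chwOffset(_ channel: Int, _ x: Int, _ y: Int) -> Int {
  return channel*chwChannelStride + y*chwHeightStride + x*chwWidthStride
}

// Fill the buffer in memory order; each value encodes its own
// coordinates as c*100 + y*10 + x so lookups are easy to verify.
var chwBuffer = [Int]()
for c in 0..<chwChannels {
  for y in 0..<chwHeight {
    for x in 0..<chwWidth {
      chwBuffer.append(c*100 + y*10 + x)
    }
  }
}

// Channel 2, column 3, row 1 sits at offset 2*8 + 1*4 + 3 = 23.
let chwValue = chwBuffer[chwOffset(2, 3, 1)]   // 213
```

Encoding the coordinates into the stored values is just a debugging trick: if the strides are wrong, the value read back will not match the coordinates requested.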

Rathna21 commented 5 years ago

@hollance Thanks for the reply.

My memory is organized as [height][width][channels], so it is the reverse of the example you sent. What would my offset function look like in that case?

Could you give an example for this?

hollance commented 5 years ago

Change the stride variables appropriately.
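For a [height][width][channels] layout, one way this could look is sketched below. It assumes a densely packed 13 x 13 x 125 buffer with no padding, and the `hwc...` names are invented for this example. The channel index now varies fastest, so its stride is 1, and each step in x skips over one full set of channels.

```swift
// Assumed dimensions from the question: 13 x 13 grid, 125 channels.
let hwcHeight = 13
let hwcWidth = 13
let hwcChannels = 125

// Strides for [height][width][channels]: channel varies fastest.
let hwcChannelStride = 1
let hwcWidthStride = hwcChannels                // 125
let hwcHeightStride = hwcWidth * hwcChannels    // 1625

func hwcOffset(_ channel: Int, _ x: Int, _ y: Int) -> Int {
  return y*hwcHeightStride + x*hwcWidthStride + channel*hwcChannelStride
}

// The last channel of the last grid cell maps to the last valid index.
let hwcCount = hwcHeight * hwcWidth * hwcChannels                         // 21125
let hwcLast = hwcOffset(hwcChannels - 1, hwcWidth - 1, hwcHeight - 1)     // 21124
```

The largest offset is exactly one less than the element count, so every lookup stays inside the array.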

Rathna21 commented 5 years ago

@hollance Yes it worked. Thank you so much.