hollance / Forge

A neural network toolkit for Metal
MIT License

Results of MPSCNNConvolution #44

Closed · xta0 closed this 4 years ago

xta0 commented 4 years ago

Hi, I have the following float array as an input buffer for an MPSImage:

var buffer4c: [Float] = [
// R    G     B   A    R    G     B     A
1.0, 0.0, 0.0, 1.0,  1.0, 0.0, 0.0, 1.0, 
// R    G     B   A    R    G     B     A
1.0, 0.0, 0.0, 1.0,  1.0, 0.0, 0.0, 1.0, 
]

From my understanding, this should represent a 2x2x3 tensor whose 4th channel is padded with 1.0. I then created an MPSImage object from that buffer via the extension initializer defined in MPSImage+Floats.swift:

inputImg = MPSImage(device: device,
                    numberOfImages: 1,
                    width: 2,
                    height: 2,
                    featureChannels: 3,
                    array: &buffer4c,
                    count: 2*2*4)

After that, I created a weight buffer whose dimensions are 1x3x2x2 (NCHW). I understand this needs to be converted to NHWC (the reordering itself is sketched after the snippet below). To keep things simple, I set all values in the buffer to 1.0:

let nums: [Float] = [1.0, 1.0, 1.0,  1.0, 1.0, 1.0,
                     1.0, 1.0, 1.0,  1.0, 1.0, 1.0]
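
With all weights equal to 1.0 the reordering is a no-op, but for completeness this is roughly the permutation I mean, from PyTorch's [out][in][kH][kW] layout to the [out][kH][kW][in] layout that MPSCNNConvolution's data source expects (the function name and loop structure are just illustrative):

// Reorder convolution weights from PyTorch's OIHW layout to the OHWI layout
// expected by MPSCNNConvolution (weight[oc][kh][kw][ic]).
func reorderOIHWtoOHWI(_ w: [Float], outChannels: Int, inChannels: Int,
                       kernelHeight: Int, kernelWidth: Int) -> [Float] {
    var out = [Float](repeating: 0, count: w.count)
    for o in 0..<outChannels {
        for i in 0..<inChannels {
            for y in 0..<kernelHeight {
                for x in 0..<kernelWidth {
                    let src = ((o * inChannels + i) * kernelHeight + y) * kernelWidth + x
                    let dst = ((o * kernelHeight + y) * kernelWidth + x) * inChannels + i
                    out[dst] = w[src]
                }
            }
        }
    }
    return out
}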

The last step is to set up the convolution; here is what I did:

class Conv2d : NeuralNetwork {
    typealias PredictionType = Float16

    var inputImg: MPSImage!
    var outputImg: MPSImage!
    var oid = MPSImageDescriptor(channelFormat: .float16, width: 1, height: 1, featureChannels: 1)
    var conv2d: MPSCNNConvolution

    init(device: MTLDevice, inflightBuffers: Int) {
        weightsLoader   = { name, count in ParameterLoaderBundle(name: name, count: count, suffix: "_W", ext: "txt") }
        outputImg       = MPSImage(device: device, imageDescriptor: oid)
        conv2d          = convolution(device: device, kernel: (2, 2), inChannels: 3, outChannels: 1, activation: nil, name: "conv", useBias: false)
    }

    func encode(commandBuffer: MTLCommandBuffer, texture: MTLTexture, inflightIndex: Int) {
        conv2d.encode(commandBuffer: commandBuffer, sourceImage: inputImg, destinationImage: outputImg)
    }
    func fetchResult(inflightIndex: Int) -> NeuralNetworkResult<Float16> {
        let probabilities = outputImg.toFloatArray()
        print(probabilities)
        return NeuralNetworkResult<Float16>()
    }
}

From my understanding, the result of the convolution should be 4.0 (I also verified this with PyTorch). However, the output was 1.0. I experimented a little bit, and it seems like only the first 4 elements of the image buffer get multiplied by the corresponding weights.
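
A quick CPU sanity check of that expected value (this little sketch is just for illustration and is not part of my app):

// CPU sanity check: a 2x2 "valid" convolution over a 2x2 input collapses to a
// single sum of input values times weights, which is 4.0 for all-ones weights.
let input: [Float]   = [1, 0, 0,  1, 0, 0,   // row 0: (R,G,B) per pixel
                        1, 0, 0,  1, 0, 0]   // row 1
let weights: [Float] = [Float](repeating: 1.0, count: 2 * 2 * 3)
let expected = zip(input, weights).map(*).reduce(0, +)
print(expected)   // 4.0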

Is there anything that I'm missing here?

hollance commented 4 years ago

Not sure what's going on here, it might be a padding issue. But note that Forge is not supported anymore, so unfortunately I can't help you figure this out.

xta0 commented 4 years ago

> Not sure what's going on here, it might be a padding issue. But note that Forge is not supported anymore, so unfortunately I can't help you figure this out.

You mean the padding of the input MPSImage? I understand Forge is not maintained anymore, but could you please shed some light on this issue? It has been bothering me for a while.

Correct me if I'm wrong: when creating an MPSImage object from a float buffer, the buffer needs to be serialized as a 1D array - [R,G,B,A, R,G,B,A, ..., R,G,B,A]. Is that correct?

In this case, my input is a 2x2 image with 4 channels:

C1 C1
C1 C1

C2 C2
C2 C2

C3 C3
C3 C3

C4 C4
C4 C4

Then I permute the tensor so that the 4 channels are interleaved per pixel:

C1C2C3C4 C1C2C3C4
C1C2C3C4 C1C2C3C4 

Then I serialize it into a 1D array:

C1C2C3C4 C1C2C3C4 C1C2C3C4 C1C2C3C4

This is the final float buffer I use to create the MPSImage. Does this make sense to you?
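
To make the layout concrete, the interleaving I'm describing would look roughly like this in code (the helper name and the 1.0 padding value are just this example's choices):

// Interleave planar CHW data into the per-pixel [C1,C2,C3,C4, C1,C2,C3,C4, ...]
// layout, padding the unused 4th channel with 1.0 as in my example above.
func interleaveCHW(_ chw: [Float], channels: Int, height: Int, width: Int,
                   padValue: Float = 1.0) -> [Float] {
    var out = [Float](repeating: padValue, count: height * width * 4)
    for c in 0..<channels {
        for y in 0..<height {
            for x in 0..<width {
                out[(y * width + x) * 4 + c] = chw[(c * height + y) * width + x]
            }
        }
    }
    return out
}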

Really appreciate your help!

hollance commented 4 years ago

I meant the padding rules used by the convolution kernel. If you rewrite your code using the regular MPSCNN APIs, I don't mind taking a look. But I'm not interested in digging through the Forge code to figure out how it worked again, since it's been a long time since I last looked at it.

xta0 commented 4 years ago

The following code uses the regular MPSCNN APIs to do a single conv2d operation. The input is a [1,3,2,2] tensor. I converted it to an MPSImage object where the alpha channel is padded with 1.0:

var buffer4c: [Float] = [
    1.0, 0.0, 0.0, 1.0, // R,G,B,A
    1.0, 0.0, 0.0, 1.0, // R,G,B,A
    1.0, 0.0, 0.0, 1.0, // R,G,B,A
    1.0, 0.0, 0.0, 1.0  // R,G,B,A
]
let inputImage: MPSImage! = MPSImage(device: device,
                                     numberOfImages: 1,
                                     width: 2,
                                     height: 2,
                                     featureChannels: 3,
                                     array: &buffer4c,
                                     count: 2*2*4)

The weight is a [1,3,2,2] tensor with all elements set to 1.0. The output is a [1,1,1,1] tensor whose value should be 4.0. However, the following code gives me 0.0 (how I read the value back is sketched after the code). Could you help me find out what I did wrong? Thank you so much.

class Conv2dDataSource: NSObject, MPSCNNConvolutionDataSource {
    var device: MTLDevice
    let name: String
    let kernelWidth: Int
    let kernelHeight: Int
    let inputFeatureChannels: Int
    let outputFeatureChannels: Int
    var pointer: UnsafeMutableRawPointer!

    init(_ device: MTLDevice, _ name: String, _ kernelWidth: Int, _ kernelHeight: Int,
         _ inputFeatureChannels: Int, _ outputFeatureChannels: Int) {
        self.device = device
        self.name = name
        self.kernelWidth = kernelWidth
        self.kernelHeight = kernelHeight
        self.inputFeatureChannels = inputFeatureChannels
        self.outputFeatureChannels = outputFeatureChannels
    }

    func dataType() -> MPSDataType {
        return .float16
    }

    func descriptor() -> MPSCNNConvolutionDescriptor {
        let convDesc = MPSCNNConvolutionDescriptor(kernelWidth: self.kernelWidth,
                                                   kernelHeight: self.kernelHeight,
                                                   inputFeatureChannels: self.inputFeatureChannels,
                                                   outputFeatureChannels: self.outputFeatureChannels)
        return convDesc
    }

    func weights() -> UnsafeMutableRawPointer {
        return pointer
    }

    func biasTerms() -> UnsafeMutablePointer<Float>? {
        return nil
    }

    func load() -> Bool {
        //        var fp32:[Float] = [
        //             1.0,1.0,1.0,1.0, 1.0,1.0,1.0,1.0, 1.0,1.0,1.0,1.0, 1.0,1.0,1.0,1.0,
        //             0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0,
        //             0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0,
        //             0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0, 0.0,0.0,0.0,0.0,
        //         ]
        let fp32:[Float] = [
            1.0,1.0,1.0,
            1.0,1.0,1.0,
            1.0,1.0,1.0,
            1.0,1.0,1.0
        ]
        let bufferSize = fp32.count * MemoryLayout<Float>.stride
        pointer = malloc(bufferSize)
        memcpy(pointer, fp32, bufferSize)
        //let fp32Ptr = pointer!.bindMemory(to: Float.self, capacity: fp32.count)
        return true
    }
    func purge() {
        pointer = nil
    }

    func label() -> String? {
        return name
    }
    func copy(with zone: NSZone? = nil) -> Any {
        fatalError("copy is not implemented")
    }
}

class Conv2d {
    var device: MTLDevice
    var outputImg: MPSImage!
    var oid = MPSImageDescriptor(channelFormat: .float16, width: 1, height: 1, featureChannels: 1)
    var conv2d: MPSCNNConvolution
    var dataSource: MPSCNNConvolutionDataSource

    init(device: MTLDevice) {
        self.device          = device
        self.outputImg       = MPSImage(device: device, imageDescriptor: oid)
        self.dataSource      = Conv2dDataSource(device, "conv2d", 2, 2, 3, 1)
        self.conv2d          = MPSCNNConvolution(device: device, weights: dataSource)
    }
    func run(input: MPSImage, commandQueue: MTLCommandQueue) -> MPSImage? {
        let outputImage = MPSImage(device: self.device, imageDescriptor: oid)
        guard let commandBuffer = commandQueue.makeCommandBuffer() else {
            return nil
        }
        conv2d.encode(commandBuffer: commandBuffer, sourceImage: input, destinationImage: outputImage)
        commandBuffer.commit()
        return outputImage
    }
}
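
For completeness, this is roughly how I read the single output value back on the CPU (assuming the 1-channel .float16 MPSImage is backed by an .r16Float texture and that Swift's Float16 type is available on the target; run(...) above only commits the command buffer, so something still has to call waitUntilCompleted() on it before the texture is read):

import MetalPerformanceShaders

// Read back the 1x1, single-channel float16 result once the GPU has finished.
func readSingleValue(from image: MPSImage) -> Float {
    var half: Float16 = 0
    image.texture.getBytes(&half,
                           bytesPerRow: MemoryLayout<Float16>.stride,
                           from: MTLRegionMake2D(0, 0, 1, 1),
                           mipmapLevel: 0)
    return Float(half)
}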

hollance commented 4 years ago

You set the data type to .float16, but the weights are 32-bit floats.
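
Either return .float32 from dataType(), or hand the convolution half-precision weights. A rough sketch of the latter inside load(), assuming Swift's Float16 type is available on your target:

// Convert the 32-bit weights to half precision so they match dataType().
let fp16 = fp32.map { Float16($0) }
let bufferSize = fp16.count * MemoryLayout<Float16>.stride
pointer = malloc(bufferSize)
memcpy(pointer, fp16, bufferSize)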

xta0 commented 4 years ago

I figured it out. I didn't set the convolution's offset (MPSOffset) properly. Thanks for your comments. I'll go ahead and close the issue.
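
For anyone who hits the same thing: the missing piece in my case was along these lines, with the values specific to this 2x2 kernel over a 2x2 input (offset = kernelWidth/2, kernelHeight/2 so the kernel window covers the whole image):

// Set before encoding: positions the kernel window so the single output pixel
// reads the entire 2x2 input instead of hanging off the top-left edge.
conv2d.offset = MPSOffset(x: 1, y: 1, z: 0)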