pygfx / pyshader

Write modern GPU shaders in Python!
BSD 2-Clause "Simplified" License
72 stars 1 forks source link

Document math / built-in functions #61

Open axsaucedo opened 3 years ago

axsaucedo commented 3 years ago

Edit: Given this has been fixed, and works correctly with the stdlib functions, the issue can be focused mainly on documenting these functions as they are currently not documented.

I am currently trying to implement the machine learning logistic regression algorithm below (as explained in this blog post). Most of the shader can be restructured to some of the limitations (such as removing the functions), but it seems that the current blocker is that the log(...) function is not supported. It would be quite useful if built-in shader functions could be used in the pyshader function.

The GLSL shader being implemented is below:

#version 450

layout (constant_id = 0) const uint M = 0;

layout (local_size_x = 1) in;

layout(set = 0, binding = 0) buffer bxi { float xi[]; };
layout(set = 0, binding = 1) buffer bxj { float xj[]; };
layout(set = 0, binding = 2) buffer by { float y[]; };
layout(set = 0, binding = 3) buffer bwin { float win[]; };
layout(set = 0, binding = 4) buffer bwouti { float wouti[]; };
layout(set = 0, binding = 5) buffer bwoutj { float woutj[]; };
layout(set = 0, binding = 6) buffer bbin { float bin[]; };
layout(set = 0, binding = 7) buffer bbout { float bout[]; };
layout(set = 0, binding = 8) buffer blout { float lout[]; };

float m = float(M);

float sigmoid(float z) {
    return 1.0 / (1.0 + exp(-z));
}

float inference(vec2 x, vec2 w, float b) {
    // Compute the linear mapping function
    float z = dot(w, x) + b;
    // Calculate the y-hat with sigmoid
    float yHat = sigmoid(z);
    return yHat;
}

float calculateLoss(float yHat, float y) {
    return -(y * log(yHat)  +  (1.0 - y) * log(1.0 - yHat));
}

void main() {
    uint idx = gl_GlobalInvocationID.x;

    vec2 wCurr = vec2(win[0], win[1]);
    float bCurr = bin[0];

    vec2 xCurr = vec2(xi[idx], xj[idx]);
    float yCurr = y[idx];

    float yHat = inference(xCurr, wCurr, bCurr);

    float dZ = yHat - yCurr;
    vec2 dW = (1. / m) * xCurr * dZ;
    float dB = (1. / m) * dZ;
    wouti[idx] = dW.x;
    woutj[idx] = dW.y;
    bout[idx] = dB;

    lout[idx] = calculateLoss(yHat, yCurr);
}
axsaucedo commented 3 years ago

It seems there is a stdlib folder with some core functions, which have allowed me to get the spirv to compile. I still seem to have strange behaviour on the model as the outputs don't align with the C++ implementation, and given it's deterministic, the outputs should be the same.

However it does look quite positive, but would be good to get further clarity if the function extensions are in usable state, and whether there are best practices. Also whether there are any ideas why the behaviour is not the same as the shader provided above.

from kp import Tensor, Manager, Sequence, log_level

from pyshader import python2shader, f32, ivec3, Array, vec2
from pyshader.stdlib import exp, log

@python2shader
def compute_shader(
        index   = ("input", "GlobalInvocationId", ivec3),
        x_i     = ("buffer", 0, Array(f32)),
        x_j     = ("buffer", 1, Array(f32)),
        y       = ("buffer", 2, Array(f32)),
        w_in    = ("buffer", 3, Array(f32)),
        w_out_i = ("buffer", 4, Array(f32)),
        w_out_j = ("buffer", 5, Array(f32)),
        b_in    = ("buffer", 6, Array(f32)),
        b_out   = ("buffer", 7, Array(f32)),
        l_out   = ("buffer", 8, Array(f32)),
        M       = ("buffer", 9, Array(f32))):

    i = index.x

    m = M[0]

    w_curr = vec2(w_in[0], w_in[1])
    b_curr = b_in[0]

    x_curr = vec2(x_i[i], x_j[i])
    y_curr = y[i]

    z_dot = (w_curr.x * x_curr.x) + (w_curr.y + x_curr.y)
    z = z_dot + b_curr
    y_hat = 1.0 / (1.0 + exp(-z))

    d_z = y_hat - y_curr
    d_w = (1.0 / m) * x_curr * d_z
    d_b = (1.0 / m) * d_z

    loss = -(y_curr * log(y_hat) + ((1.0 + y_curr) * log(1.0 - y_hat)))

    w_out_i[i] = d_w.x
    w_out_j[i] = d_w.y
    b_out[i] = d_b
    l_out[i] = loss

# First we create input and ouput tensors for shader
tensor_x_i = Tensor([0.0, 1.0, 1.0, 1.0, 1.0])
tensor_x_j = Tensor([0.0, 0.0, 0.0, 1.0, 1.0])

tensor_y = Tensor([0.0, 0.0, 0.0, 1.0, 1.0])

tensor_w_in = Tensor([0.001, 0.001])
tensor_w_out_i = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])
tensor_w_out_j = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])

tensor_b_in = Tensor([0.0])
tensor_b_out = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])

tensor_l_out = Tensor([0.0, 0.0, 0.0, 0.0, 0.0])

tensor_m = Tensor([ 5.0 ])

# We store them in an array for easier interaction
params = [tensor_x_i, tensor_x_j, tensor_y, tensor_w_in, tensor_w_out_i,
    tensor_w_out_j, tensor_b_in, tensor_b_out, tensor_l_out, tensor_m]

mgr = Manager()

# Initalise GPU buffer and memory
mgr.eval_tensor_create_def(params)

ITERATIONS = 100
learning_rate = 0.1

# Perform machine learning training and inference across all input X and Y
for i_iter in range(ITERATIONS):
    mgr.eval_tensor_sync_device_def([tensor_w_in, tensor_b_in])
    mgr.eval_algo_data_def(params, compute_shader.to_spirv())
    mgr.eval_tensor_sync_local_def([tensor_w_out_i, tensor_w_out_j, tensor_b_out, tensor_l_out])

    # Calculate the parameters based on the respective derivatives calculated
    w_in_i_val = tensor_w_in.data()[0]
    w_in_j_val = tensor_w_in.data()[1]
    b_in_val = tensor_b_in.data()[0]

    for j_iter in range(tensor_b_out.size()):
        w_in_i_val -= learning_rate * tensor_w_out_i.data()[j_iter]
        w_in_j_val -= learning_rate * tensor_w_out_j.data()[j_iter]
        b_in_val -= learning_rate * tensor_b_out.data()[j_iter]

    # Update the parameters to process inference again
    tensor_w_in.set_data([w_in_i_val, w_in_j_val])
    tensor_b_in.set_data([b_in_val])

# Print learned parameters of logistic regression algorithm
print(tensor_w_in.data())
print(tensor_b_in.data())
print(tensor_l_out.data())

In this implementation all functions were removed, and were added in-line. The specialisation / constant was changed into a simple input parameter.

axsaucedo commented 3 years ago

For completeness, I can confirm that if I run the same glsl shader outlined above with the Python script, I get the same results as with the C++.

To run the script above the only changes required on the GLSL script is to add the extra binding to account for the variable M that otherwise is provided as specialization constant - ie:

//...
layout(set = 0, binding = 9) buffer bMb { float Mb[]; };

float m = Mb[0];
//...

The spirv contents of this shader result in the same output as per the C++, so it seems there is either something wrong with the way I implemented the shader, or perhaps the instructions for exp, log or other components may not be correct.

axsaucedo commented 3 years ago

Ok I have found the issue, it seems that there is a function provided to calculate the dot product, which is mainly using the @ operator. The manual implementation of the dot product in z_dot = (w_curr.x * x_curr.x) + (w_curr.y + x_curr.y) was incorrect, and re-implementing it as z_dot = w_curr @ x_curr fixed the issue.

Now both the C++ and Python implementation with PyShader have the same output for the Machine Learning Logistic Regression implementation.

axsaucedo commented 3 years ago

Changed the focus of this issue towards documenting the math / standard lib functions, as it seems from this that it worked as required.

almarklein commented 3 years ago

Nice to see your progressions like this ;)

Good point about the documentation. I think we finally need some proper docs on rtd.