songjhaha / PaddleChainRules.jl


Improvement suggestion #6

Open songjhaha opened 2 years ago

songjhaha commented 2 years ago

Some problems in the package:

songjhaha commented 2 years ago
1. Why do we need to separate params and model? If we just need to get the grad, we could just write the vjp function like this:

```julia
struct PaddleModuleWrapper
    NN::PyObject
end

function vjp_wrt_params_and_args(nn::PyObject, pyargs...; kwargs...)
    res = nn(pyargs...; kwargs...)
    pyparams = nn.parameters()
    paramslen = length(pyparams)
    function vjp_func(Δ)
        grad = paddle.fluid.dygraph.grad([res], [pyparams..., pyargs...], Δ, retain_graph=true)
        return (Tuple(grad[1:paramslen]), grad[paramslen+1:end]...)
    end
    return res, vjp_func
end
```

This would give us the correct result. But things become difficult when we try to update the params with optimizers, and in some cases we need to separate params and model, like the examples in NeuralPDE:
```julia
paddlewrap = PaddleFCNet(2, 1, 3, 16; dtype="float64", activation="sigmoid")
initθ, _ = Optimisers.destructure(paddlewrap)
discretization = PhysicsInformedNN(paddlewrap, StochasticTraining(100; bcs_points = 40), init_params = initθ)
```

It should be fine if we use DLPack.jl, where the Python tensor and the Julia array share the underlying data, so any in-place change to the array also changes the tensor.

But it looks like NeuralPDE does not directly change the params inside the model; instead it updates a new flattened array generated during training.

So the solution for now is that, every time before calling model(x), we convert the params to Python tensors, which may add extra cost.
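
Roughly, the per-call conversion could look like the following sketch (a hypothetical helper, not the package's actual code; `shapes` stands for the parameter shapes recorded when the wrapper is built, and `paddle` is the module imported through PyCall):

```julia
# Hypothetical sketch of the per-call conversion: slice the flat vector from
# Optimisers.destructure back into the recorded parameter shapes and copy
# each slice into a fresh Paddle tensor before calling model(x).
using PyCall

function flat_to_pytensors(flatθ::AbstractVector, shapes)
    tensors = PyObject[]
    offset = 0
    for shp in shapes
        n = prod(shp)
        push!(tensors, paddle.to_tensor(reshape(flatθ[offset+1:offset+n], shp...)))
        offset += n
    end
    return tensors
end
```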

In the future, we could look more into Optimisers.jl and NeuralPDE.jl to create a new function for updating the params of the wrapper.
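
For context, here is a minimal sketch (not the package's actual rule) of how the vjp above could be hooked into ChainRulesCore if the params stayed inside the wrapper:

```julia
using ChainRulesCore

# Minimal sketch, not the package's actual rrule: reuse vjp_wrt_params_and_args
# from above for a ChainRulesCore rule on the callable wrapper.
function ChainRulesCore.rrule(wrap::PaddleModuleWrapper, pyargs...; kwargs...)
    res, vjp_func = vjp_wrt_params_and_args(wrap.NN, pyargs...; kwargs...)
    function wrapper_pullback(Δ)
        grads = vjp_func(Δ)          # (tuple of param grads, arg grads...)
        ∂wrap = Tangent{PaddleModuleWrapper}(; NN = grads[1])
        return (∂wrap, grads[2:end]...)
    end
    return res, wrapper_pullback
end
```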

songjhaha commented 2 years ago
2. If we separate params and model, we have to deal with general nets. In the fully connected network case, we could just define the forward function for linear layers and activations, like:
    
```julia
function (stateless_module::PaddleStatelessFCNet)(params::Vector, inputs; kwinputs...)
    out = PyNULL()
    copy!(out, inputs)
    state = 1
    for layer in stateless_module.layers
        state = PaddleLayerForward!(out, params, state, layer)
    end
    return out
end
```

Wrap Paddle's layers:

```julia
struct PaddleLinear <: PaddleStatelessLayer
    features_ins::Int
    features_outs::Int
end

function PaddleLayerForward!(out::PyObject, params::Vector, state::Int, L::PaddleLinear)
    weight, state = iterate(params, state)
    bias, state = iterate(params, state)
    copy!(out, paddle.matmul(out, weight))
    copy!(out, paddle.add(out, bias))
    return state
end

struct PaddleActivation <: PaddleStatelessLayer
    act::PyObject
end

function PaddleLayerForward!(out::PyObject, params::Vector, state::Int, L::PaddleActivation)
    copy!(out, L.act(out))
    return state
end
```
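
As a usage sketch, a small fully connected net built from these wrappers could be called like this (the constructor, field layout, and parameter values are hypothetical and may differ from the package's actual `PaddleStatelessFCNet`):

```julia
# Hypothetical sketch: assumes PaddleStatelessFCNet stores its layer sequence
# in a `layers` field, as used by the forward pass above.
net = PaddleStatelessFCNet([
    PaddleLinear(2, 16),
    PaddleActivation(paddle.nn.functional.sigmoid),
    PaddleLinear(16, 1),
])

# Flat parameter list matching the iteration order: weight, bias, weight, bias.
params = [paddle.randn([2, 16]), paddle.zeros([16]),
          paddle.randn([16, 1]), paddle.zeros([1])]

x = paddle.randn([100, 2])
y = net(params, x)   # threads the iterator state through each layer
```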


But if we are going to build a more complex net, we need to write every forward function by hand. 

So the solution for now is just to create another type for general nets, and copy the params into the Python model before calling the forward function:
```julia
# a rough solution for General Net
struct PaddleStatelessGeneralNet<:PaddleStatelessModule
    NN::PyObject
end

function (stateless_module::PaddleStatelessGeneralNet)(params::Vector, inputs; kwinputs...)
    map((p,p_new)->p.set_value(p_new), stateless_module.NN.parameters(), params)
    out = stateless_module.NN(inputs)
    return out
end
```
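
A usage sketch for the general wrapper (the layers come from Paddle's standard `paddle.nn` API; the parameter extraction here is just for illustration):

```julia
# Hypothetical usage sketch: wrap an arbitrary Paddle module; every call
# first copies the given params into the Python model via set_value.
py_net = paddle.nn.Sequential(
    paddle.nn.Linear(2, 16),
    paddle.nn.Sigmoid(),
    paddle.nn.Linear(16, 1),
)
stateless = PaddleStatelessGeneralNet(py_net)

params = [paddle.to_tensor(p.numpy()) for p in py_net.parameters()]
x = paddle.randn([100, 2])
y = stateless(params, x)   # set_value copies, then forwards through py_net
```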
songjhaha commented 2 years ago
3. In the NeuralPDE case, when using the training strategy QuadratureTraining(), training becomes pretty slow. I'm still checking the reasons. QuadratureTraining() uses an adaptive quadrature method which involves some integration algorithm; maybe it ends up using a bigger sample set. I'm not sure.
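
One way to test that guess would be to cap the quadrature work and compare timings against the stochastic strategy; the keyword names below are assumed from NeuralPDE's QuadratureTraining and should be double-checked:

```julia
# Sketch only: kwarg names (reltol, abstol, maxiters) assumed from NeuralPDE's
# QuadratureTraining; compare timings against StochasticTraining to see
# whether the quadrature sample budget explains the slowdown.
strategy_stoch = StochasticTraining(100; bcs_points = 40)
strategy_quad  = QuadratureTraining(; reltol = 1e-3, abstol = 1e-3, maxiters = 50)

disc_stoch = PhysicsInformedNN(paddlewrap, strategy_stoch, init_params = initθ)
disc_quad  = PhysicsInformedNN(paddlewrap, strategy_quad,  init_params = initθ)
```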