pluskid / Mocha.jl

Deep Learning framework for Julia

Guidance on training and using an MLP (using forward()) #206

Closed · diegoacuna closed 8 years ago

diegoacuna commented 8 years ago

Hi, I'm trying to build and use (in production) a simple MLP with Mocha. I've checked the documentation, but some steps are not entirely clear to me. This is my code:

using Mocha
using Distributions

#Synthetic Function that generates the data
f(x1, x2) = sin(x1).*sin(x2)./(x1.*x2)

function generate_dataset(media, var, tam, seed)
    srand(seed)
    return rand(MvNormal(media, var), tam)
end

dataset_input = generate_dataset([0.0;0.0], 1.0, 5000, 10)
dataset_output = f(dataset_input[1,:], dataset_input[2,:])

# now define the MLP
backend = CPUBackend()
init(backend)

data_layer = MemoryDataLayer(name="data", data=Array[dataset_input, dataset_output], batch_size=64, tops=[:data,:label])
ip_layer = InnerProductLayer(name="ip", output_dim=7, bottoms=[:data], tops=[:ip], neuron=Neurons.Sigmoid())
layer_loss = SquareLossLayer(name="loss", bottoms=[:ip, :label])

net = Net("MLP", backend, [data_layer, ip_layer, layer_loss])

method = SGD()
params = make_solver_parameters(method, max_iter=10000)
solver = Solver(method, params)

add_coffee_break(solver, TrainingSummary(), every_n_iter=100)
add_coffee_break(solver, Snapshot("snapshots"), every_n_iter=10000)
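# NOTE: `test_performance` is not defined in this snippet; it is assumed to be a
# coffee break created separately, e.g. a ValidationPerformance over a test net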
add_coffee_break(solver, test_performance, every_n_iter=100)

solve(solver, net)

Now, with the snapshot saved, I'm trying to use the MLP to make a prediction:

# I'm generating only 1 input to produce a single prediction
dataset_input = generate_dataset([0.0;0.0], 1.0, 1, 10)
dataset_output = f(dataset_input[1,:], dataset_input[2,:])

# same MLP definition as above...
data_layer = MemoryDataLayer(name="data", data=Array[dataset_input, dataset_output], batch_size=64, tops=[:data,:label])
ip_layer = InnerProductLayer(name="ip", output_dim=7, bottoms=[:data], tops=[:ip], neuron=Neurons.Sigmoid())
layer_loss = SquareLossLayer(name="loss", bottoms=[:ip, :label])

net = Net("MLP", backend, [data_layer, ip_layer, layer_loss])

load_snapshot(net, "snapshots/snapshot-010000.jld")

print(forward(net))

Is this the correct way of obtaining the output of the network? I'm asking because I'm getting bad predictions; this could just be the result of poor training, but I want to make sure I'm not coding something wrong. Thanks in advance.

diegoacuna commented 8 years ago

I was able to train and predict in a regression task with a simple MLP. This is my code:

using Mocha
using Distributions

# generate inputs
generate_dataset(media,var,tam) = rand(MvNormal(media, var), tam)
# generate outputs
f1(x1, x2) = sin(x1).*sin(x2)./(x1.*x2)

#Parameter Definition for the dataset generation
media_x1=0.0
media_x2=0.0
mean=[media_x1;media_x2]
var_x1=1.0
var_x2=1.0
var=[var_x1 0.0;0.0 var_x2]
tam=5000

srand(10)

datasetinput = generate_dataset(mean, var, tam)

datasetoutput = f1(datasetinput[1,:], datasetinput[2,:])

backend = CPUBackend()
init(backend)

data_layer = MemoryDataLayer(name="data", data=Array[datasetinput, datasetoutput], batch_size=64)
ip_layer = InnerProductLayer(name="ip", output_dim=10, bottoms=[:data], tops=[:ip], neuron=Neurons.Identity())
aggregator = InnerProductLayer(name="aggregator", output_dim=1, tops=[:aggregator], bottoms=[:ip] )
layer_loss = SquareLossLayer(name="loss", bottoms=[:aggregator, :label])

common_layers = [ip_layer, aggregator]

net = Net("MLP", backend, [data_layer, common_layers, layer_loss])

method = SGD() # stochastic gradient descent
params = make_solver_parameters(method, max_iter=100)
solver = Solver(method, params)

# report training progress every 10 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=10)
add_coffee_break(solver, Snapshot("snapshots"), every_n_iter=100)

solve(solver, net)

Mocha.dump_statistics(solver.coffee_lounge, get_layer_state(net, "loss"), true)

destroy(net)
shutdown(backend)

And to make predictions:

using Mocha
using Distributions

srand(500)

# generate inputs
generate_dataset(media,var,tam) = rand(MvNormal(media, var), tam)
# generate outputs
f1(x1, x2) = sin(x1).*sin(x2)./(x1.*x2)

#Parameter Definition for the dataset generation
media_x1=0.0
media_x2=0.0
mean=[media_x1;media_x2]
var_x1=1.0
var_x2=1.0
var=[var_x1 0.0;0.0 var_x2]
tam=10

datasetinput=generate_dataset(mean, var, tam)
datasetoutput=f1(datasetinput[1,:], datasetinput[2,:])

println(datasetinput)
println(datasetoutput)

backend = CPUBackend()
init(backend)

data_layer = MemoryDataLayer(name="data", data=Array[datasetinput, datasetoutput], batch_size=10, tops=[:data,:label])
ip_layer = InnerProductLayer(name="ip", output_dim=10, bottoms=[:data], tops=[:ip], neuron=Neurons.Identity())
aggregator = InnerProductLayer(name="aggregator", output_dim=1, tops=[:aggregator], bottoms=[:ip] )

common_layers = [ip_layer, aggregator]

net = Net("MLP", backend, [data_layer, common_layers])

load_snapshot(net, "snapshots/snapshot-000100.jld")

forward(net)
println(net.output_blobs[:aggregator].data)

destroy(net)

shutdown(backend)

I think this is the correct approach.

pluskid commented 8 years ago

Thanks for posting this example. You can also use MemoryOutputLayer or HDF5OutputLayer to automatically collect the predictions into one place for the whole dataset.
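
For reference, here is a minimal sketch of the MemoryOutputLayer approach for the prediction net above, reusing backend, data_layer and common_layers from that script. The net name, the "output" layer name, and the way the collected arrays are read back (via the layer state's outputs field) are my assumptions based on the Mocha documentation, so adjust as needed:

output_layer = MemoryOutputLayer(name="output", bottoms=[:aggregator])

pred_net = Net("MLP-predict", backend, [data_layer, common_layers..., output_layer])
load_snapshot(pred_net, "snapshots/snapshot-000100.jld")

# one forward pass per mini-batch; repeat until the whole dataset has been seen
forward(pred_net)

# the accumulated arrays sit in the output layer's state (here the last layer);
# outputs[1] is a vector with one array per forward pass for the first bottom blob
predictions = pred_net.states[end].outputs[1]
println(predictions)

destroy(pred_net)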

diegoacuna commented 8 years ago

Thanks for your valuable comment. I created a small blog post explaining how to use Mocha to train an MLP for regression (http://www.diegoacuna.me/multi-layer-perceptron-for-regression-in-julia-using-the-mocha-framework/). I'm going to close this issue.

Thanks for your help!