pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Using PyMC3 for deep learning (Regression) #2194

Closed: gbengaoti closed this issue 7 years ago

gbengaoti commented 7 years ago

I am using PyMC3 for a regression task with a vanilla recurrent neural network. The problem I am trying to solve is the Addition problem: you have two sequences of length N; the first is a sequence of N numbers between 0 and 1, and the second is all 0s except at two positions, where it is 1. The target is the sum of the two entries of the first sequence at the positions where the second sequence is 1 (for example, with first sequence [0.3, 0.7, 0.1, 0.9] and mask [0, 1, 0, 1], the target is 0.7 + 0.9 = 1.6). I have not seen an example like this in PyMC3, so I am not sure my solution is correct, and the accuracy is not very good. I am looking for suggestions to improve the model or to verify its correctness. Also, since the output of the regression task (the likelihood) ranges from 0 to 2, should I be using a Uniform distribution instead of a Normal distribution for the output? Here is my code:

```python
import timeit
start = timeit.default_timer()
import theano
import theano.tensor as T
import numpy as np
import pymc3 as pm
import lasagne
from sklearn.metrics import r2_score

chunk_size = 2
n_chunks = 8
num_classes = 1

# Generate the dataset: inputs of shape (N, seq_len, 2) (values + mask) and sum targets of shape (N, 1)
def data_generator(N, seq_len=8, high=1):
    X_num = np.random.uniform(low=0, high=high, size=(N, seq_len, 1))
    X_mask = np.zeros((N, seq_len, 1))
    Y = np.ones((N, 1))
    for i in range(N):
        # Default uniform distribution on position sampling
        positions = np.random.choice(seq_len, size=2, replace=False)
        X_mask[i, positions] = 1
        Y[i, 0] = np.sum(X_num[i, positions])
    X = np.append(X_num, X_mask, axis=2)
    return X, Y

NUM_EXAMPLES = 2000
test_input, test_output = data_generator(500)

train_input, train_output = data_generator(NUM_EXAMPLES)
print ("Data done generating ....")
input_var = theano.shared(np.asarray(train_input).astype(np.float64))
target_var = theano.shared(np.asarray(train_output).astype(np.float64))

N_HIDDEN = 24
GRAD_CLIP = 100

def build_rnn(init):
    print("Building network ...")

    l_in = lasagne.layers.InputLayer(shape=(None, n_chunks, chunk_size),
        input_var=input_var)

    l_forward_1 = lasagne.layers.RecurrentLayer(
        l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
        nonlinearity=lasagne.nonlinearities.tanh, only_return_final=True)

    l_shp = lasagne.layers.ReshapeLayer(l_forward_1, (-1, N_HIDDEN))

    l_dense = lasagne.layers.DenseLayer(l_shp, num_units=1,
        W=init,
        nonlinearity=lasagne.nonlinearities.linear)

    l_out = lasagne.layers.ReshapeLayer(l_dense, (-1, num_classes))

    print("Finished building layers...")
    # Not sure of how to set sd?
    eps = pm.Uniform('eps', lower=0, upper=2)

    prediction = lasagne.layers.get_output(l_out)

    # Likelihood of the observed targets
    out = pm.Normal('out', mu=prediction, sd=eps, observed=target_var)

    return out

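# Prior "initializer" for the Lasagne weights: each call replaces a weight
# tensor with a PyMC3 Normal random variable of the matching shape, so the
# network weights get standard normal priors.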
class GaussWeights(object):
    def __init__(self):
        self.count = 0
    def __call__(self, shape):
        self.count += 1
        print(shape)
        return pm.Normal('w%d' % self.count, mu=0, sd=1,
                         testval=np.random.normal(size=shape).astype(np.float64),
                         shape=shape)

with pm.Model() as rnn:
    out = build_rnn(GaussWeights())

from six.moves import zip

input_var.set_value(np.asarray(train_input).astype(np.float64))
target_var.set_value(np.asarray(train_output).astype(np.float64))

# Tensors and RVs that will use mini-batches
minibatch_tensors = [input_var, target_var]
minibatch_RVs = [out]

# Generator that returns mini-batches in each iteration
def create_minibatch(data):
    rng = np.random.RandomState(0)

    while True:
        # Return a random sample of 50 examples each iteration
        ixs = rng.randint(len(data), size=50)
        yield data[ixs]

minibatches = zip(
    create_minibatch(np.asarray(train_input)), 
    create_minibatch(np.asarray(train_output)),
)

total_size = len(train_input)

with rnn:
    print ("Optimization starts here")
    # Run ADVI which returns posterior means, standard deviations, and the evidence lower bound (ELBO)
    v_params = pm.variational.advi_minibatch(
        n=50000, minibatch_tensors=minibatch_tensors, 
        minibatch_RVs=minibatch_RVs, minibatches=minibatches, 
        total_size=total_size, learning_rate=0.1, epsilon=1.0)

    trace = pm.variational.sample_vp(v_params, draws=5000)

# Evaluate on the held-out test set by swapping in the test data
input_var.set_value(np.asarray(test_input).astype(np.float64))
target_var.set_value(np.asarray(test_output).astype(np.float64))

ppc = pm.sample_ppc(trace, model=rnn, samples=500)

# Use the posterior predictive mean as the point prediction
pred = ppc['out'].mean(axis=0)

# Evaluate the fit with the R^2 score (not a percentage accuracy)
print('R2 score = {}'.format(r2_score(test_output, pred)))
stop = timeit.default_timer()

print('It took', stop - start, 'secs')
```
ferrine commented 7 years ago

Also, since the output of the regression task (the likelihood) ranges from 0 to 2, should I be using a Uniform distribution instead of a Normal distribution for the output?

You can try a rescaled Beta distribution, and regress log alpha and log beta.
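A minimal sketch of that idea (my illustration, not ferrine's code): rescale the target from [0, 2] to the Beta support and interpret two regression outputs as log alpha and log beta. Here the RNN is collapsed to a single shared feature for brevity, and `features`, `w_a`, and `w_b` are hypothetical placeholders:

```python
import numpy as np
import theano
import pymc3 as pm

# Stand-ins for the network: in the full model these would be two
# DenseLayer heads on top of the recurrent layer.
y = np.random.uniform(0.1, 1.9, size=(100, 1))     # targets in [0, 2]
features = theano.shared(np.random.randn(100, 1))  # placeholder network output

with pm.Model() as beta_model:
    w_a = pm.Normal('w_a', mu=0, sd=1)
    w_b = pm.Normal('w_b', mu=0, sd=1)
    log_alpha = w_a * features  # regress log(alpha)
    log_beta = w_b * features   # regress log(beta)
    out = pm.Beta('out',
                  alpha=pm.math.exp(log_alpha),  # exp keeps both parameters positive
                  beta=pm.math.exp(log_beta),
                  observed=y / 2.0)              # rescale [0, 2] onto (0, 1)
```

Only the likelihood changes; the rest of the model (priors on the weights, ADVI minibatch training) can stay as in the original code.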

Johnnyboycurtis commented 7 years ago

Can you briefly explain the data model for this problem?

gbengaoti commented 7 years ago

sorry, what do you mean by data model?

fonnesbeck commented 7 years ago

This is a general usage question rather than a bug report, so I'm going to close it. @gbengaoti I suggest moving this to our Discourse page.