lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

CPU/GPU saving/conversion -- what actually needs to be serialized? #93

Closed abramhindle closed 8 years ago

abramhindle commented 8 years ago

Hi, with Theano it is pretty clear that the library tries, but cannot always succeed, when unpickling a GPU pickle on a CPU-only (CUDA-less) system.

I'm worried that the following approach is losing information I may have overlooked, since the results look slightly different when I test them. What I do is dump all of the network's parameters, save them, and then reload them into a network of the same configuration.

Thus I ask the gracious developers: What am I missing if I just save parameters? Is there some state that I am overlooking?

Below is a network that I configure as an autoencoder. I use it to encode, but once I do this param-list conversion I get slightly different results. So either I'm accidentally using doubles on the CPU, or the network isn't being restored well enough. make_network builds my network, to_param_list produces a parameter list that you can pickle more safely without that CUDA/CPU mess, and set_network takes a param list and writes it back into a network of the same configuration.

Summary: pickles aren't portable, so I'm trying to save out state to rebuild the network. Am I missing any state I should know about?

import theanets

# make a network
def make_network(bits):
    hiddensize = 1024
    outputsize = bits
    inputs = 4129
    net = theanets.Autoencoder([
        inputs,
        hiddensize,
        hiddensize,
        (outputsize, 'rect:minmax'),
        (hiddensize, 'tied'),
        (hiddensize, 'tied'),
        (inputs, 'tied'),
    ])
    return net

# convert the network to a list (one entry per layer) of (name, values) tuples
def to_param_list(network):
    newlayers = []
    for layer in network.layers:
        params = []
        for param in layer.params:
            # str(param) gives the parameter's name; get_value() copies the
            # underlying (possibly GPU-resident) array into a numpy array
            params.append((str(param), param.get_value()))
        newlayers.append(params)
    return newlayers

# load a param list into a network of the same configuration
def set_network(network, param_list):
    for layeri, layer in enumerate(network.layers):
        print(layeri)
        player = param_list[layeri]
        for parami, param in enumerate(layer.params):
            name, values = player[parami]
            if str(param) != name:
                raise Exception('%s != %s' % (str(param), name))
            print(name)
            param.set_value(values)

net = make_network(128)
param_list = to_param_list(net)
net2 = make_network(128)
set_network(net2, param_list)

You're free to incorporate the to_param_list and set_network code if you want, but I'm sure it is missing things.
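For what it's worth, the payoff of the param-list approach is that the saved object is just nested Python lists and arrays, which plain pickle handles portably. A minimal sketch of the save/load round trip, with plain Python lists standing in for the numpy arrays that param.get_value() would return, so it runs without Theano installed:

```python
# Hypothetical sketch: round-tripping a param list through pickle.
# Plain lists stand in for the numpy arrays from param.get_value();
# nothing here touches a CUDA context, so the pickle loads anywhere.
import pickle

param_list = [
    [('layer1.w', [[0.1, 0.2], [0.3, 0.4]]), ('layer1.b', [0.0, 0.0])],
    [('out.w', [[0.5], [0.6]])],
]

# Save: plain Python data pickles portably across CPU/GPU hosts.
blob = pickle.dumps(param_list, protocol=pickle.HIGHEST_PROTOCOL)

# Load: no GPU or Theano needed to unpickle.
restored = pickle.loads(blob)
assert restored == param_list
```

In practice you would write blob to a file with pickle.dump and feed the restored list to set_network on the target machine.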

majidaldo commented 8 years ago

i assume you have floatX=float32 in your .theanorc?
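For reference, pinning the float type in Theano's config file looks like this (the [global] section and floatX flag are standard Theano configuration; set it the same way on both machines so get_value()/set_value() round-trip at the same precision):

```ini
# ~/.theanorc
[global]
floatX = float32
```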

abramhindle commented 8 years ago

@majidaldo I do on the GPU systems. I guess I should try again with it enabled to make sure it is consistent.

I'm more worried about missing parameters when saving out, because this param-list trick totally seems to work across "platforms".

lmjohns3 commented 8 years ago

The param list "trick" will work as long as the network you're saving the parameters from and the network you're loading the parameters into have the same architecture. (However, it's a good idea to make sure the floatX config is the same on both platforms, as @majidaldo suggests.) The code you included above is actually pretty close to what I'd do if there were a non-pickling way to load and save a model.
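Since the same-architecture requirement is the one thing that can silently go wrong, a defensive loader might verify parameter names and shapes before assigning anything. A minimal sketch of such a check, using plain (name, shape) tuples as stand-ins for real layer params so it runs without theanets:

```python
# Hypothetical sketch: verify two param lists describe the same architecture
# (matching names and shapes, layer by layer) before calling set_value().
def check_compatible(saved, target):
    """Return True iff both param lists have identical names and shapes."""
    if len(saved) != len(target):
        return False
    for saved_layer, target_layer in zip(saved, target):
        if len(saved_layer) != len(target_layer):
            return False
        for (s_name, s_shape), (t_name, t_shape) in zip(saved_layer, target_layer):
            if s_name != t_name or s_shape != t_shape:
                return False
    return True

same = [[('hid1.w', (4129, 1024)), ('hid1.b', (1024,))]]
also_same = [[('hid1.w', (4129, 1024)), ('hid1.b', (1024,))]]
smaller = [[('hid1.w', (4129, 512)), ('hid1.b', (512,))]]

assert check_compatible(same, also_same)
assert not check_compatible(same, smaller)
```

With real theanets layers you would compare str(param) and param.get_value().shape against the saved entries.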