rasmusbergpalm / DeepLearnToolbox

Matlab/Octave toolbox for deep learning. Includes Deep Belief Nets, Stacked Autoencoders, Convolutional Neural Nets, Convolutional Autoencoders and vanilla Neural Nets. Each method has examples to get you started.
BSD 2-Clause "Simplified" License

Running a DBN #16

Closed: summerstay closed this issue 11 years ago

summerstay commented 11 years ago

The example shows how to train a DBN and visualize its weights, but doesn't give an example of how to apply the DBN to new data. Suppose I have trained the DBN on the mnist training set. Now I want to present a new handwritten digit and see what the response of each of the hidden units is to that image. How would I go about that? Thanks!

summerstay commented 11 years ago

Perhaps I should unfold the DBN into an NN and use nnff?

rasmusbergpalm commented 11 years ago

You figured it out :+1:
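
For completeness, the rough flow for applying a trained DBN to new data (a sketch, assuming the dbnunfoldtonn and nnff signatures used in the bundled examples; dbn, x and outputsize are placeholders):

% Unfold the trained DBN into a feedforward NN, then feed new data through it.
nn = dbnunfoldtonn(dbn, outputsize);             % outputsize = number of output units
nn.testing = 1;                                  % disable training-only behaviour
nn = nnff(nn, x, zeros(size(x,1), outputsize));  % dummy targets; ignored for activations
hidden = nn.a{2};                                % responses of the first hidden layer to x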

beamandrew commented 11 years ago

Could you give me a little guidance with this? So far I've trained the RBM layers and unrolled them into an NN, as you did in the demo. For a two-class classification problem, I'm trying to make predictions on a withheld test set, but I don't think I'm doing it correctly:

nn = nntrain(nn, Xtrain(traini,:), Y(traini,:), opts);
nn.testing = 1;
nntest = nnff(nn, Xtrain(testi,:), Y(testi,:));
predictions = round(nntest.a{end});

I'd like to predict for sets where I don't have the label information, but I'm not sure how to do it. If I just drop in a vector of all 1s or 0s as a placeholder in the code above, I get awful accuracy results, making me believe I'm not doing it correctly.

Any pointers?

Thanks

rasmusbergpalm commented 11 years ago

It sounds like you want to do two things:

1. Test your performance on a withheld test set. For that you should use

[er, bad] = nntest(nn, test_x, test_y);

2. Make predictions on unseen data (i.e. actually use the classifier). For that you should use

nn.testing = 1;
nntest = nnff(nn, unseen_x, whatever_y);
[~,label] = max(nntest.a{end},2);

y can be whatever, as it's not used for the predictions (it does need to have the same number of columns as train_y, though). I should create an nnpredict method that wraps this more nicely. If you're awesome you'll create that method and send a pull request.

beamandrew commented 11 years ago

Thanks for the quick reply, and thanks again for this toolbox - it's allowed me to get my hands dirty with DBNs.

I don't understand the use of the max operator here, and when I run it as you suggested I get an error. If nn.a{end} contains the top node's activation level for each observation after the feedforward pass, why am I taking the max of this component? When I run your suggestion I get this, which I'm sure is a simple Matlab error, but Matlab is not my native language so I'm not exactly sure what's going on.

[~,label] = max(nntest.a{end},2);
Error using max
MAX with two matrices to compare and two output arguments is not supported.

So if I replace that with

label = max(nntest.a{end},2);

it works but my labels are all 2. If I change it to max(nntest.a{end},1), they are all 1, which is in line with my understanding of max().

rasmusbergpalm commented 11 years ago

Ah. Sorry. Try with

[~,label] = max(nntest.a{end},[],2);

What you're doing is taking the max output at the top layer for each input. So if at the top you have two outputs, nn.a{end}(1) and nn.a{end}(2), the max operation will find the one that is most activated by your input, i.e. the label that the classifier has predicted.
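
A tiny concrete example of what that row-wise max does (hypothetical numbers):

A = [0.9 0.1; 0.2 0.8; 0.6 0.4];   % activations: 3 observations, 2 output units
[~, label] = max(A, [], 2)          % label = [1; 2; 1], the most-activated unit per row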


beamandrew commented 11 years ago

Oh OK, I see, and I think I've found my issue. I'm doing two-class classification, so my response matrix should be an Nx2 matrix, right? I've been using one column, which means I've only been using one unit in the output layer, when I actually need two. Thanks again.
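
For a 0/1 label vector the conversion is a one-liner, the same construction that appears in the next snippet:

twocol = [~train_y train_y];   % N-by-2 targets: column 1 fires for class 0, column 2 for class 1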

rasmusbergpalm commented 11 years ago

You are welcome.


beamandrew commented 11 years ago

I must be dense, but I can't seem to get this to work, even on the training data used to build the NN. To simplify, I've switched from a DBN to an NN for the time being. Here is my setup:

nn = nnsetup([Nfea 100 2]); % Nfea is the number of columns in train_x
opts.numepochs = 10;        % Number of full sweeps through data
opts.batchsize = 100;
nn.normalize_input = 1;
nn.dropout = 0.5;
nn.activation_function = 'sigm';

% train nn
twocol = [~train_y train_y];
nn = nntrain(nn, train_x, twocol, opts);

nn.testing = 1;
nntest = nnff(nn, train_x, twocol); % train_x is 24000 by 1025
nn.testing = 0;
a = nntest.a{end};
a(1:10,:) % I can't even overfit to the training data at this point.

The first ten rows of the top layer are:

ans =

1.0000    0.0000
0.9998    0.0003
1.0000    0.0000
1.0000    0.0000
1.0000    0.0000
1.0000    0.0000
0.9999    0.0000
1.0000    0.0000
0.9985    0.0011
1.0000    0.0000

The true values are:

twocol(1:10,:)

ans =

 1     0
 1     0
 1     0
 0     1
 0     1
 1     0
 0     1
 1     0
 0     1
 1     0

No matter how I configure the NN, the story is always the same: I never get any activation for the unit representing the second label. As a sanity check, I dropped the data I'm using into some of the tree-based methods in R and got decent classification results. Any idea what's going on?

rasmusbergpalm commented 11 years ago

Ah. Your data is not being normalized when you do the last nnff; if you use nntest it will be. Also make sure the loss function is dropping while training, or make the learning rate smaller.
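
Roughly, the manual fix is to normalize everything with the training-set statistics (a sketch; assumes a zscore-style helper that returns mu and sigma):

[train_x, mu, sigma] = zscore(train_x);   % statistics from the training data only
new_x = (new_x - repmat(mu, size(new_x,1), 1)) ./ repmat(sigma, size(new_x,1), 1);   % reuse the same mu/sigma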


beamandrew commented 11 years ago

Thanks a lot for all the help; it appears to be working now. I have a theoretical question for you as well, if you don't mind. My understanding is that generative pre-training (i.e. the RBM layers of a DBN) was used before the paper on dropout was published, but now training deep neural nets with dropout is favored over DBN pre-training. Do you think this is the case, or are DBNs still used for some applications?

rasmusbergpalm commented 11 years ago

Glad you made it work. By the way, check out 0fb07625655e15e0d22e6ddbaeb1f89bbedcac84: I added nnpredict to make stuff easier. It also does normalization 'the right way' now.
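
In case it helps, the wrapper boils down to something like this (paraphrased sketch; see the commit itself for the authoritative version):

function labels = nnpredict(nn, x)
    % Feed x forward with training-only behaviour (e.g. dropout) disabled,
    % then take the most-activated output unit as the predicted label.
    nn.testing = 1;
    nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
    nn.testing = 0;
    [~, labels] = max(nn.a{end}, [], 2);
end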

As for your theoretical question: it seems that nowadays, if you have enough labeled data, you can just use dropout+maxout to get state of the art. If you want to beat state of the art, I think maxout + marginalized dropout will be the winner in the coming months.

However, if you don't have lots and lots of labeled data, pre-training seems to be more important.

By the way, what are you using the toolbox for? I'm curious :) If you don't feel like disclosing it in public, you can send me a message on LinkedIn.

beamandrew commented 11 years ago

Thanks, that was my understanding as well. I was looking through the toolbox for maxout, but didn't see it. It's pretty new so I wasn't surprised.

I'm using it for a contest on Kaggle to identify whale calls:

http://www.kaggle.com/c/whale-detection-challenge

I'm not very concerned about placing high, but instead I'm using it as an opportunity to apply a deep learning architecture. I'm a grad student myself and part of my research involves machine learning (both applications and method development).

While we've got this conversation going: I've read the dropout paper on arXiv and watched Hinton's NIPS 2012 talk. He says that dropout is equivalent to the geometric mean of all possible models, but I haven't seen a formal proof of this anywhere. Is there a paper floating around somewhere with this proof?

Thanks again, Andrew

rasmusbergpalm commented 11 years ago

I think Hinton proves it in the dropout paper for a single-layer model, and then does a little sleight of hand: he expands the model to many layers and shows that it works nicely in practice even though unproven. The maxout paper is the most thorough treatment of dropout I've read so far. I think it extends Hinton's proof to multiple layers under certain circumstances, if I remember correctly.
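
Roughly, the single-layer argument (a sketch, for a softmax layer with dropout probability 1/2 on its $n$ inputs): each mask $m \in \{0,1\}^n$ gives sub-model logits $\ell_m = W(m \circ x)$, and the normalized geometric mean over all $2^n$ sub-models is

$$P(y \mid x) \propto \Big(\prod_m \mathrm{softmax}(\ell_m)_y\Big)^{2^{-n}} \propto \exp\Big(2^{-n}\sum_m \big(W(m \circ x)\big)_y\Big) = \exp\big(\tfrac{1}{2}(Wx)_y\big),$$

since each input coordinate survives in exactly half of the masks. After renormalizing, that is the full network with its weights halved, which is exactly the usual test-time rescaling.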

Cheers, Rasmus.


summerstay commented 11 years ago

I don't have the statistics toolbox, so I wrote my own zscore function. Maybe you want to include it?

function [x, mu, sigma] = zscore(x)
    sigma = std(x);
    mu = mean(x);
    x = x - repmat(mu, [size(x,1), 1]);
    x = x ./ repmat(sigma, [size(x,1), 1]);
end

mohsenali commented 11 years ago

Can bsxfun be used instead of the repmat?


summerstay commented 11 years ago

Yeah, it looks like bsxfun would be better.

gallamine commented 11 years ago

Make sure you error check: if std == 0, set it to eps.


rasmusbergpalm commented 11 years ago

@gallamine or @summerstay: whoever sends a pull request first with a zscore using bsxfun gets it in the toolbox!
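
Something along these lines, presumably (an untested sketch folding in @gallamine's eps guard):

function [x, mu, sigma] = zscore(x)
    % Column-wise standardization without the statistics toolbox.
    mu = mean(x);
    sigma = std(x);
    sigma(sigma == 0) = eps;           % guard against zero-variance columns
    x = bsxfun(@minus, x, mu);
    x = bsxfun(@rdivide, x, sigma);
end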

Noumansoomro commented 10 years ago

Hi! I am new to deep learning. I want to extract features for each image separately using the algorithms in DeepLearnToolbox. Can you guide me on how to extract per-image features from the DBN, CAE, CNN, and SAE?