jingweiz / pytorch-dnc

Neural Turing Machine (NTM) & Differentiable Neural Computer (DNC) with pytorch & visdom

Question #4

Closed: SoroorM closed this issue 7 years ago

SoroorM commented 7 years ago

Hi, thank you for sharing the code of NTM-DNC on GitHub, and thanks for your explanations. I have a problem when trying to test the code, can you help me? The error is: `NameError: name 'raw_input' is not defined`. How can I fix this? [screenshot] Your kind attention is highly appreciated in advance. Best regards

jingweiz commented 7 years ago

Hey, so a quick guess: are you using Python 3 instead of Python 2? According to https://stackoverflow.com/questions/954834/how-do-i-use-raw-input-in-python-3, raw_input() was renamed to input() in Python 3. Also, this repo has only been tested under Python 2; I never tried running it under Python 3 :)
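
A minimal compatibility shim (my suggestion, not part of this repo) that keeps the Python-2-style calls working under both versions:

```python
# Hedged sketch: alias raw_input() to input() when running under Python 3,
# so Python-2-style raw_input() calls keep working in every module.
try:
    raw_input            # exists in Python 2
except NameError:        # Python 3 renamed raw_input to input
    import builtins
    builtins.raw_input = input
```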

SoroorM commented 7 years ago

Hi dear Jacob,

Thanks again for your guidance; that error is solved. But when testing continues after one epoch, it stops and nothing happens. Is that usual? After closing the cmd window, an error flashes for a millisecond, so I can't read it clearly... and visdom shows nothing... please help me solve these problems. [screenshot]

best regards

jingweiz commented 7 years ago

Hey, so in testing mode, raw_input() is used to pause the code at every step, so as to get a step-by-step visualization. You can just keep pressing Enter during testing, and visdom will show the visualizations after each press.
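
The pattern is roughly the following (an illustrative sketch, not the repo's actual code; model and visualize are stand-ins):

```python
# Illustrative sketch of the step-wise test loop: each iteration blocks
# on raw_input() until Enter is pressed, then visdom gets the next plots.
def run_test(model, inputs, visualize):
    for step, x in enumerate(inputs):
        y = model(x)                      # one forward step of the model
        visualize(step, y)                # push this step's plots to visdom
        raw_input("Press Enter for the next step...")  # use input() on Python 3
```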

SoroorM commented 7 years ago

Thank you very much...

But I still don't have any visualizations in visdom :(

Should I change the code? [screenshot]

I open this address to see the test result: "http://localhost:8097/env/MACHINE_TIMESTAMP_test", and for the train result I open "http://localhost:8097/env/MACHINE_TIMESTAMP", but neither shows anything, and I can't see anything at the base address either.

Your kind attention is highly appreciated in advance. Best regards

jingweiz commented 7 years ago

Hey, just to make sure: you replaced MACHINE and TIMESTAMP with the values in your code, right?

SoroorM commented 7 years ago

[screenshot]

[screenshot]

Can you tell me what's wrong with my runs? These are very different from your results.

Unfortunately I still have some errors..., but first I'll try harder to solve them; if I can't, I'll ask you again, sorry... Thank you very much for your help, you are the best 👍 :)

jingweiz commented 7 years ago

Super weird, I tested multiple times and the curves look quite similar... did you change anything in the hyperparameters? And is this NTM or DNC?

jingweiz commented 7 years ago

Also, can you show me this part of the file utils/options.py:

```python
class Params(object):   # NOTE: shared across all modules
    def __init__(self):
        self.verbose     = 0            # 0(warning) | 1(info) | 2(debug)

        # training signature
        self.machine     = "daim"       # "machine_id"
        self.timestamp   = "17053100"   # "yymmdd##"
```
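
(For reference, the visdom env name appears to be just these two fields joined with an underscore, with "_test" appended in testing mode; a hedged sketch under that assumption:)

```python
# Assumption: the visdom env is named "<machine>_<timestamp>", with
# "_test" appended in testing mode. With the defaults shown above:
machine, timestamp = "daim", "17053100"
train_env = "%s_%s" % (machine, timestamp)   # -> "daim_17053100"
test_env = train_env + "_test"               # -> "daim_17053100_test"
print("http://localhost:8097/env/" + train_env)
```
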
SoroorM commented 7 years ago

No, I didn't change anything in the hyperparameters...

Sure, here you are: [screenshot]

SoroorM commented 7 years ago

Hi dear Jacob,

I tried once again, and after training finished my result was a little better than the previous run, but still different from yours. Can you tell me why? [screenshot]

And the test result was also different; I think it's a very bad result, and I don't know why... :( [screenshot]

Best Regards

jingweiz commented 7 years ago

Hey, sorry, I'm quite busy these days; I will try to look into this issue next week.

SoroorM commented 7 years ago

Hi, okay, I'll wait for your response. Thank you very much for your kindness. :)

jingweiz commented 7 years ago

Hey, so I ran it multiple times; the training curves differ a bit, but they all converge and none diverge. Are you sure you did not change any hyperparameter settings?

SoroorM commented 7 years ago

Hi dear Jacob, first, thank you for following up. Oh, sure, I didn't change any hyperparameter settings. I just wanted to try this code on my computer first and change it afterwards.

jingweiz commented 7 years ago

OK, I'm running more tests. In the meanwhile, you can try commenting out lines 28-29 in core/controllers/lstm_controllers.py and running it again:

```python
        self.lstm_hidden_vb = [self.lstm_hidden_vb[0].clamp(min=-self.clip_value, max=self.clip_value),
                               self.lstm_hidden_vb[1]]
```
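
(For context: those lines clamp only the LSTM hidden state, element 0 of the list, into [-clip_value, clip_value], leaving the cell state untouched. A tiny standalone illustration of what clamp does:)

```python
import torch

# torch.Tensor.clamp snaps every element outside [min, max] to the bound.
x = torch.tensor([-3.0, -0.5, 0.0, 2.0])
print(x.clamp(min=-1.0, max=1.0))  # tensor([-1.0000, -0.5000, 0.0000, 1.0000])
```
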
SoroorM commented 7 years ago

Okay, thanks, I'll try it, but the results seem to be getting worse...

jingweiz commented 7 years ago

Can you tell me which commit you are on?

SoroorM commented 7 years ago

I checked lines 28-29 in core/controllers/lstm_controllers.py, and they were not commented out, so my previous runs used these two lines; I have now commented them out to see what happens. When I run main.py, in the command prompt I get this: [screenshot]

and in visdom I already have: [screenshot]

SoroorM commented 7 years ago

The final result of training is 👍 [screenshot]

jingweiz commented 7 years ago

Hey, sorry for not answering earlier, but I'm moving these days and hopefully will be done in the next day or so; then I will take a closer look. In the meanwhile, can you tell me which commit you are on? For all the runs that I tested, I never saw anything like the ones you showed, as mine never diverge. Also, the output of git diff would be helpful! Thanks!

SoroorM commented 7 years ago

Hi dear Jacob, I created it.

Best Regards

jingweiz commented 7 years ago

What do you mean? Also, one important thing I realized is that for the "ntm", the batch_size should be 16 instead of 1; that should solve all your problems. I will update the repo after I run more tests, but you can test by changing the batch_size yourself.
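
Assuming the hyperparameters live in utils/options.py as shown earlier, the change would presumably look like this (a sketch; the exact attribute name in the repo may differ):

```python
class Params(object):
    def __init__(self):
        # ... other hyperparameters ...
        # hypothetical field name; check utils/options.py for the real one
        self.batch_size = 16   # was 1; size-1 batches make SGD updates very noisy
```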

SoroorM commented 7 years ago

OK, I will change it, but what is the batch_size?

SoroorM commented 7 years ago

Thank you, that problem is fixed. [screenshot]

You are the best 👍 :)

What is the batch_size? Is that the length of the inputs?

SoroorM commented 7 years ago

But the result in test mode seems not very good... 🤔 [screenshot]

jingweiz commented 7 years ago

Hey, I already tested it, and now the results are consistent. What did you change? For me, the different runs look like this: daim_17080100, daim_17080101 [screenshots]

So before, when the batch_size was 1, I got lucky that my runs always converged; after you pointed it out, I did more runs and found that some diverge. A reasonable guess is that with a batch size of 1, the SGD updates are too stochastic. Now that I've changed it to 16, training is much more consistent and stable :) I will update the repo very soon today, thank you so much for the input :)

SoroorM commented 7 years ago

Thank you very much, dear Jacob 🌺

I'm just a little confused about `self.seed = 1`. Why change it? Can you guide me on this?

jingweiz commented 7 years ago

Hey, the seed is for the random number generator: if the seed is fixed, you basically get more or less the same learning curve on every run. If an algorithm is stable, the curve should be somewhat consistent across different seeds.
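
For reference, fixing the seed in a PyTorch script typically means seeding every random number generator in play; a minimal sketch using standard calls (not necessarily how this repo does it):

```python
import random
import numpy as np
import torch

def set_seed(seed=1):
    # Seed all RNGs the run touches so that repeated runs with the
    # same seed produce (near-)identical learning curves.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```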