Open simNN7 opened 10 years ago
Hello gents. Thank you very much for your interest in the toolbox. There are two cases in which errors occur:
Over the next few days I will look at making the training easier to understand. I can see why this confuses you. However, the way it is set up now allows many ways of generating the dataset, which is very handy when doing scientific analysis of different DBNs. Let me know if this helps you get started with the toolbox.
Best regards Lars
Hi Lars,
I'm having the same issue as Vamsi-lg. Could you help? Thanks!
Hello Karenkua and Vamsi-Ig. The attribute list (saved as the serialised file attributes.p) must be generated during data preparation by the function "__set_attributes", as part of generating the training set. This is the list of words used for the bag-of-words (BOW) representation. Please let me know how that works.
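To make the idea concrete, here is a minimal Python 3 sketch of what an attribute list for a BOW amounts to. It assumes (my assumption, not taken from the toolbox's code) that the attributes are the `words_count` most frequent words across the stemmed training documents, by analogy with the `words_count` argument of `DataProcessing`. `build_attributes`, `save_attributes`, `load_attributes` and `bag_of_words` are hypothetical names, not the toolbox's API, and the real `__set_attributes` may filter and order words differently.

```python
import pickle
from collections import Counter


def build_attributes(docs, words_count=2000):
    """Hypothetical helper: return the `words_count` most frequent words.

    Assumption: attributes = most frequent words over all documents.
    `docs` is an iterable of token lists, one list per document.
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:words_count]]


def save_attributes(attributes, path):
    """Serialise the attribute list with pickle, as a .p file."""
    with open(path, "wb") as f:
        pickle.dump(attributes, f)


def load_attributes(path):
    """Load a pickled attribute list."""
    with open(path, "rb") as f:
        return pickle.load(f)


def bag_of_words(doc, attributes):
    """Map one token list to a count vector over the attribute list."""
    index = {word: i for i, word in enumerate(attributes)}
    vec = [0] * len(attributes)
    for word in doc:
        i = index.get(word)
        if i is not None:  # words outside the attribute list are ignored
            vec[i] += 1
    return vec
```

For example, with `docs = [["topic", "model", "topic"], ["model", "net"], ["topic"]]`, `build_attributes(docs, 2)` gives `["topic", "model"]`, and `bag_of_words(["topic", "net", "topic"], ["topic", "model"])` gives `[2, 0]`.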
Hi Lars, thanks for the prompt reply, greatly appreciated. I tried the following steps, and several problems arose.
for doc in docs:
I assume this has something to do with the .p files in step 2. Could you kindly advise whether the input files have to be in .p format (and if so, is there any code for converting them, or a source for downloading them)? I got mine from http://qwone.com/~jason/20Newsgroups/, as mentioned in the README file. Otherwise, how could I get around this problem? Thanks again!
Hi again. I have now made various amendments to the toolbox so that it should be much clearer what needs to be done. Please read the README.md file and follow the 3 examples. That should get you up and running with the toolbox.
Hi Lars,
Thanks for your toolbox. I am trying to run your code, but it raises the following warning: "deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp return 1. / (1 + exp(-x))".
I googled for fixes, such as replacing the sigmoid function at dbn.py:241 with "return expit(x)" or "return .5 * (1 + tanh(.5 * x))", but neither change helped.
Do you have the same issue when you run the toolbox? And do you have any idea how to solve it? Thanks.
The full output is as follows:
Pre Training
Visible units: 2000 Hidden units: 500
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
/deep-belief-nets-for-topic-modeling/DBN/pretraining.py:140: RuntimeWarning: divide by zero encountered in log
  perplexity = nansum(vis * log(softmax_value))
/deep-belief-nets-for-topic-modeling/DBN/pretraining.py:140: RuntimeWarning: invalid value encountered in multiply
  perplexity = nansum(vis * log(softmax_value))
Bottom units: 500 Top units: 500
Epoch[ 1]: Error = 1.7385879
Bottom units: 500 Output units: 128
Epoch[ 1]: Error = 32.1944861
Time 71.8855669498
Fine Tuning
Backprop: Epoch 1
Large batch: 1 of 36
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
Hi,
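This happens when some inputs to the sigmoid are large negative numbers, so exp(-x) overflows. You'll need to scale the data accordingly. That said, most of these overflow warnings have no real influence on the training, since 1 / (1 + inf) still evaluates to 0, which is the correct limit of the sigmoid.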
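For anyone who prefers to remove the warnings rather than ignore them, here is a minimal NumPy sketch of an overflow-free sigmoid and a clipped logarithm. Note that the log above shows two separate sources: the sigmoid at dbn.py:241, and `log(softmax_value)` at pretraining.py:140, where a softmax output of exactly 0 gives log(0) = -inf. Changing only the sigmoid would not affect the second one. The names `sigmoid` and `safe_log` and the `eps=1e-12` floor are my own illustrative choices, not code from the toolbox.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid that never overflows (hypothetical replacement).

    exp(-|x|) always lies in (0, 1], so neither branch can overflow:
    for x >= 0 use 1 / (1 + exp(-x)); for x < 0 use the algebraically
    equal exp(x) / (1 + exp(x)).
    """
    x = np.asarray(x, dtype=float)
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))


def safe_log(p, eps=1e-12):
    """log(p) with p clipped to at least eps, so log(0) = -inf never occurs."""
    return np.log(np.clip(p, eps, None))
```

A line such as `perplexity = np.sum(vis * safe_log(softmax_value))` could then stand in for the `nansum(vis * log(softmax_value))` version. Be aware that clipping changes the value slightly: a zero probability now contributes a large finite penalty instead of -inf.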
Let me know if you have any more questions.
Best regards
Lars Maaløe, PhD Student, DTU Compute, Technical University of Denmark (DTU)
Email: lars.maaloe@gmail.com, larsma@dtu.dk Phone: 0045 2229 1010 Skype: lars.maaloe LinkedIn http://dk.linkedin.com/in/larsmaaloe DTU Orbit http://orbit.dtu.dk/en/persons/lars-maaloee(0ba00555-e860-4036-9d7b-01ec1d76f96d).html
In reply to jyb002 (26 Feb 2015, 17:32).
Hi Dr Lars, I would like to thank you for publishing your code. I am trying to run it and am facing some issues:
1) Unpickling fails for some of the dataset files, so I removed those files from the data set.
2) The parallel stemming does not work. It creates the files, but when I open them I only find an array of boolean False values. I am using Windows 7, 64-bit.
[False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False, False, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False]
There are definitely some problems with importing the newsgroup training data:
1) Your code looks for files that end in ".p", but the newsgroup files are ".txt" files.
2) When you change the code to look for ".txt" files, pickling errors still occur for some files.
3) When you remove the files with pickling errors, the docs_list list in __set_attributes() contains only False values.
Have you tested this? Didn't you run into the ".p" problem?
Hi Alex,
Thanks for your interest in the toolbox.
The code is a little outdated, but it should run without problems. The pickled files are temporary lists of words, used later for BOW creation. You should not change the code to look for .txt files. I believe what you are missing is the stemming: stem the files first and then create the BOW, as in the example code.
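As a rough illustration of "stem the files, then create the BOW", the sketch below turns one raw 20 Newsgroups text file into a pickled list of stemmed words, i.e. the kind of temporary .p file described above. It is not the toolbox's stemming code: the regex tokenizer, the latin-1 encoding and the function names are assumptions. The stemmer is passed in as a callable so the sketch runs without NLTK; in practice it would be something like `nltk.stem.PorterStemmer().stem`.

```python
import pickle
import re


def stem_words(text, stem):
    """Lower-case `text`, split it into alphabetic tokens and stem each one.

    Assumption: a simple [a-z]+ tokenizer; the toolbox may tokenize differently.
    """
    return [stem(token) for token in re.findall(r"[a-z]+", text.lower())]


def stem_file_to_pickle(txt_path, p_path, stem):
    """Hypothetical helper: write the stemmed word list of one document to a .p file."""
    # latin-1 maps every byte to a character, so decoding never fails on odd bytes.
    with open(txt_path, encoding="latin-1") as f:
        words = stem_words(f.read(), stem)
    with open(p_path, "wb") as f:
        pickle.dump(words, f)
    return words
```

For instance, `stem_words("Hello, World! hello", lambda w: w[:3])` returns `['hel', 'wor', 'hel']`; a real stemmer simply replaces the toy lambda.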
Let me know how it works. :)
Best regards
Lars Maaløe, PhD Student, Cognitive Systems, DTU Compute, Technical University of Denmark (DTU)
In reply to Alex Minnaar (29 Sep 2015, 02:18).
Apologies. The problem was that I did not have nltk installed for the stemming. Strangely, no error told me that nltk was missing; instead, the stemming seemed to be skipped altogether, which is why no ".p" files were created. It seems to be working now. Thanks!
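Because a missing dependency here fails silently, it may help to import the stemmer explicitly and to check the per-file results of a parallel run. The sketch below is a hypothetical safeguard, not part of the toolbox: `load_stemmer` raises a clear ImportError when NLTK is not installed, and `raise_on_failures` treats `False`, exception instances and exception classes (like the `StopIteration` entries in the list posted earlier) as failures instead of passing them along.

```python
import importlib


def load_stemmer():
    """Hypothetical helper: return a Porter stemming function, failing loudly if NLTK is missing."""
    try:
        stem_module = importlib.import_module("nltk.stem")
    except ImportError as exc:
        raise ImportError(
            "NLTK is required for stemming; install it with 'pip install nltk'"
        ) from exc
    return stem_module.PorterStemmer().stem


def raise_on_failures(paths, results):
    """Hypothetical helper: raise if any per-file result signals a failure."""
    failed = [
        path
        for path, result in zip(paths, results)
        if result is False
        or isinstance(result, BaseException)
        or (isinstance(result, type) and issubclass(result, BaseException))
    ]
    if failed:
        raise RuntimeError(f"{len(failed)} of {len(paths)} files failed: {failed[:5]}")
```

Calling `raise_on_failures(paths, results)` right after the parallel stemming step would have stopped the run at the point where all results came back as `False`, rather than at BOW creation.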
Hi Lars, thanks for the toolbox. I am having a hard time getting it to run, though. Is main.py supposed to work as is, or only with modifications? I downloaded the data set and changed all formats to .txt, but running it (on an iMac with OS X 10.9.5) returns:
Traceback (most recent call last):
  File "main.py", line 67, in <module>
    run_simulation('input/20news-bydate/20news-bydate-train','input/20news-bydate/20news-bydate-test',epochs = 50,attributes=2000,evaluation_points=[1,3,7,15,31,63],binary_output=True)
  File "main.py", line 46, in run_simulation
    dat_proc_train = data_processing.DataProcessing(train_paths,words_count=attributes,trainingset_size=1.0,acceptance_lst_path="input/acceptance_lst_stemmed.txt")
  File "/Users/admin/Desktop/Deep-Belief-Nets-for-Topic-Modeling-master/DataPreparation/data_processing.py", line 42, in __init__
    self.acceptance_lst = open(acceptance_lst_path).read().replace(" ","").split("\n")
IOError: [Errno 2] No such file or directory: 'input/acceptance_lst_stemmed.txt'
Removing the 'acceptance_lst_path' argument from 'dat_proc_train = data_processing.DataProcessing...' results in:
Traceback (most recent call last):
  File "main.py", line 67, in <module>
    run_simulation('input/20news-bydate/20news-bydate-train','input/20news-bydate/20news-bydate-test',epochs = 50,attributes=2000,evaluation_points=[1,3,7,15,31,63],binary_output=True)
  File "main.py", line 52, in run_simulation
    dat_proc_test = data_processing.DataProcessing(test_paths,trainingset_size=0.0, trainingset_attributes=data_processing.get_attributes())
  File "/Users/admin/Desktop/Deep-Belief-Nets-for-Topic-Modeling-master/DataPreparation/data_processing.py", line 437, in get_attributes
    return s.load( open( env_paths.get_attributes_path(training), "rb" ) )
IOError: [Errno 2] No such file or directory: 'output/train/BOWs/attributes.p'
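Both tracebacks are plain missing-file errors on relative paths ('input/acceptance_lst_stemmed.txt' and 'output/train/BOWs/attributes.p'), so they depend on the working directory and on which processing steps have already run; per Lars's earlier comment, the attribute list is written while the training set is generated. A hypothetical pre-flight check like the one below, run before the long simulation, would list every missing file at once. The helper names are illustrative and not part of the toolbox.

```python
import os


def missing_files(paths):
    """Return the entries of `paths` that do not exist.

    Relative paths are resolved against the current working directory.
    """
    return [path for path in paths if not os.path.exists(path)]


def check_required_files(paths):
    """Hypothetical pre-flight check: fail early with every missing path listed."""
    missing = missing_files(paths)
    if missing:
        raise FileNotFoundError(
            "Missing required file(s) (cwd: %s): %s" % (os.getcwd(), ", ".join(missing))
        )
```

For example, `check_required_files(["input/acceptance_lst_stemmed.txt"])` could be called before `run_simulation(...)`; a check for 'output/train/BOWs/attributes.p' only makes sense right before the test-set step, after the training set has been processed.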