lisitsyn opened this issue 9 years ago
I want to work on the implementation of this paper. @lisitsyn, can you please guide me on how to start?
@ayush2913 cool! This could be kind of a challenging thing, so let's do some iterations. If you understand the architecture of the neural net they used, we can go on to the details; otherwise, let's discuss.
@lisitsyn I gave the paper a read and watched a video on convolutional neural networks. I understood that they use ConvNets with 1-D convolution and max pooling, trained with the stochastic gradient descent algorithm. They also encode the input into vectors using 1-of-m encoding. So I have an overview of the system, but I'm still trying to figure out how exactly convolution and max pooling are used to form the network, and I'm looking in detail at the way the data is encoded in the paper. Can you give me some idea about these things, or maybe something to refer to?
@ayush2913 I'd suggest you start with some helper functions that encode the text the same way they do. The next step would be to check the notebook on CNNs http://shogun-toolbox.org/static/notebook/current/neuralnets_digits.html and try to build some network. Then you would transform a few texts and try to train the network to classify them.
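A rough sketch of such an encoding helper, assuming the 69-character alphabet mentioned later in this thread (the authoritative character set and the 1014 input length are in the paper; the names here are just illustrative):

```python
import numpy as np

# An illustrative 69-character alphabet (26 letters, 10 digits, punctuation
# and newline); check the paper for the exact set.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}\n"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, max_len=1014):
    """Encode `text` as a (len(ALPHABET), max_len) one-hot matrix.
    Characters outside the alphabet, and positions past the end of the
    text, stay all-zero."""
    mat = np.zeros((len(ALPHABET), max_len), dtype=np.float32)
    for pos, char in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(char)
        if idx is not None:
            mat[idx, pos] = 1.0
    return mat
```

Each column is then one "time step" of the input that the temporal convolution slides over.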
@lisitsyn As I found in this blog post, @zhangxiangxiao promised to release the Torch source code (GPU) in a few weeks.
@alishir cool! Thanks for the info
hey @lisitsyn I would like to work on this. I have a basic understanding of conv layers and pool layers, particularly ConvNets used for image classification. This paper proposes temporal ConvNets, which I'm not quite familiar with; I think they are used for video recognition, since videos have a temporal dimension unlike images. For character quantization, they use 1-of-N encoding with an alphabet of 69 characters, i.e. the codeword for a character would look like 00010000....0000 (69 bits long), and so on. This explains the 69 frames in the input. In the convolutional layer table they have columns for LargeFrame, SmallFrame, Kernel and Pool. What exactly do they mean? They also do data augmentation by replacing words with their synonyms. So finally, what is the output of the neural net here that would lead to text understanding from scratch? I haven't used NLP tools like word2vec before.
Hey @sanuj,
temporal here just means along the axis of letters in the text: t=0 is the first letter, and so on. The encoding is treated as an image, so the convolutional layers operate the same way as on images (the pooling step is exactly the same). The output is domain-dependent; you just fit this architecture to a problem like text classification.
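To make the "treated as an image" point concrete, here is a tiny numpy sketch (toy sizes, not the paper's): a temporal convolution slides only along the character axis, with the kernel spanning the full height of the one-hot encoding, just like a 2-D image convolution whose kernel is as tall as the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hot "text": (features, time) = (5, 8) instead of the paper's sizes.
x = np.zeros((5, 8), dtype=np.float32)
x[rng.integers(0, 5, size=8), np.arange(8)] = 1.0

# A temporal kernel of width 3 spans all 5 input features, so sliding it
# along the time axis is equivalent to a 2-D convolution whose kernel is
# as tall as the image.
kernel = rng.standard_normal((5, 3)).astype(np.float32)
out = np.array([(x[:, t:t + 3] * kernel).sum() for t in range(8 - 3 + 1)])
print(out.shape)  # one output value per valid time position
```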
Please ask if you have other questions
@lisitsyn

> conv layer do not use stride

What does that mean? I thought you normally have a stride of 1 or so for convolution. @lisitsyn Sergey?
@sanuj really sorry, things are a bit intense at the moment
1) The number of unique chars in your alphabet times the number of chars in your sequence.
2) The kernel is what the convolutional layer uses and learns: the coefficients of the convolution operator. The stride is 1, so they don't jump over more than one letter.
3) This should be some specific task, like sentiment analysis.
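A minimal numpy sketch of point 2), with illustrative shapes (the paper's layers use kernel widths of 7 and 3 with pooling size 3; the helper names here are made up):

```python
import numpy as np

def conv1d(x, kernels):
    """Stride-1 temporal convolution.
    x: (in_frames, T); kernels: (out_frames, in_frames, k)."""
    out_frames, in_frames, k = kernels.shape
    T = x.shape[1]
    out = np.empty((out_frames, T - k + 1), dtype=x.dtype)
    for f in range(out_frames):
        for t in range(T - k + 1):
            # stride 1: the window advances one position at a time
            out[f, t] = (x[:, t:t + k] * kernels[f]).sum()
    return out

def max_pool1d(x, p):
    """Non-overlapping temporal max pooling with window p."""
    T = (x.shape[1] // p) * p
    return x[:, :T].reshape(x.shape[0], -1, p).max(axis=2)
```

In the real network the kernels are the learned parameters; here they would just be initialized randomly and updated by SGD.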
@lisitsyn No problem :) I'll have a look at the other notebooks related to neural nets and start working on this. Will get back to you if I get stuck somewhere ;)
@lisitsyn Sorry for disappearing, got busy with an internship. I looked at the neural nets digits ipython notebook. I'll start by writing helper functions in python to encode text as described in the paper, and then try to build some network after that.
@lisitsyn this will be helpful. https://github.com/zhangxiangxiao/Crepe
@lisitsyn I trained Crepe (the Torch implementation of this paper) on DBpedia ontology classification and it took almost a day on a CUDA-enabled GPU. Shall I do the same thing in the notebook, or reduce the number of parameters/layers? A shogun implementation will take even more time to train.
@sanuj my idea was to create an ipython notebook reproducing that in shogun. If it takes that much time, we should either train on a subset, do something about it in shogun, or maybe just give up :)
Suggestion: train on a subset in the notebook and state that it takes much longer on the full data. Put a Dropbox or similar link to download the trained model, and allow using it in the notebook if available.
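One way the subset idea could look in the notebook (a sketch: `texts`/`labels` and the `per_class` size are placeholders for whatever the notebook actually loads):

```python
import numpy as np

def subsample(texts, labels, per_class=500, seed=0):
    """Draw a small per-class sample so training takes minutes, not a day.
    `per_class` is a guess at a workable size for the notebook."""
    rng = np.random.default_rng(seed)
    texts, labels = np.asarray(texts, dtype=object), np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=min(per_class, idx.size), replace=False))
    keep = np.array(sorted(keep))
    return texts[keep], labels[keep]
```

Sampling per class keeps the subset balanced, which matters for the classification datasets used in the paper.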
I would like to work on this issue. I understand the architecture of neural nets and have also read the paper given below. @lisitsyn, please give me guidelines on what to do next.
@amit309b there won't be any guidelines on what to do next; we require self-initiative. Also see the above discussion.
@karlnapf I tried to reproduce the above paper in an ipython notebook. First I convert the text data into the 1-of-m encoding described in the paper. My code skips the data-augmentation part; for the implementation I use the small frame, and my dataset is from AG's corpus of news articles. Here is the link to the code: https://gist.github.com/amit309b/7c072b97be281a29049d201f3b3888e3
@amit309b please read up on how to share ipython notebooks
Sorry for the wrong format. @karlnapf here is the ipython notebook link: https://gist.github.com/amit309b/b3417128326b7ca0da5e21b88b6e6711
@karlnapf could you please comment on the above code?
This seems OK, but it is like 1% of the task.
@karlnapf Can you suggest what my next task should be? From the above discussion I understand that we just have to write the ipython notebook.
Paper is here
Good for Deep learning applicants