lisitsyn opened this issue 9 years ago
I want to work on the implementation of this paper. @lisitsyn, can you please guide me on how to start?
@ayush2913 cool! This could be kind of a challenging thing, so let's do some iterations. If you understand the architecture of the neural net they used, we can go on to the details; otherwise, let's discuss.
@lisitsyn I gave the paper a read and watched a video on convolutional neural networks. I understood that they use ConvNets with 1-D convolution and max pooling, trained with the stochastic gradient descent algorithm. They also encode the input into vectors using 1-of-m encoding. So I have an overview of the system, but I'm still trying to figure out how exactly convolution and max pooling are used to form the network, and I'm looking in detail at the way the data is encoded in the paper. Can you give me some idea about these things, or maybe something to refer to?
@ayush2913 I'd suggest you start with some helper functions that encode the text the same way they do. The next step would be to check the notebook on CNNs http://shogun-toolbox.org/static/notebook/current/neuralnets_digits.html and try to build some network. Then you would transform a few texts and try to train the network to classify them.
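A rough sketch of such an encoding helper, assuming the 69-character alphabet mentioned later in this thread (the authoritative character set and the 1014 input length are in the paper; the names here are just illustrative):

```python
import numpy as np

# An illustrative 69-character alphabet (26 letters, 10 digits, punctuation
# and newline); check the paper for the exact set.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}\n"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, max_len=1014):
    """Encode `text` as a (len(ALPHABET), max_len) one-hot matrix.
    Characters outside the alphabet, and positions past the end of the
    text, stay all-zero."""
    mat = np.zeros((len(ALPHABET), max_len), dtype=np.float32)
    for pos, char in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(char)
        if idx is not None:
            mat[idx, pos] = 1.0
    return mat
```

Each column is then one "time step" of the input that the temporal convolution slides over.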
@lisitsyn As I found in this blog post, @zhangxiangxiao promised to release the Torch source code (GPU) in a few weeks.
@alishir cool! Thanks for the info
hey @lisitsyn I would like to work on this. I have a basic understanding of conv layers and pool layers, particularly ConvNets used for image classification. This paper proposes temporal ConvNets, which I'm not quite familiar with; I think they are used for video recognition, since videos have a temporal dimension unlike images. For character quantization, they use 1-of-N encoding with an alphabet of 69 characters, i.e. the codeword for a character would look like 00010000....0000 (69 bits long), and so on. This explains the 69 frames in the input. In the convolutional layer table they have columns for LargeFrame, SmallFrame, Kernel and Pool. What exactly do they mean? They also do data augmentation by replacing words with their synonyms. So finally, what is the output of the neural net here that would lead to text understanding from scratch? I haven't used NLP tools like word2vec before.
Hey @sanuj,
temporal here just means along the axis of letters in the text: t=0 is the first letter, and so on. The encoding is treated as an image, so the convolutional layers operate the same way as on images (the pooling step is exactly the same). The output is domain-dependent; you just fit this architecture to a problem like text classification.
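To make the "treated as an image" point concrete, here is a tiny numpy sketch (toy sizes, not the paper's): a temporal convolution slides only along the character axis, with the kernel spanning the full height of the one-hot encoding, just like a 2-D image convolution whose kernel is as tall as the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hot "text": (features, time) = (5, 8) instead of the paper's sizes.
x = np.zeros((5, 8), dtype=np.float32)
x[rng.integers(0, 5, size=8), np.arange(8)] = 1.0

# A temporal kernel of width 3 spans all 5 input features, so sliding it
# along the time axis is equivalent to a 2-D convolution whose kernel is
# as tall as the image.
kernel = rng.standard_normal((5, 3)).astype(np.float32)
out = np.array([(x[:, t:t + 3] * kernel).sum() for t in range(8 - 3 + 1)])
print(out.shape)  # one output value per valid time position
```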
Please ask if you have other questions
@lisitsyn

> conv layer do not use stride

What does that mean? I thought you normally have a stride of 1 or so for convolution. @lisitsyn Sergey?
@sanuj really sorry, things are a bit intense at the moment
1) The number of unique chars in your alphabet times the number of chars in your sequence.
2) The kernel is what the convolutional layer uses and learns: the coefficients of the convolution operator. The stride is 1, so they don't jump over more than one letter.
3) This should be some specific task, like sentiment analysis.
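A minimal numpy sketch of point 2), with illustrative shapes (the paper's layers use kernel widths of 7 and 3 with pooling size 3; the helper names here are made up):

```python
import numpy as np

def conv1d(x, kernels):
    """Stride-1 temporal convolution.
    x: (in_frames, T); kernels: (out_frames, in_frames, k)."""
    out_frames, in_frames, k = kernels.shape
    T = x.shape[1]
    out = np.empty((out_frames, T - k + 1), dtype=x.dtype)
    for f in range(out_frames):
        for t in range(T - k + 1):
            # stride 1: the window advances one position at a time
            out[f, t] = (x[:, t:t + k] * kernels[f]).sum()
    return out

def max_pool1d(x, p):
    """Non-overlapping temporal max pooling with window p."""
    T = (x.shape[1] // p) * p
    return x[:, :T].reshape(x.shape[0], -1, p).max(axis=2)
```

In the real network the kernels are the learned parameters; here they would just be initialized randomly and updated by SGD.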
@lisitsyn No problem :) I'll have a look at the other notebooks related to neural nets and start working on this. Will get back to you if I get stuck somewhere ;)
@lisitsyn Sorry for disappearing, got busy with an internship. I looked at the neural nets digits ipython notebook. I'll start by writing helper functions in python to encode text as described in the paper, and then try to build some network after that.
@lisitsyn this will be helpful. https://github.com/zhangxiangxiao/Crepe
@lisitsyn I trained Crepe (the Torch implementation of this paper) on DBpedia ontology classification and it took almost a day on a CUDA-enabled GPU. Shall I do the same thing in the notebook, or reduce the number of parameters/layers? A shogun implementation will take even more time to train.
@sanuj my idea was to create an ipython notebook reproducing that in shogun. If it takes that much time, we should either train on a subset, do something about it in shogun, or maybe just give up :)
Suggestion: train on a subset in the notebook and state that it takes much longer on the full data. Put a Dropbox or similar link to download the trained model, and allow using it in the notebook if available.
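One way the subset idea could look in the notebook (a sketch: `texts`/`labels` and the `per_class` size are placeholders for whatever the notebook actually loads):

```python
import numpy as np

def subsample(texts, labels, per_class=500, seed=0):
    """Draw a small per-class sample so training takes minutes, not a day.
    `per_class` is a guess at a workable size for the notebook."""
    rng = np.random.default_rng(seed)
    texts, labels = np.asarray(texts, dtype=object), np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(idx, size=min(per_class, idx.size), replace=False))
    keep = np.array(sorted(keep))
    return texts[keep], labels[keep]
```

Sampling per class keeps the subset balanced, which matters for the classification datasets used in the paper.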
I would like to work on this issue. I understand the architecture of neural nets and have also read the paper given below. @lisitsyn, please give me guidelines on what to do next.
@amit309b there won't be any guidelines on what to do next; we require self-initiative. Also see the above discussion.
@karlnapf I tried to reproduce the above paper in an ipython notebook. First I convert the text data into the 1-of-m encoding described in the paper. My code skips the data-augmentation part; for the implementation I use the small frame, and my dataset is from AG's corpus of news articles. Here is the link to the code: https://gist.github.com/amit309b/7c072b97be281a29049d201f3b3888e3
@amit309b please read up on how to share ipython notebooks
Sorry for the wrong format. @karlnapf here is the ipython notebook link: https://gist.github.com/amit309b/b3417128326b7ca0da5e21b88b6e6711
@karlnapf could you please comment on the above code?
This seems OK, but it is like 1% of the task.
@karlnapf Can you suggest what my next task should be? From the above discussion I understand that we just have to write the ipython notebook.
Paper is here
Good for Deep learning applicants