beniz opened this issue 8 years ago
This is a good example to start working on and to reproduce: https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py
Good link to get started with: http://christopher5106.github.io/deep/learning/2016/06/07/recurrent-neural-net-with-Caffe.html
Hi @beniz, I have gone through both of the tutorials you listed above. Apart from these, I am also learning more about LSTM and RNN networks. In the meantime, can you tell me the next steps for integration of these networks with dd?
Sure @kyrs. I'd suggest as a first goal that you take the Keras example and try to get similar results with DD. To do this, you may have to go through a few steps. First, you may want to write the required prototxt files that describe the network with LSTM units. Second, try to connect it to the existing character-based input connector.
In other words, once you've got the prototxt files, you should be able to use the examples on http://www.deepdetect.com/applications/text_model/ with LSTM instead of CNN.
This is easier said than done of course, and the input format to Caffe for LSTM may not fit the existing DD code exactly, but based on your experiments, we'll fix that up.
Let me know how this sounds.
Hi @beniz. I was trying to implement Christopher Bourez's blog post ( http://christopher5106.github.io/deep/learning/2016/06/07/recurrent-neural-net-with-Caffe.html ) on implementing LSTM with Caffe. The whole tutorial is based upon https://github.com/christopher5106/last_caffe_with_stn . I believe Caffe has officially added LSTM code to its master branch: https://github.com/BVLC/caffe/issues/4629 Do you think there is a need to integrate a new lib just for LSTM, especially when Caffe already provides such functionality? One more point: I found the literature about the LSTM implementation in Caffe a little incomplete. Do you have any resources to learn more about it? This will be my first time coding with Caffe.
Finally managed to find some examples: https://github.com/jeffdonahue/caffe/tree/recurrent-rebase-cleanup/examples/coco_caption
What do you mean more exactly by integrating a new lib? Ok, I believe I understand: yes, it'd be better to use the original Caffe LSTM implementation. But if there are things that need to change, I'll help or I'll do it. The custom version of Caffe we now use with DD has many improvements I've built in, so when it is the best solution, it's OK to do it.
Hi @beniz, just wanted to update you on my progress. It seems there is very little literature on using LSTM and RNN with Caffe. However, the pull request by jeffdonahue https://github.com/BVLC/caffe/pull/2033/files gives some overview. I am trying to run the tutorial as suggested in the pull request.
@kyrs OK thanks. Let me know if you need help, and reach out on gitter for instance. We can coordinate there.
@beniz as per our discussion, I have modified my prototxt https://gist.github.com/kyrs/e93548079ab9954915122263cf845325 on the basis of the PR in https://github.com/beniz/deepdetect/pull/189 and have merged https://github.com/beniz/deepdetect/pull/174 into my forked branch. But on running the training process, it threw the following error: https://gist.github.com/kyrs/c9dc967bfd49e553cfed10668a4b19e4
Following are my service creation and training requests:
curl -X PUT "http://localhost:8888/services/ag" -d "{\"mllib\":\"caffe\",\"description\":\"newsgroup classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"txt\"},\"mllib\":{\"nclasses\":4,\"embedding\":true}},\"model\":{\"repository\":\"/home/shubham/openSource/deepdetect/models/agr_lstm/\"}}"
curl -X POST "http://localhost:8888/train" -d "{\"service\":\"ag\",\"async\":true,\"parameters\":{\"mllib\":{\"gpu\":true,\"solver\":{\"iterations\":50000,\"test_interval\":1000,\"base_lr\":0.01,\"solver_type\":\"ADAM\"},\"net\":{\"batch_size\":300}},\"input\":{\"sequence\":1024,\"embedding\":true,\"shuffle\":true,\"test_split\":0.2,\"min_count\":10,\"min_word_length\":5,\"count\":false},\"output\":{\"measure\":[\"mcll\",\"f1\"]}},\"data\":[\"/home/shubham/openSource/deepdetect/models/data/agnews_data/\"]}"
Caffe documentation and examples are a mess, but the comment that explains the required inputs to the RNN and LSTM recurrent layers is here: https://github.com/BVLC/caffe/pull/2033#issue-59849829
As expected this requires modifying the way DD produces the inputs, in order to fit these requirements. I'll make more comments and post potential code to help with this.
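For concreteness, the PR comment linked above specifies that Caffe's recurrent layers take a time-major data blob of shape T x N (T timesteps, N independent streams) together with a sequence-continuation indicator of the same T x N shape, which is 0 at the first timestep of every sequence and 1 afterwards. A minimal sketch of that indicator in plain Python (the function name and shapes are illustrative only, not DD code):

```python
# Sketch of the sequence-continuation indicator ("delta"/"cont") that
# Caffe's recurrent layers expect alongside the data blob (BVLC/caffe#2033).
# Purely illustrative; not DD code.

def make_cont_markers(seq_len, num_streams):
    """Return a T x N list of lists: 0 at the first timestep of each
    sequence (resetting the hidden state), 1 for every later timestep."""
    return [[0 if t == 0 else 1 for _ in range(num_streams)]
            for t in range(seq_len)]

cont = make_cont_markers(4, 2)
# cont[0] == [0, 0]  -> both streams start a new sequence here
# cont[1] == [1, 1]  -> hidden state carries over
```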
Found this interesting document, which explains LSTM integration with Caffe in more detail: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-sequences.pdf
Hi @beniz, I am experimenting with dd to run LSTM, and I need your suggestion on a few things. In the example given for training LSTM in Caffe, the value of \delta is explicitly created and stored in HDF5 format before training or testing the LSTM network; see https://github.com/BVLC/caffe/pull/2033/files#diff-c912186cd39ea15b5646c3b2f5350a7eR105 and https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193. Do you think the user should formulate the values of this \delta based on their data and provide it in the .prototxt before training or testing, or should the binary value of this \delta be filled in by Caffe during training or testing based on the batch data?
\delta should be put into the Datum before storage as LMDB, in CaffeInputConn.h, much like the other decompositions. I've put it all on paper; it should not be long to implement. You can do it if you like, by looking at the way the characters or words in sentences are converted into Datum, still in CaffeInputConn. The existing code can serve as support for implementing storage of padded sentences with \delta.
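To make the suggestion above concrete, here is a rough sketch (not DD's actual CaffeInputConn code) of packing a padded sentence and its \delta markers back to back into the flat vector a Datum would carry; the function name, the zero pad id, and the treatment of \delta at pad positions are all assumptions for illustration:

```python
# Hypothetical sketch: pack a padded sentence and its delta markers into
# one flat row (as a Caffe Datum's float_data would hold them), so a
# Slice layer can separate the two halves at run time.

def pack_sentence_with_deltas(word_ids, seq_len, pad_id=0):
    """Pad word_ids to seq_len with pad_id, then append one delta per
    position: 0 for the sequence start, 1 for continuations (padding
    positions are handled naively here, for illustration only)."""
    padded = (word_ids + [pad_id] * seq_len)[:seq_len]
    deltas = [0] + [1] * (seq_len - 1)
    return padded + deltas  # row length: 2 * seq_len

row = pack_sentence_with_deltas([12, 7, 42], seq_len=5)
# row == [12, 7, 42, 0, 0, 0, 1, 1, 1, 1]
```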
Hi @beniz, I have a few doubts regarding the modification in caffeInputConn.h.
In the file https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L673 , are you converting a full sentence/sequence into a datum? Also, as per my understanding, I need to create a separate lmdb file for storing \delta in datum format and develop a one-to-one mapping with the main lmdb files, i.e. train.lmdb & test.lmdb (https://github.com/BVLC/caffe/issues/1381).
Finally I have started to get the hang of all these terms. Please correct me if I am going in the wrong direction.
Hi @kyrs! Yes, to_datum converts one-hot word or char vector sequences to a Caffe Datum.
You can write two lmdb files; they will be synced if you write the entries in the same order. An alternative that DD already uses elsewhere is to put data and deltas into a single Datum and to slice the resulting Blob accordingly when running. This requires adding a Slice layer after the data layer. If you look at the multiple-target regression models in DD, they already use the Slice layer. This post https://groups.google.com/forum/m/#!topic/caffe-users/RuT1TgwiRCo can help you with the slicing if you choose this route.
Let me know how this goes!
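If the single-Datum route is chosen, the run-time separation could look roughly like the following Slice layer (purely illustrative: the blob names, axis, and the slice point of 1024 are assumptions, the last one matching the sequence length used in the training request above):

```protobuf
# Illustrative only: assumes sentence values and deltas were packed
# back to back into a single blob of length 2 * T (here T = 1024).
layer {
  name: "slice_data"
  type: "Slice"
  bottom: "data"
  top: "sentence"
  top: "delta"
  slice_param {
    axis: 1           # the packed dimension
    slice_point: 1024 # first T values are the sentence, the rest deltas
  }
}
```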
Hi @beniz, if you look into the files for generating \delta https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193 you can easily see that all the padded words have a value of 0. But for our use case, how can we find the index of a padded word in https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L673 ?
Also, can I assume that the starting index of the vector hit marks the start of a new sequence?
> Hi @beniz, if you look into the files for generating \delta https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193 you can easily see that all the padded words have a value of 0. But for our use case, how can we find the index of a padded word in https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L673 ?
We are padding too; see https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L786 We fill up the whole sequence with zeros, then fill in what we can.
> Also, can I assume that the starting index of the vector hit marks the start of a new sequence?
Yes, hit holds a sequence (i.e. a sentence).
Let me know if this helps.
I think the current method of padding the whole sequence with zeros and filling it with appropriate values doesn't preserve the ordering of the words in a given sentence. For LSTM, the ordering of words is also important. If you look into https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR170 the author has appended words in a sequential manner, which clearly preserves the word ordering.
What do you think about this? I guess we have to change the format in which words are being stored in the datum. I have a few things in mind, but before implementing I need to discuss them with you.
You can change the format, but you could also use characters instead of words to play with the LSTM.
I have made some changes in the caffeinputconns.h file to integrate it with LSTM. Although I have managed to build it properly, I am still sceptical about my method. What do you think about it? https://gist.github.com/kyrs/a1b1065c7bfd92ea48c56f66607b1d0a
I'm not sure why you are calling to_datum before filling up the Datum. Actually I believe the code should be executed within to_datum, though I may have missed something.
I am following the multi-label classification example https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L400 to understand the slicing for the \delta file. Also, I didn't want to change the code of to_datum unless I was confident with my process. If you think I should make changes in to_datum then I will do it today and try to train the model on a sample dataset.
Yes, you can change to_datum; otherwise you may get weird results by letting the datums be filled up before your code runs.
Slicing is not difficult: just append the deltas after the fixed-length sentence (use padding for fixed length as necessary). When running the model, a Slice layer separates the sentence from the deltas, that's it.
The fixed length can be relaxed later on; there's no need to try the most complex setup first. Let me know how it goes!
@beniz is it possible to slice a datum based on width rather than slicing it on channel? If you look at the character-based encoding of text https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L793 we may not need to pad, as the length of _alphabet is already fixed: https://github.com/beniz/deepdetect/blob/master/src/txtinputfileconn.cc#L353
You can slice in any dimension you want, even multiple times.
The current padding for characters does preserve order. The one for words does not, since it is a bag-of-words model. But you could build one that has ordered words. To begin with, you might want to try LSTM on ordered characters and thus play with only minimal changes to the existing code.
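As a toy illustration of the character route (the alphabet, the 1-based ids, and the zero padding below are assumptions, not DD's exact scheme), order-preserving character encoding against a fixed alphabet could look like:

```python
# Toy sketch of order-preserving character encoding against a fixed
# alphabet; unknown characters and padding both map to 0. Illustrative
# only, not the actual txt input connector code.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def encode_chars(text, seq_len, pad_index=0):
    """Map each character to its 1-based alphabet index, in order,
    truncating to seq_len and padding the tail with pad_index."""
    ids = [ALPHABET.find(c) + 1 for c in text.lower()[:seq_len]]
    return ids + [pad_index] * (seq_len - len(ids))

encode_chars("cab", 5)
# -> [3, 1, 2, 0, 0]   (order preserved, tail padded)
```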
Sure, I am making changes in the code for character-based LSTM prediction. Will soon update you with the results.
I have created a PR with a small modification: https://github.com/beniz/deepdetect/pull/208. I think these changes will work, what do you say? As the next step I am creating a deploy.prototxt file with the necessary slicing to run it on the AG News data.
hi @kyrs, best is to PR once you know that it works :) Have you tried training on an example ? The IMDb dataset would be a good one to use!
Oops, my fault! I just wanted to share the code with you, that's why I created it. If you say so, I will close it for now, until I have tested it completely.
Since you must have pushed it onto your branch, just point me to the branch :) I'll take a look at it tomorrow!
You can see the changes in https://github.com/kyrs/deepdetect/tree/lstm_140/
I have created a .prototxt file https://gist.github.com/kyrs/86021a67b82c34513cffe6e839bcbf7b for the AG News data, based on the changes I have made in the local branch.
But when I tried to test the changes, I got stuck on an issue. Although I am able to launch the service and start the training process with a 200 status, when I try to check the training status, I get Error: service ag training status call failed.
Run the job with async:false and then investigate from there.
Hey guys, exciting to see if LSTM support is possible within DeepDetect also. Did you ever reach a conclusion from your tests in October?
Cheers, Hakim
@hakimkhalafi it hasn't been completed yet. Although we have all the pieces lying on the table, don't expect LSTM support within DD before a few months unless it gets sponsored by one of our customers. Interestingly, the demand for LSTM has been very high. What is the application you are contemplating at the moment, if you can share?
Hi @beniz,
We wanted to implement a CNN + LSTM model, where we have multiple images, each image is fed to the same CNN, and the fixed-size vector output of each image is fed into an LSTM. Do you know of any resources/links/etc. which could help in implementing that?
Thank You
Hi @divamgupta, DD does not directly support an input LSTM layer for production, but this should not affect you with images and a CNN as the first layer, though you may need to re-arrange your inputs.
If you already have the Caffe network defined (e.g. the prototxt), you could pass the aggregated images as input, then split them with a Slice layer and feed them to your CNN + LSTM.
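Purely as a sketch of that idea (blob names and channel counts are assumptions, e.g. three RGB frames stacked along the channel axis; note that in stock Caffe the layer that partitions a blob is Slice, while Split merely duplicates it):

```protobuf
# Illustrative only: cut an aggregated 9-channel input (3 stacked RGB
# frames) into per-frame blobs that each feed the shared CNN.
layer {
  name: "split_frames"
  type: "Slice"
  bottom: "data"
  top: "frame0"
  top: "frame1"
  top: "frame2"
  slice_param {
    axis: 1        # channel axis
    slice_point: 3 # frame0: channels 0-2
    slice_point: 6 # frame1: channels 3-5, frame2: channels 6-8
  }
}
```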
Join the gitter chat rather than discussing these details here.
Hi all,
How does one actually use the LSTM layer? I keep getting errors saying certain parameters are invalid:
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"
  RecurrentParameter {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
Open an issue and report all requested information, and let's start from there. Thanks.
Hi all, when I train an LSTM network, the error "Message type "caffe.LayerParameter" has no field named "lstm_param"" appears.
I have installed the latest Caffe version from the master branch. To my knowledge, Caffe now supports LSTM layers. But when I run the solver I get this error.
My lstm.prototxt is:
input: "data"
input_shape { dim: 320 dim: 1 }
input: "clip"
input_shape { dim: 320 dim: 1 }
input: "label"
input_shape { dim: 320 dim: 1 }
layer {
  name: "Silence"
  type: "Silence"
  bottom: "label"
  include: { phase: TEST }
}
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"
  lstm_param {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "lstm1"
  top: "ip1"
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
  include: { phase: TRAIN }
}
My solver.prototxt is:
net: "lstm.prototxt"
test_iter: 1
test_interval: 2000000
base_lr: 0.0001
momentum: 0.95
lr_policy: "fixed"
display: 200
max_iter: 100000
solver_mode: GPU
average_loss: 200
#debug_info: true
You should post this on the Caffe issue tracker; you are obviously not using dd.
RNN + LSTM support now merged into Caffe, https://github.com/BVLC/caffe/pull/3948. This paves the way for robust integration within dd.