jolibrain / deepdetect

Deep Learning API and Server in C++14, with support for Caffe, PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
https://www.deepdetect.com/

Recurrent neural layers support (RNN, LSTM) via Caffe backend (Direct inputs to LSTM) #140

Open beniz opened 8 years ago

beniz commented 8 years ago

RNN + LSTM support now merged into Caffe, https://github.com/BVLC/caffe/pull/3948. This paves the way for robust integration within dd.

beniz commented 8 years ago

This is a good example to start working on and to reproduce: https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py

beniz commented 8 years ago

Good link to get started with: http://christopher5106.github.io/deep/learning/2016/06/07/recurrent-neural-net-with-Caffe.html

kyrs commented 8 years ago

Hi @beniz, I have gone through both of the tutorials you listed above. Apart from these tutorials, I am also learning more about LSTM and RNN networks. In the meantime, can you tell me the next steps for integrating these networks with dd?

beniz commented 8 years ago

Sure @kyrs. I'd suggest as a first goal that you take the keras example and try to get similar results with DD. To do this, you may have to go through a few steps. First, you may want to write the required prototxt files that describe the network with LSTM units. Second, try to connect it to the existing character-based input connector.

In other words, once you've got the prototxt files, you should be able to use the examples on http://www.deepdetect.com/applications/text_model/ with LSTM instead of CNN.

This is easier said than done of course, and the input format to Caffe for LSTM may not fit exactly the existing DD code, but based on your experiments, we'll fix that up.
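For reference, with the LSTM implementation newly merged into Caffe (PR 3948 above), a recurrent layer definition might look roughly like this; the layer names, bottoms and filler values below are illustrative, not a working DD config:

layer {
  name: "lstm1"
  type: "LSTM"
  bottom: "data"   # inputs, shape T x N x input_dim
  bottom: "cont"   # sequence continuation indicators, shape T x N
  top: "lstm1"
  recurrent_param {
    num_output: 256   # hidden state size, illustrative
    weight_filler { type: "uniform" min: -0.08 max: 0.08 }
    bias_filler { type: "constant" value: 0 }
  }
}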

Let me know how this sounds.

kyrs commented 8 years ago

Hi @beniz. I was trying to follow Christopher Bourez's blog (http://christopher5106.github.io/deep/learning/2016/06/07/recurrent-neural-net-with-Caffe.html) on implementing LSTM with Caffe. The whole tutorial is based upon https://github.com/christopher5106/last_caffe_with_stn. I see that Caffe has now officially added LSTM code to its master branch: https://github.com/BVLC/caffe/issues/4629. Do you think there is a need to integrate a new lib just for LSTM, especially when Caffe already provides such functionality? One more point: I found the literature about the LSTM implementation in Caffe a little incomplete. Do you have any resources to learn more about it? This will be my first time coding with Caffe.

kyrs commented 8 years ago

Finally managed to find some examples: https://github.com/jeffdonahue/caffe/tree/recurrent-rebase-cleanup/examples/coco_caption

beniz commented 8 years ago

What do you mean exactly by integrating a new lib? OK, I believe I understand: yes, it'd be better to use the original Caffe LSTM implementation. But if there are things that need changing, I'll help or I'll do it myself. The custom version of Caffe we now use with DD has many improvements I've built in, so when it is the best solution, it's OK to do it.

kyrs commented 8 years ago

Hi @beniz, just wanted to update you on my progress. It seems there is very little literature on using LSTM and RNN with Caffe, although the pull request by jeffdonahue (https://github.com/BVLC/caffe/pull/2033/files) gives some overview. I am trying to run the tutorial as suggested in the pull request.

beniz commented 8 years ago

@kyrs OK thanks. Let me know if you need help, and reach out on gitter for instance. We can coordinate there.

kyrs commented 8 years ago

@beniz as per our discussion I have modified my prototxt (https://gist.github.com/kyrs/e93548079ab9954915122263cf845325) on the basis of the PR in https://github.com/beniz/deepdetect/pull/189 and have merged https://github.com/beniz/deepdetect/pull/174 into my forked branch, but the training process threw the following error: https://gist.github.com/kyrs/c9dc967bfd49e553cfed10668a4b19e4. The following are my service creation and training requests:

curl -X PUT "http://localhost:8888/services/ag" -d "{\"mllib\":\"caffe\",\"description\":\"newsgroup classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"txt\"},\"mllib\":{\"nclasses\":4,\"embedding\":true}},\"model\":{\"repository\":\"/home/shubham/openSource/deepdetect/models/agr_lstm/\"}}"

curl -X POST "http://localhost:8888/train" -d "{\"service\":\"ag\",\"async\":true,\"parameters\":{\"mllib\":{\"gpu\":true,\"solver\":{\"iterations\":50000,\"test_interval\":1000,\"base_lr\":0.01,\"solver_type\":\"ADAM\"},\"net\":{\"batch_size\":300}},\"input\":{\"sequence\":1024,\"embedding\":true,\"shuffle\":true,\"test_split\":0.2,\"min_count\":10,\"min_word_length\":5,\"count\":false},\"output\":{\"measure\":[\"mcll\",\"f1\"]}},\"data\":[\"/home/shubham/openSource/deepdetect/models/data/agnews_data/\"]}"

beniz commented 8 years ago

Caffe documentation and examples are a mess, but the comment that explains the required inputs to the RNN and LSTM recurrent layers is here: https://github.com/BVLC/caffe/pull/2033#issue-59849829

As expected this requires modifying the way DD produces the inputs, in order to fit these requirements. I'll make more comments and post potential code to help with this.
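In short, per that comment, the recurrent layers take two bottoms: the inputs of shape T x N x ... (T timesteps across N independent streams) and the sequence continuation indicators \delta of shape T x N, where \delta = 0 marks the start of a new sequence and \delta = 1 means the sequence continues from the previous timestep. As a sketch, the corresponding input declarations could look like this (dimensions are illustrative):

input: "data"
input_shape { dim: 16 dim: 8 dim: 300 }  # T=16 timesteps, N=8 streams, 300-dim inputs
input: "cont"
input_shape { dim: 16 dim: 8 }           # \delta indicators, one per timestep and stream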

kyrs commented 7 years ago

Found this interesting document, which explains LSTM integration with Caffe in more detail: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-sequences.pdf

kyrs commented 7 years ago

Hi @beniz, I am experimenting with dd to run LSTM and need your suggestion on a few things. In the example given for training LSTM in Caffe, the value of \delta is explicitly created and stored in hdf5 format before training or testing the LSTM network; see https://github.com/BVLC/caffe/pull/2033/files#diff-c912186cd39ea15b5646c3b2f5350a7eR105 and https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193. Do you think the user should formulate the values of this \delta based on their data and provide it in the .prototxt before training or testing, or should the binary value of this \delta be filled/created in Caffe during training or testing based on the batch data?

beniz commented 7 years ago

\delta should be put into the Datum before storage as LMDB, in CaffeInputConn.h, much like the other decompositions. I've put it all on paper; it should not take long to implement. You can do it if you like, by looking at the way the characters or words in sentences are converted into Datum, still in CaffeInputConn. The existing code can serve as support for implementing storage of padded sentences with \delta.
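As a rough sketch of the idea (not the actual DD code; the function name and the packing layout here are assumptions for illustration), padding a sentence to fixed length and appending its \delta indicators into a single caffe::Datum could look like:

#include <vector>
#include "caffe/proto/caffe.pb.h"

// Pack one sentence (word indices) plus its \delta indicators into a Datum.
// The sentence is zero-padded to sequence_length; \delta is 0 at the sequence
// start and on padding, 1 while the sequence continues.
caffe::Datum to_datum_with_delta(const std::vector<int> &seq,
                                 int sequence_length, int label)
{
  caffe::Datum datum;
  datum.set_channels(2 * sequence_length); // sentence + deltas, sliced at runtime
  datum.set_height(1);
  datum.set_width(1);
  datum.set_label(label);
  for (int i = 0; i < sequence_length; ++i) // the padded sentence
    datum.add_float_data(i < static_cast<int>(seq.size()) ? seq[i] : 0.0f);
  for (int i = 0; i < sequence_length; ++i) // the \delta indicators
    datum.add_float_data(i > 0 && i < static_cast<int>(seq.size()) ? 1.0f : 0.0f);
  return datum;
}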

kyrs commented 7 years ago

Hi @beniz, I have a few doubts regarding the modification of caffeinputconns.h. In the file https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L673, are you converting a full sentence/sequence into a Datum? Also, as per my understanding, I need to create a separate lmdb file for storing \delta in Datum format and develop a one-to-one mapping with the main lmdb files, i.e. train.lmdb & test.lmdb (https://github.com/BVLC/caffe/issues/1381).

Finally, I am starting to get the hang of all these terms. Please correct me if I am going in the wrong direction.

beniz commented 7 years ago

Hi @kyrs! Yes, to_datum converts one-hot word or char vector sequences to Caffe Datum.

You can write two lmdb files; they will be synced if you write the entries in the same order. An alternative that DD already uses elsewhere is to put the data and the deltas into a single Datum and to slice the resulting Blob accordingly when running. This requires adding a Slice layer after the data layer. If you look at the multiple-target regression models in DD, they already use the Slice layer. This post https://groups.google.com/forum/m/#!topic/caffe-users/RuT1TgwiRCo can help you with the slicing if you choose this route.
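For illustration, with the sentence and the deltas packed along the channel dimension of the same Datum, the Slice layer could look like this (the layer and blob names are illustrative; 1024 matches the sequence length from the training request above):

layer {
  name: "slice_data"
  type: "Slice"
  bottom: "data"
  top: "sentence"
  top: "delta"
  slice_param {
    axis: 1            # channel axis
    slice_point: 1024  # fixed sentence length; the deltas follow
  }
}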

Let me know how this goes!

kyrs commented 7 years ago

Hi @beniz, if you look into the file for generating \delta (https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193) you can easily see that all the padded words have a value of 0. But for our use case, how can we find the index of a padded word in https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L673? Also, can I assume that the starting index of the vector hit marks the start of a new sequence?

beniz commented 7 years ago

> Hi @beniz, if you look into the file for generating \delta (https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193) you can easily see that all the padded words have a value of 0. But for our use case, how can we find the index of a padded word in https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L673?

We are padding too, see https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L786. We fill up the whole sequence with zeros, then fill in what we can.

> Also, can I assume that the starting index of the vector hit marks the start of a new sequence?

Yes, hit holds a sequence (i.e. a sentence).

Let me know if this helps.

kyrs commented 7 years ago

I think the current method of padding the whole sequence with zeros and filling it with appropriate values doesn't preserve the ordering of the words in a given sentence. For LSTM the ordering of words is also important. If you look into https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR170, the author has appended words in a sequential manner, which clearly preserves the word ordering.

What do you think about this? I guess we have to change the format in which words are being stored in the Datum. I have a few ideas in mind, but before implementing them I need to discuss them with you.

beniz commented 7 years ago

You can change the format, but you could also use the characters instead of words to play with the LSTM.

kyrs commented 7 years ago

I have made some changes in the caffeinputconns.h file to integrate it with LSTM. Although I have managed to build it properly, I am still sceptical about my method. What do you think about it? https://gist.github.com/kyrs/a1b1065c7bfd92ea48c56f66607b1d0a

beniz commented 7 years ago

I'm not sure why you are calling to_datum before filling up the Datum. Actually, I believe the code should be executed within to_datum, though I may have missed something.

kyrs commented 7 years ago

I am following the multi-label classification example https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L400 to understand the slicing for the \delta file. Also, I didn't want to change the code of to_datum until I was confident in my process. If you think I should make changes in to_datum then I will do it today and try to train the model on a sample dataset.

beniz commented 7 years ago

Yes, you can change to_datum; otherwise you may get weird results by letting the datums be filled up before your code runs.

Slicing is not difficult: just append the deltas after the fixed-length sentence (use padding for fixed length as necessary). When running the model, a Slice layer separates the sentence from the deltas, that's it.

The fixed length can be relaxed later on; there's no need to try the most complex setup first. Let me know how it goes!

kyrs commented 7 years ago

@beniz is it possible to slice a datum based on width rather than slicing it on channel? If you look at the character-based encoding of text (https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L793), we may not need to pad, as the length of _alphabet is already fixed: https://github.com/beniz/deepdetect/blob/master/src/txtinputfileconn.cc#L353

beniz commented 7 years ago

You can slice in any dimension you want, even multiple times.
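For instance, a minimal sketch of slicing along the width axis instead of the channel axis (names and the boundary value are illustrative):

layer {
  name: "slice_width"
  type: "Slice"
  bottom: "data"
  top: "left_part"
  top: "right_part"
  slice_param {
    axis: 3         # width axis of an N x C x H x W blob
    slice_point: 69 # illustrative boundary
  }
}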

beniz commented 7 years ago

The current padding for characters does preserve order. The one for words does not, since it is a bag-of-words model. But you could build one that has ordered words. To begin with, you might want to try LSTM on ordered characters and thus play with only minimal changes to the existing code.


kyrs commented 7 years ago

Sure, I am making changes in the code for character-based LSTM prediction. Will soon update you with the results.

kyrs commented 7 years ago

I have created a PR with small modifications: https://github.com/beniz/deepdetect/pull/208. I think these changes will work, what do you say? As the next step, I am creating a deploy.prototxt file with the necessary slicing to run it on the AG News data.

beniz commented 7 years ago

Hi @kyrs, best is to PR once you know that it works :) Have you tried training on an example? The IMDb dataset would be a good one to use!

kyrs commented 7 years ago

Oops, my fault! I just wanted to share the code with you, that's why I created it. If you say so, I will close it for now, until I have tested it completely.

beniz commented 7 years ago

Since you must have pushed it onto your branch, just point me to the branch :) I'll take a look at it tomorrow!

kyrs commented 7 years ago

You can see the changes in https://github.com/kyrs/deepdetect/tree/lstm_140/

kyrs commented 7 years ago

I have created a .prototxt file (https://gist.github.com/kyrs/86021a67b82c34513cffe6e839bcbf7b) for the AG News data based on the changes I have made in my local branch.

But when I tried to test the changes, I got stuck on an issue. I am able to launch the service and start the training process with a 200 status, but when I try to check the training status, I keep getting Error: service ag training status call failed.

beniz commented 7 years ago

Run the job with async:false and then investigate.
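For example, the same training request as above with \"async\":false, so that any error comes back directly in the response body:

curl -X POST "http://localhost:8888/train" -d "{\"service\":\"ag\",\"async\":false,\"parameters\":{\"mllib\":{\"gpu\":true,\"solver\":{\"iterations\":50000,\"test_interval\":1000,\"base_lr\":0.01,\"solver_type\":\"ADAM\"},\"net\":{\"batch_size\":300}},\"input\":{\"sequence\":1024,\"embedding\":true,\"shuffle\":true,\"test_split\":0.2,\"min_count\":10,\"min_word_length\":5,\"count\":false},\"output\":{\"measure\":[\"mcll\",\"f1\"]}},\"data\":[\"/home/shubham/openSource/deepdetect/models/data/agnews_data/\"]}"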

hakimkhalafi commented 7 years ago

Hey guys, it's exciting to see whether LSTM support is possible within DeepDetect as well. Did you ever reach a conclusion from your tests in October?

Cheers, Hakim

beniz commented 7 years ago

@hakimkhalafi it hasn't been completed yet. Although we have all the pieces lying on the table, don't expect LSTM support within DD for a few months, unless it gets sponsored by one of our customers. Interestingly, demand for LSTM has been very high. What is the application you are contemplating at the moment, if you can share?

divamgupta commented 6 years ago

Hi @beniz,

We wanted to implement a CNN + LSTM model, where we have multiple images, each image is fed to the same CNN, and the fixed-size vector output for each image is then fed into an LSTM. Would you know of any resources/links/etc. that could help in implementing that?

Thank You

beniz commented 6 years ago

Hi @divamgupta, DD does not directly support an input LSTM layer in production, but this should not affect you with images and a CNN as the first layer, though you may need to re-arrange your inputs.

If you already have the Caffe network defined (e.g. prototxt), you could pass the aggregated images as input, then separate them with a Slice layer and feed them to your CNN + LSTM.
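As a rough sketch only (two stacked RGB images separated with a Slice layer, then shared convolution weights via param name sharing; all names and shapes are illustrative, not a tested config):

# "data" holds two stacked RGB images per sample: N x 6 x H x W
layer {
  name: "slice_images"
  type: "Slice"
  bottom: "data"
  top: "img0"
  top: "img1"
  slice_param { axis: 1 slice_point: 3 }
}
layer {
  name: "conv0"
  type: "Convolution"
  bottom: "img0"
  top: "feat0"
  param { name: "conv_w" }  # weights shared across timesteps
  param { name: "conv_b" }
  convolution_param { num_output: 64 kernel_size: 3 }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "img1"
  top: "feat1"
  param { name: "conv_w" }
  param { name: "conv_b" }
  convolution_param { num_output: 64 kernel_size: 3 }
}
# ... then pool/flatten each feature map to a fixed-size vector and
# arrange the vectors as a T x N sequence for the LSTM layer.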

Join the gitter chat rather than discussing these details here.

soulslicer commented 6 years ago

Hi all,

How does one actually use the LSTM layer? I keep getting errors saying certain parameters are invalid:

layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"

  RecurrentParameter {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}

beniz commented 6 years ago

Open an issue and report all requested information, and let's start from there. Thanks.

cuixing158 commented 6 years ago

Hi all, when I train an LSTM network, the error "Message type "caffe.LayerParameter" has no field named "lstm_param"" appears.

I have installed the latest Caffe version from the master branch. To my knowledge, Caffe now supports LSTM layers, but when I run the solver I get this error.

My lstm.prototxt is:


input: "data"
input_shape { dim: 320 dim: 1 }
input: "clip"
input_shape { dim: 320 dim: 1 }
input: "label"
input_shape { dim: 320 dim: 1 }
layer {
  name: "Silence"
  type: "Silence"
  bottom: "label"
  include: { phase: TEST }
}
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"

  lstm_param {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "lstm1"
  top: "ip1"

  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
  include: { phase: TRAIN }
}

#my "solver.prototxt" is:
`net: "lstm.prototxt"
test_iter: 1
test_interval: 2000000
base_lr: 0.0001
momentum: 0.95
lr_policy: "fixed"
display: 200
max_iter: 100000
solver_mode: GPU
average_loss: 200
#debug_info: true`
beniz commented 6 years ago

You should post this on the Caffe issue tracker; you are obviously not using dd.