Open mengpengfei opened 4 years ago
Hi, we don't use bilstm and haven't tested it. We do use the resnet18+lstm+ctc. Our understanding is that theoretically a unidirectional lstm should do the work (and does) for most of our applications.
You can find examples of Bi-LSTM with Caffe here: https://github.com/deepsemantic/image_captioning/blob/master/examples/flickr8K/multi_Bi_LSTM.prototxt
As you may see, this requires doubling the data input databases (forward and backward sentences). This can be achieved with DeepDetect too, though you may need to produce the databases externally.
have you test the resnet18+ctc+blstm net?