wanji / caffe-sl

caffe for Similarity Learning

LMDB support in TripletImageDataLayer #6

Closed ergysr closed 8 years ago

ergysr commented 8 years ago

I have been using the TripletImageData layer for training and testing, and it works fine on the mnist_sl example. However, when training a bigger network on a large number of images, the TripletImageData layer creates an IO bottleneck that makes training slow. Is there already a way to convert triplet list files into LMDB databases and use those instead of the text files that point to images on disk?

wanji commented 8 years ago

Because the triplets are random combinations of images, storing all the triplet input data requires a huge amount of disk space. Usually the data layer should not be a bottleneck because of the data prefetching mechanism. If you are sure that this layer creates the IO bottleneck, maybe you can preload all the images and generate the triplet input data in memory. Alternatively, you can create small batches as described in FaceNet, which should be more suitable for LMDB databases.
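
The in-memory option could look roughly like the Python sketch below. It is not code from caffe-sl: `preload`, `sample_triplets`, and `read_bytes` are hypothetical names, and the sketch assumes that every image has a class label and that the whole dataset fits in RAM. Each unique image is read from disk once, and triplets are then drawn as (anchor, positive, negative) keys into that cache, so no triplet ever has to be stored on disk.

```python
import random
from collections import defaultdict


def read_bytes(path):
    """Default loader (an assumption): return the raw file contents."""
    with open(path, "rb") as f:
        return f.read()


def preload(paths, loader=read_bytes):
    """Load every unique image once and keep it in memory, keyed by path."""
    return {p: loader(p) for p in set(paths)}


def sample_triplets(labels, num_triplets, rng=random):
    """Draw (anchor, positive, negative) path triplets.

    labels maps image path -> class label. Anchor and positive share a
    class; the negative comes from a different class.
    """
    by_class = defaultdict(list)
    for path, label in sorted(labels.items()):
        by_class[label].append(path)

    # Only classes with at least two images can supply an anchor/positive pair.
    anchor_classes = [c for c, ps in by_class.items() if len(ps) >= 2]
    if not anchor_classes or len(by_class) < 2:
        raise ValueError("need two or more classes, one with two or more images")

    triplets = []
    for _ in range(num_triplets):
        cls = rng.choice(anchor_classes)
        anchor, positive = rng.sample(by_class[cls], 2)
        neg_cls = rng.choice([c for c in by_class if c != cls])
        negative = rng.choice(by_class[neg_cls])
        triplets.append((anchor, positive, negative))
    return triplets
```

A training loop would call `preload` once at startup and then look up `cache[anchor]`, `cache[positive]`, and `cache[negative]` for each sampled triplet. Memory then grows with the number of unique images instead of the number of triplets.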