lim0606 / caffe-googlenet-bn

re-implementation of googlenet batch normalization

question about lmdb shuffle #5

Closed mdqyy116 closed 8 years ago

mdqyy116 commented 8 years ago

Nice job. I have two questions about shuffling the data stored in lmdb/leveldb.

  1. Will shuffling affect the data reading performance?
  2. Does this way of shuffling also work with leveldb?

Thanks.

lim0606 commented 8 years ago

@mdqyy116

  1. Will shuffling affect the data reading performance?

Yes, it affects it somewhat. Since lmdb is an ordered key-value store, it only allows us to read the entries sequentially. That is why I randomly skip some entries as a form of shuffling. (I don't have a strong background in databases.) If there are too many skips, meaning a fetch takes longer than the network propagation, the other network layers have to wait until the fetch is done.
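To make the random-skip idea concrete, here is a minimal Python sketch. It is not the code from this repository: the function name `read_with_random_skips`, the `max_skip` parameter, and the use of a plain list with `itertools.cycle` as a stand-in for a wrap-around database cursor are all assumptions made for illustration.

```python
import random
from itertools import cycle


def read_with_random_skips(entries, num_reads, max_skip, seed=None):
    """Read `num_reads` items through a sequential, wrap-around cursor.

    Before each read, a random number of entries (0..max_skip) is skipped.
    `entries` is a hypothetical stand-in for the key-ordered database.
    """
    rng = random.Random(seed)
    cursor = cycle(entries)  # sequential access only; wraps at the end
    picked = []
    for _ in range(num_reads):
        # Each skipped entry still costs one sequential cursor step,
        # which is where the extra reading time comes from.
        for _ in range(rng.randint(0, max_skip)):
            next(cursor)
        picked.append(next(cursor))
    return picked


if __name__ == "__main__":
    data = [("%08d" % i, "image_%d" % i) for i in range(10)]
    for key, value in read_with_random_skips(data, 5, max_skip=3, seed=0):
        print(key, value)
```

Note that in this sketch the picked entries still come out in key order within one pass over the data; the skips only change which entries end up together. A larger `max_skip` gives more variation but also more wasted cursor steps per read.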

  2. Does this way of shuffling also work with leveldb?

As you can see here (https://en.wikipedia.org/wiki/NoSQL), LevelDB supports random access. So you can simply shuffle the keys and use those keys to access any entry. However, because of the random access property, it only allows us to use one process for reading each database.
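The shuffle-the-keys approach could look roughly like the Python sketch below. It uses a plain `dict` as a stand-in for any store with lookup by key, so no real LevelDB API is shown; the function name `shuffled_epoch` and the `seed` parameter are assumptions for illustration.

```python
import random


def shuffled_epoch(store, seed=None):
    """Yield every (key, value) pair of `store` exactly once, in random order.

    `store` is a hypothetical stand-in for a database that supports
    random access by key (here, any mapping with .keys() and [] lookup).
    """
    keys = list(store.keys())
    random.Random(seed).shuffle(keys)  # shuffle the keys, not the data
    for key in keys:
        yield key, store[key]  # one point lookup per entry


if __name__ == "__main__":
    db = {"%08d" % i: "image_%d" % i for i in range(5)}
    for key, value in shuffled_epoch(db, seed=0):
        print(key, value)
```

Unlike random skipping, this visits every entry once per epoch in a fully shuffled order, at the cost of one random lookup per entry instead of a sequential cursor step.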