Closed Mottotime closed 8 years ago
I ran some benchmarks and there is one thing you can do to speed up the example: write multiple entries as part of the same transaction. That is, move `with env.begin(write=True) as txn:` outside of the for-loop:
```python
with env.begin(write=True) as txn:
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)
        # The encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
```
Saving 10,000 images of size 3×224×224 bytes took 6 seconds, as opposed to 110 seconds with the original version. I'm not sure it's a good idea to make transactions arbitrarily large, though, so you might want to commit in batches instead.
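To make the batching idea concrete, here is a minimal sketch (not from the thread; the helper name `write_in_batches` is hypothetical). It commits one write transaction per fixed-size batch, so no single transaction grows unboundedly while the per-commit overhead is still amortized over many puts. Only `env.begin(write=True)` and `txn.put()` from the py-lmdb API are assumed:

```python
import itertools

def write_in_batches(env, entries, batch_size=1000):
    """Write (key, value) pairs to an LMDB environment, committing
    one write transaction per batch of up to `batch_size` entries.

    `entries` can be any iterable of (bytes, bytes) pairs.
    """
    it = iter(entries)
    while True:
        # Pull the next batch (the last one may be shorter than batch_size).
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        # One write transaction per batch; committed on clean exit.
        with env.begin(write=True) as txn:
            for key, value in batch:
                txn.put(key, value)
```

With the Datum loop above, `entries` would be a generator yielding `(str_id.encode('ascii'), datum.SerializeToString())` pairs.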
Thanks for pointing this out, I will update the blog post with this information.
Thank you very much.
Hi, I've followed the instructions in Creating an LMDB database in Python, which is a very helpful post. However, I found it takes more than 10 minutes to write fewer than 10,000 images into an LMDB file.
The map_size was set to 1 TB.
Is there any way to accelerate the processing?