ksOAn6g5 / TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Can you teach me how to save a dataset as an LMDB database? #4

Open lyccol opened 1 year ago

lyccol commented 1 year ago

Can you teach me how to save a dataset as an LMDB database?

I saw that you used two LMDB databases, one for images and one for text, to load data.

YulongBonjour commented 1 year ago

You may refer to this: https://github.com/YulongBonjour/SimVLM/tree/main/data_utils
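
For readers who want a starting point before opening that folder, here is a minimal sketch of the two-database layout described in the question: one LMDB holding raw image bytes and one holding UTF-8 captions, both keyed by the same sample index. It is not taken from the TaiSu or SimVLM code. The function names (`encode_key`, `write_lmdb`, `build_pair_dbs`, `read_pair`), the zero-padded key format, and the `__len__` bookkeeping key are all assumptions made for illustration, and it relies on the third-party `lmdb` Python binding.

```python
try:
    import lmdb  # third-party binding: pip install lmdb
except ImportError:  # keeps the helpers importable when lmdb is missing
    lmdb = None


def encode_key(index: int) -> bytes:
    """Zero-padded ASCII key, so lexicographic order matches numeric order."""
    return f"{index:08d}".encode("ascii")


def write_lmdb(path, values, map_size=1 << 30):
    """Store an iterable of bytes values in the LMDB at `path`, keyed by index.

    `map_size` is the maximum database size in bytes; raise it for large data.
    Everything is written in one transaction here; for very large datasets,
    committing in batches is usually preferable.
    """
    if lmdb is None:
        raise RuntimeError("the 'lmdb' package is required")
    env = lmdb.open(path, map_size=map_size)
    count = 0
    with env.begin(write=True) as txn:  # commits when the block exits cleanly
        for index, value in enumerate(values):
            txn.put(encode_key(index), value)
            count += 1
        # Hypothetical bookkeeping key so a loader can find the dataset size.
        txn.put(b"__len__", str(count).encode("ascii"))
    env.close()
    return count


def build_pair_dbs(samples, image_db, text_db, map_size=1 << 30):
    """Write (image_path, caption) pairs into two index-aligned LMDBs."""
    samples = list(samples)

    def image_bytes():
        # Keep the encoded file (JPEG/PNG) as-is; decode it at load time.
        for image_path, _ in samples:
            with open(image_path, "rb") as f:
                yield f.read()

    captions = (caption.encode("utf-8") for _, caption in samples)
    write_lmdb(image_db, image_bytes(), map_size)
    return write_lmdb(text_db, captions, map_size)


def read_pair(image_db, text_db, index):
    """Return (image_bytes, caption) for one sample index."""
    key = encode_key(index)
    img_env = lmdb.open(image_db, readonly=True, lock=False)
    txt_env = lmdb.open(text_db, readonly=True, lock=False)
    with img_env.begin() as txn:
        image = txn.get(key)
    with txt_env.begin() as txn:
        caption = txn.get(key).decode("utf-8")
    img_env.close()
    txt_env.close()
    return image, caption
```

Calling `build_pair_dbs([("a.jpg", "caption a"), ("b.jpg", "caption b")], "images.lmdb", "texts.lmdb")` would create the two databases, and `read_pair("images.lmdb", "texts.lmdb", 0)` would return the first pair. In a training loader you would normally open each environment once and reuse it rather than reopening it per sample as this sketch does.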