microsoft / XPretrain

Multi-modality pre-training

Ways to open the .mdb caption files #19

Closed: ffnc1020 closed this issue 1 year ago

ffnc1020 commented 1 year ago

Is there an easy way to open the .mdb files? I tried to dump them to CSV with mdbtools, but it complains that the file is not an Access database.

HellwayXue commented 1 year ago

Sorry, I'm not familiar with mdbtools. We use the Python package lmdb to open and load the data, so you can also use lmdb to enumerate the entries and dump them to a CSV file. This might take a while, but it should work.
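A minimal sketch of that approach, for reference. The `.mdb` files here are LMDB's own on-disk format (`data.mdb`, `lock.mdb`), not Microsoft Access, which explains why mdbtools rejects them. The sketch assumes the caption values are stored as UTF-8 text; the thread does not say how XPretrain encodes them (they could be JSON or pickled), so the decode step may need changing. It relies on `lmdb.open`, `Environment.begin`, and cursor iteration from the `lmdb` package; the helper names `decode_pairs`, `write_csv`, and `dump_lmdb_to_csv` are hypothetical.

```python
import csv
from typing import Iterable, Iterator, TextIO, Tuple


def decode_pairs(pairs: Iterable[Tuple[bytes, bytes]]) -> Iterator[Tuple[str, str]]:
    """Decode raw LMDB key/value bytes as text.

    Assumption: keys and values are UTF-8 strings. Undecodable bytes are
    replaced rather than raising, so one bad entry does not abort the dump.
    """
    for key, value in pairs:
        yield (key.decode("utf-8", errors="replace"),
               value.decode("utf-8", errors="replace"))


def write_csv(rows: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write a header plus one (key, value) row per entry; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["key", "value"])
    count = 0
    for row in rows:
        writer.writerow(row)  # csv quotes values containing commas or quotes
        count += 1
    return count


def dump_lmdb_to_csv(db_path: str, csv_path: str, subdir: bool = True) -> int:
    """Stream every entry of an LMDB database into a CSV file."""
    import lmdb  # third-party package: pip install lmdb

    # db_path is the directory holding data.mdb (subdir=True, lmdb's default);
    # pass subdir=False to open a standalone .mdb file directly.
    # readonly + lock=False opens the data without creating or writing lock.mdb.
    env = lmdb.open(db_path, subdir=subdir, readonly=True, lock=False,
                    readahead=False)
    try:
        with env.begin() as txn, \
                open(csv_path, "w", newline="", encoding="utf-8") as out:
            # Iterating a cursor yields (key, value) byte pairs in key order,
            # so the whole database is streamed without loading it into memory.
            return write_csv(decode_pairs(txn.cursor()), out)
    finally:
        env.close()
```

Usage would look like `dump_lmdb_to_csv("path/to/caption_lmdb_dir", "captions.csv")`, where the path is a placeholder. Keeping the decoding and CSV writing separate from the LMDB access makes it easy to swap in `json.loads` or `pickle.loads` if the values turn out not to be plain text.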

ffnc1020 commented 1 year ago

Thanks for the tip! I was able to read the captions with lmdb.