Closed by lhallee 8 months ago
Hi Logan,
You can try the code below (we use Thermostability as an example):
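import json

import lmdb

lmdb_dir = "/your/path/to/LMDB/Thermostability/normal/train"
env = lmdb.open(lmdb_dir, readonly=True)
operator = env.begin()  # read-only transaction

# The number of records is stored under the "length" key.
length = int(operator.get("length".encode("utf-8")).decode("utf-8"))
for i in range(length):
    # Each record is stored under its index, encoded as a UTF-8 string.
    key = f"{i}".encode("utf-8")
    value = operator.get(key)
    data_dict = json.loads(value.decode("utf-8"))
    print(data_dict.keys())  # inspect which fields a record contains
    break  # remove this line to iterate over the whole split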
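To collect a whole split for a custom fine-tuning loop instead of printing only the first record, a minimal sketch could look like the one below. It assumes the same layout as the snippet above (a "length" key plus one JSON string per integer key). The helper names `iter_records` and `load_split` are hypothetical and not part of this repository, and no record field names are assumed, so check `data_dict.keys()` first.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_records(txn) -> Iterator[Dict[str, Any]]:
    """Yield each JSON record from an object exposing get(bytes) -> bytes.

    Assumes the layout from the snippet above: a "length" key holding the
    record count, and records stored under the keys "0", "1", ...
    """
    raw_length = txn.get(b"length")
    if raw_length is None:
        raise KeyError("no 'length' key found; the layout differs from the assumed one")
    for i in range(int(raw_length.decode("utf-8"))):
        value = txn.get(str(i).encode("utf-8"))
        if value is None:
            continue  # skip missing indices rather than failing
        yield json.loads(value.decode("utf-8"))


def load_split(lmdb_dir: str) -> List[Dict[str, Any]]:
    """Read every record of one split (hypothetical helper) into a list."""
    import lmdb  # imported here so iter_records works without lmdb installed

    # lock=False avoids creating a lock file for read-only access.
    env = lmdb.open(lmdb_dir, readonly=True, lock=False)
    try:
        with env.begin() as txn:
            return list(iter_records(txn))
    finally:
        env.close()
```

Calling `load_split("/your/path/to/LMDB/Thermostability/normal/train")` would then return a list of dictionaries that can be wrapped in whatever dataset class your own training code expects.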
I hope this solves the problem. Best, Jin
This works great. Thanks for the help!
Hello,
I see you have made .mdb dataset files available. How would one go about extracting the fine-tuning data and using it for downstream tasks? I would like to fine-tune my own model, so the provided training script will not work for me. Best, Logan