Open rubenweitzman opened 3 months ago
Hi,
AF2_UniRef50 is organized in LMDB format. If you want to load it, you have to first download it and then open the file using the lmdb package.
Here is an example of how to get samples:
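    import json

    import lmdb

    lmdb_dir = "/your/path/to/AF2_UniRef50/train"
    # Open the LMDB environment read-only and start a read transaction.
    with lmdb.open(lmdb_dir, readonly=True).begin() as txn:
        # The number of samples is stored under the key "length".
        length = int(txn.get("length".encode()).decode())
        for i in range(length):
            # Keys are bytes; the sample index is assumed to be stored as a string.
            data_str = txn.get(str(i).encode()).decode()
            data = json.loads(data_str)
            print(data)
            break  # Only print the first sample.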
Hope this resolves your problem :)
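If you want to reuse this beyond printing the first sample, here is a minimal sketch of the same loop wrapped in helper functions. The names `iter_samples` and `load_samples` are hypothetical, not part of the dataset or of lmdb. The sketch assumes the layout from the example above: a `length` key holding the sample count, and one JSON string per sample stored under its index as a string.

```python
import json
from typing import Any, Iterator, List, Optional


def iter_samples(txn: Any, limit: Optional[int] = None) -> Iterator[Any]:
    """Yield decoded samples from an open read transaction.

    `txn` only needs a .get(key: bytes) method, as an lmdb transaction has.
    Assumed layout: b"length" holds the sample count, and each sample is a
    JSON string stored under its index encoded as a string ("0", "1", ...).
    """
    raw_length = txn.get("length".encode())
    if raw_length is None:
        raise KeyError("no 'length' key found; check the LMDB directory")
    length = int(raw_length.decode())
    if limit is not None:
        length = min(length, limit)
    for i in range(length):
        value = txn.get(str(i).encode())
        if value is None:
            # Skip a missing index instead of failing on None.decode().
            continue
        yield json.loads(value.decode())


def load_samples(lmdb_dir: str, limit: Optional[int] = None) -> List[Any]:
    """Open `lmdb_dir` read-only and return up to `limit` samples."""
    import lmdb  # imported here so iter_samples can be used without lmdb

    env = lmdb.open(lmdb_dir, readonly=True, lock=False)
    try:
        with env.begin() as txn:
            return list(iter_samples(txn, limit))
    finally:
        env.close()
```

For example, `load_samples("/your/path/to/AF2_UniRef50/train", limit=5)` would return the first five samples as Python objects, provided the dataset has already been downloaded to that path.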
Hi, thanks for providing the pre-training database with Foldseek tokens! I am having difficulty downloading the dataset and using it with Hugging Face functions. Trying
but getting an error
What, then, is the proper way to load the dataset from Hugging Face?