Open rubenweitzman opened 4 months ago
Hi,
AF2_UniRef50 is organized in LMDB format. To load it, first download it and then open it with the lmdb package.
Here is an example of how to get samples:
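import json

import lmdb

lmdb_dir = "/your/path/to/AF2_UniRef50/train"
with lmdb.open(lmdb_dir, readonly=True).begin() as txn:
    # The number of samples is stored under the "length" key.
    length = int(txn.get("length".encode()).decode())
    for i in range(length):
        # Keys are the sample indices encoded as strings.
        data_str = txn.get(str(i).encode()).decode()
        data = json.loads(data_str)
        print(data)
        break  # Only print the first sample.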
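To read more than the first sample, the same key layout (a "length" key plus one JSON string per index) can be wrapped in a small generator. The following is only a sketch under that assumption: `iter_records` and `FakeTxn` are hypothetical names, and the record fields `name` and `seq` are made up for the demo, not the dataset's real schema. The in-memory stand-in lets the snippet run without the database on disk.

```python
import json
from typing import Any, Dict, Iterator, Optional


def iter_records(txn: Any, limit: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield decoded JSON records from a transaction-like object.

    Assumes the layout described above: a b"length" key holding the record
    count, and one JSON string per record under the keys b"0", b"1", ...
    """
    raw_length = txn.get(b"length")
    if raw_length is None:
        raise KeyError("no 'length' key found; is this the right LMDB directory?")
    length = int(raw_length.decode())
    if limit is not None:
        length = min(length, limit)
    for i in range(length):
        raw = txn.get(str(i).encode())
        if raw is None:
            continue  # Skip indices that have no stored record.
        yield json.loads(raw.decode())


class FakeTxn:
    """Hypothetical in-memory stand-in for an lmdb transaction (demo only)."""

    def __init__(self, store: Dict[bytes, bytes]) -> None:
        self._store = store

    def get(self, key: bytes) -> Optional[bytes]:
        return self._store.get(key)


# Made-up records that only mimic the assumed key layout.
demo = FakeTxn({
    b"length": b"2",
    b"0": json.dumps({"name": "example_0", "seq": "MKV"}).encode(),
    b"1": json.dumps({"name": "example_1", "seq": "GAS"}).encode(),
})
records = list(iter_records(demo))
print(records[0])
```

With the real database, the transaction returned by `lmdb.open(lmdb_dir, readonly=True).begin()` can be passed in place of `demo`, since the generator only relies on `txn.get`.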
Hope this resolves your problem :)
@LTEnjoy Hi, can I download the original structure data of the sequences?
Hi,
I'm sorry, but the original structure data is too large to upload, so we are unable to share it. You can download all AF2 structures from the official website: https://alphafold.ebi.ac.uk/.
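For fetching individual structures rather than the whole database, a per-entry download could look roughly like the sketch below. The file URL pattern and the model version number are assumptions based on how the AlphaFold database has served files, not something confirmed in this thread, so check them against the website before relying on them. `alphafold_model_url` and `download_model` are hypothetical helper names, and P69905 is used only as an example UniProt accession.

```python
import urllib.request


def alphafold_model_url(uniprot_accession: str, version: int = 4, fmt: str = "pdb") -> str:
    """Build the download URL for one predicted structure.

    The pattern below is an assumption about the AlphaFold DB file layout;
    the version suffix changes between database releases.
    """
    return (
        "https://alphafold.ebi.ac.uk/files/"
        f"AF-{uniprot_accession}-F1-model_v{version}.{fmt}"
    )


def download_model(uniprot_accession: str, dest_path: str, version: int = 4) -> str:
    """Download one structure file to dest_path and return that path."""
    url = alphafold_model_url(uniprot_accession, version=version)
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


# Only the URL is built here; call download_model(...) to fetch the file.
url = alphafold_model_url("P69905")
print(url)
```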
Hi, thanks for providing the pre-training database with Foldseek tokens! I'm having difficulty downloading the dataset and using it with Hugging Face functions. Trying
but getting an error
What, then, is the proper way to load the dataset from Hugging Face?