Closed JiuhaiChen closed 2 years ago
Thanks for your reply. Is the smiles_string stored in mol.csv.gz under the ./mapping folder? If I want to generate the scaffold index from the SMILES strings, is there anything else I need to do? I tried opening the mol.csv.gz file and calling the scaffold_split function, but the format does not seem to be right. I just added this code to the code you sent me earlier:
if __name__ == '__main__':
    with gzip.open("mol.csv.gz", "rb") as f:
        data = f.read()
    train_idx, valid_index, test_idx = scaffold_split(list(data))
The error message:
mol = Chem.MolFromSmiles(smiles)
TypeError: No registered converter was able to produce a C++ rvalue of type std::basic_string<wchar_t, std::char_traits
Thanks!
Yes, that's correct. You can read mol.csv.gz with:
import pandas as pd
df = pd.read_csv('mol.csv.gz')
smiles_list = df['smiles'].tolist()
More details can be found in mapping/README.md. Hope this helps!
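For context on the TypeError above: gzip.open(..., "rb").read() returns raw bytes, and calling list() on bytes yields integers, so scaffold_split ends up passing ints rather than SMILES strings to Chem.MolFromSmiles. A minimal standard-library sketch of reading the column as strings follows. It assumes the CSV has a `smiles` column, as in the pandas snippet; the helper name `read_smiles` is hypothetical.

```python
import csv
import gzip


def read_smiles(path, column="smiles"):
    """Return the values of `column` from a gzipped CSV as a list of str."""
    # "rt" opens the archive in text mode, so each row is decoded to str
    # instead of being returned as raw bytes.
    with gzip.open(path, "rt", newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


# Why the original attempt failed: iterating over bytes gives integers.
# list(b"CCO") == [67, 67, 79]
```

The resulting list of strings can then be passed to scaffold_split in place of list(data).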
Thanks! For the ogbg_ppa dataset, I was wondering whether a species index is included in the dataset, as it is for ogbg_proteins and ogbl_ppa?
Yes, the species index for ogbg-ppa should be in the corresponding mapping/ directory. See mapping/README.md for details.
mapping/ will most likely be located in dataset/ogbg_ppa/mapping.
Hi OGB Team, for ogbg_proteins and ogbl_ppa, I was wondering how you encode the species index into the node features. Do you just append the species index to each node feature? Since each graph belongs to only one species domain, do you encode a single species index into all node features within that graph? And for ogbg-ppa, ogbg-molhiv, and ogbg-molpcba, do you encode the species index into the node or edge features? Thanks!
Hi, I am working on the ogbg_molpcba dataset and noticed that the scaffold index can be downloaded from https://snap.stanford.edu/ogb/data/misc/ogbg_molpcba/. I was wondering how you obtained this scaffold index. Does it come from the source dataset, or was it generated by some software? In the latter case, could you tell me how to generate it? Thanks!
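For reference, a scaffold split is commonly generated by computing the Bemis-Murcko scaffold of each SMILES string with RDKit and then assigning whole scaffold groups to train, valid, and test. The sketch below illustrates that general technique under stated assumptions; it is not necessarily the procedure used to produce the OGB files. The function names, the 80/10/10 fractions, and the largest-group-first ordering are illustrative choices.

```python
from collections import defaultdict


def murcko_scaffold(smiles):
    """Bemis-Murcko scaffold SMILES via RDKit (imported lazily)."""
    from rdkit.Chem.Scaffolds import MurckoScaffold
    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles, includeChirality=True)


def scaffold_split(smiles_list, scaffold_fn=murcko_scaffold,
                   frac_train=0.8, frac_valid=0.1):
    """Split indices so that molecules sharing a scaffold stay together."""
    # Group molecule indices by their scaffold string.
    groups = defaultdict(list)
    for idx, smiles in enumerate(smiles_list):
        groups[scaffold_fn(smiles)].append(idx)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(smiles_list)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train_idx, valid_idx, test_idx = [], [], []
    for group in ordered:
        if len(train_idx) + len(group) <= train_cutoff:
            train_idx.extend(group)
        elif len(train_idx) + len(valid_idx) + len(group) <= valid_cutoff:
            valid_idx.extend(group)
        else:
            test_idx.extend(group)
    return train_idx, valid_idx, test_idx
```

With RDKit installed, calling scaffold_split(smiles_list) on the list read from mol.csv.gz returns three index lists. Passing a custom scaffold_fn allows the grouping logic to be checked without RDKit.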