giuliacassara closed this issue 4 months ago.
The `.tsv` files you mentioned above are part of the multiscale interactome data, which you can download from this GitHub repo. The downloaded data is the directory you specify when running `preprocess_msi.py`. For the `get_descriptions_msi.py` script, `msi_file` is the output of running `preprocess_msi.py`, `go_file` is the Gene Ontology OBO file, which you can download from this link, and `entrez_file` is a JSON file containing a dictionary that maps the Entrez protein IDs of all proteins in the MSI to their descriptions. You can scrape these descriptions from Entrez using BioPython, but I've attached the file used to construct the MSI dataset here.
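For anyone assembling the `go_file` and `entrez_file` inputs by hand, here is a minimal sketch of how the two description sources could be loaded and sanity-checked before running `get_descriptions_msi.py`. The helper names `load_go_names` and `load_entrez_descriptions` are hypothetical and not part of the repository. The OBO reader is a deliberate simplification: it only reads the `id:` and `name:` lines of `[Term]` stanzas and ignores everything else in the format.

```python
import json
from typing import Dict, Optional


def load_go_names(go_file: str) -> Dict[str, str]:
    """Map GO term IDs (e.g. 'GO:0008150') to their names from an OBO file.

    Simplified parser (an assumption, not the repo's code): only the
    'id:' and 'name:' tags inside [Term] stanzas are read.
    """
    names: Dict[str, str] = {}
    in_term = False
    term_id: Optional[str] = None
    with open(go_file, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if line.startswith("["):
                # A new stanza starts; only [Term] stanzas describe GO terms.
                in_term = line == "[Term]"
                term_id = None
            elif in_term and line.startswith("id: "):
                term_id = line[len("id: "):]
            elif in_term and line.startswith("name: ") and term_id is not None:
                names[term_id] = line[len("name: "):]
    return names


def load_entrez_descriptions(entrez_file: str) -> Dict[str, str]:
    """Load the JSON dictionary mapping Entrez IDs to descriptions.

    JSON object keys are always strings, so IDs are returned as strings.
    """
    with open(entrez_file, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("entrez_file should contain a single JSON object")
    return {str(k): str(v) for k, v in data.items()}
```

Because JSON keys are strings, any lookup keyed on integer Entrez IDs would need to convert them with `str()` first.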
Hi Rahul, many thanks for your quick responses! I would like to recreate the processed files myself using the scripts in data/script. I also want to create, for both MSI and Hetionet, a triplet file that explicitly names the relationship of each edge (I know you don't support this feature; it's my own initiative). When I launch `get_descriptions_msi.py`, the script expects the arguments `msi_file`, `go_file`, and `entrez_file`, which I don't have. The same goes for `preprocess_msi.py`, which expects the directory containing the MSI files. Looking more closely at your code, I found this:
    files = {
        ('drug', 'protein'): '1_drug_to_protein.tsv',
        ('disease', 'protein'): '2_indication_to_protein.tsv',
        ('protein', 'protein'): '3_protein_to_protein.tsv',
        ('protein', 'function'): '4_protein_to_biological_function.tsv',
        ('function', 'function'): '5_biological_function_to_biological_function.tsv',
        ('drug', 'disease'): '6_drug_indication_df.tsv',
    }
These files are exactly what I need to build my triplet files, but I don't know how you created them. Could you please send me these files or explain how I can reproduce them?
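As a rough sketch of the triplet file described above, the six edge lists could be read and each edge tagged with a relation derived from its node-type pair. This assumes every TSV has a header row with `node_1` and `node_2` columns; that is an assumption about the MSI files, so adjust `head_col`/`tail_col` if the real headers differ. The functions `build_triplets` and `write_triplets`, and the `drug-protein` style relation labels, are hypothetical and not part of the repository.

```python
import csv
import os
from typing import Dict, Iterable, List, Tuple

# File names copied from the `files` dict quoted in this thread.
FILES: Dict[Tuple[str, str], str] = {
    ("drug", "protein"): "1_drug_to_protein.tsv",
    ("disease", "protein"): "2_indication_to_protein.tsv",
    ("protein", "protein"): "3_protein_to_protein.tsv",
    ("protein", "function"): "4_protein_to_biological_function.tsv",
    ("function", "function"): "5_biological_function_to_biological_function.tsv",
    ("drug", "disease"): "6_drug_indication_df.tsv",
}


def build_triplets(
    msi_dir: str, head_col: str = "node_1", tail_col: str = "node_2"
) -> List[Tuple[str, str, str]]:
    """Return (head, relation, tail) triplets from the six MSI edge lists.

    The relation label is built from the node-type pair, e.g. 'drug-protein'
    (a hypothetical naming scheme). Column names are assumptions.
    """
    triplets: List[Tuple[str, str, str]] = []
    for (head_type, tail_type), filename in FILES.items():
        relation = f"{head_type}-{tail_type}"
        path = os.path.join(msi_dir, filename)
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh, delimiter="\t"):
                triplets.append((row[head_col], relation, row[tail_col]))
    return triplets


def write_triplets(triplets: Iterable[Tuple[str, str, str]], out_path: str) -> None:
    """Write triplets as a tab-separated file with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["head", "relation", "tail"])
        writer.writerows(triplets)
```

Deriving the relation from the type pair keeps one label per file; if finer-grained relations are needed (for example, distinguishing edge subtypes within a file), they would have to come from an extra column in the TSVs, if one exists.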