zjunlp / OntoProtein

[ICLR 2022] OntoProtein: Protein Pretraining With Gene Ontology Embedding
MIT License

Go-Go relations are the same as Protein-Go relations? #27

Closed by xinghao302001 1 year ago

xinghao302001 commented 1 year ago

Hello, researchers.

I have a question about your pre-training datasets. When I checked the list of pre-training data files, I found only one relation-related file, relation2id.txt, and your code uses this same file for both GO-GO relations and Protein-GO relations. This confuses me. Is the relation file really the same for Protein-GO and GO-GO? If so, why, and how do you define the relations so that Protein-GO and GO-GO can share the same set?

Thanks. Best regards, Xinghao

Alexzhuan commented 1 year ago

From a knowledge graph perspective, Protein-GO triplets belong to the same knowledge graph as GO-GO triplets. Hence, we unified the relations that occur in Protein-GO triplets and the relations that occur in GO-GO triplets into a single relation-to-id mapping list.
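As a rough illustration of what a shared mapping means, here is a minimal sketch that assigns one id per distinct relation across both triplet sets. The function name `build_relation2id` and the example triplets are hypothetical and are not taken from the repository or its data; the actual contents of relation2id.txt may be built differently.

```python
def build_relation2id(*triplet_sets):
    """Assign one integer id per distinct relation across all triplet sets."""
    relation2id = {}
    for triplets in triplet_sets:
        for _head, relation, _tail in triplets:
            # A relation seen in either set gets exactly one id.
            if relation not in relation2id:
                relation2id[relation] = len(relation2id)
    return relation2id


# Hypothetical example triplets (illustrative only, not real dataset rows).
go_go = [
    ("GO:0000001", "is_a", "GO:0000002"),
    ("GO:0000003", "part_of", "GO:0000002"),
]
protein_go = [
    ("P12345", "enables", "GO:0000001"),
    ("P12345", "part_of", "GO:0000003"),
]

relation2id = build_relation2id(go_go, protein_go)
```

A relation that appears in both sets (here `part_of`) gets a single id, so both kinds of triplets index into the same relation vocabulary.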

gwcde commented 1 year ago

When I run create_goa_triplet('data/original_data/goa_uniprot_all.gaf', 'data/onto_protein_data/protein_go_triplet', 'data/onto_protein_data/protein_seq_map.txt'), an error occurs:

  File "gen_onto_protein_data.py", line 53, in create_goa_triplet
    if rec[0] != 'UniProtKB' or rec[11] != 'protein':
IndexError: list index out of range

Alexzhuan commented 1 year ago

Hi, the Gene Ontology annotation format has been updated. It was our mistake not to update the annotation file format in the README in time. You need to re-download goa_uniprot_all.gaf instead of *.gpa.
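For anyone hitting the same IndexError, a defensive reader can make the cause easier to spot. The sketch below is not the repository's code. It assumes tab-separated GAF lines in which column 1 is the database, column 2 the object id, column 4 the qualifier, column 5 the GO id, and column 12 the object type, and it simply skips header lines and lines with too few fields. The function name `iter_protein_annotations` and the sample rows are hypothetical.

```python
def iter_protein_annotations(lines):
    """Yield (protein_id, qualifier, go_id) from GAF-formatted lines."""
    for line in lines:
        # Skip blank lines and '!' header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        rec = line.rstrip("\n").split("\t")
        # rec[11] needs at least 12 fields; shorter lines would raise IndexError.
        if len(rec) < 12:
            continue
        if rec[0] != "UniProtKB" or rec[11] != "protein":
            continue
        yield rec[1], rec[3], rec[4]


# Hypothetical sample input (illustrative rows, not real annotations).
sample = [
    "!gaf-version: 2.2\n",
    "\t".join([
        "UniProtKB", "P12345", "GENE1", "enables", "GO:0003674",
        "GO_REF:0000002", "IEA", "", "F", "", "", "protein",
        "taxon:9606", "20230101", "InterPro", "", "",
    ]) + "\n",
    "UniProtKB\tP12345\ttoo short\n",
]

annotations = list(iter_protein_annotations(sample))
```

If every line of a file is skipped by the length check, that is a strong hint the file is not in the expected GAF layout.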

gwcde commented 1 year ago

Thank you for your reply. I also hit an error when running create_go_data. After godag = GODag(fin_path, optional_attrs={'relationship'}), the loop "for go_id, go_term in godag.items():" fails:

  File "gen_onto_protein_data.py", line 115, in create_go_data
    cur_node_desc = f'{cur_node_name}: {go_term.definition}'
AttributeError: 'GOTerm' object has no attribute 'definition'

The fin_path is go.obo (http://current.geneontology.org/ontology/go.obo).

Alexzhuan commented 1 year ago

You could refer to https://github.com/zjunlp/OntoProtein#environment-for-pre-training-data-generation.
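Independently of the environment setup described there, a small guard can make a missing attribute fail more gracefully while you debug. This is only a sketch: the helper `get_definition` is hypothetical, and the alternative attribute name `defn` is an assumption that should be checked against the installed goatools version. The stand-in class below replaces a real GOTerm so the example runs without goatools.

```python
def get_definition(go_term, candidates=("definition", "defn")):
    """Return the first non-empty attribute among candidates, else ''."""
    for name in candidates:
        # getattr with a default avoids AttributeError on missing names.
        value = getattr(go_term, name, None)
        if value:
            return value
    return ""


class FakeTerm:
    """Stand-in for a GO term object (hypothetical, for illustration only)."""

    def __init__(self, name, **attrs):
        self.name = name
        self.__dict__.update(attrs)


with_defn = FakeTerm("example term", defn="example definition text")
without_def = FakeTerm("bare term")

desc = f"{with_defn.name}: {get_definition(with_defn)}"
empty = get_definition(without_def)
```

An empty result tells you the term object carries no definition at all, which points back to how the ontology file was loaded rather than to the attribute name.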