Closed by ritatsousa 3 years ago
Hi! This is a great question. We consider GO functions that are fine-grained (leaves of the GO hierarchy) and have enough positive labels to train models on. That's how we arrived at the 112 kinds of labels.
Thanks for your prompt response!
However, isn't GO:0003674, a root of the GO hierarchy, one of the kinds of labels?
I'd also like to take the opportunity to ask how many positive labels you considered to be enough.
You are welcome! I double-checked my code, and you are right. Let me correct my previous response. The following is what we did:
We first get a list of GOs that are immediate parents of the leaf GOs. Out of those GOs, we extracted the ones that appear more than 200 times in each of the train/valid/test splits. This gives us the 112 GOs in our dataset.
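For readers who want to reproduce the idea, here is a minimal sketch of the two steps described above. It is not the actual OGB preprocessing code: the function name `select_go_labels`, the `parents` mapping, and the per-split annotation format are all hypothetical, and it assumes that "appears more than 200 times" means the number of proteins annotated with that GO in a given split.

```python
from collections import Counter


def select_go_labels(parents, annotations_by_split, min_count=200):
    """Sketch of the label selection described in this thread (assumed inputs).

    parents: dict mapping a GO id to the set of its immediate parent GO ids.
    annotations_by_split: dict mapping a split name ("train", "valid", "test")
        to a list with one set of GO ids per protein.
    """
    # Every GO id that occurs as a child or as a parent.
    all_terms = set(parents) | {p for ps in parents.values() for p in ps}
    # A leaf is a GO id that is never the parent of another GO id.
    non_leaves = {p for ps in parents.values() for p in ps}
    leaves = all_terms - non_leaves

    # Step 1: candidates are the immediate parents of the leaf GOs.
    candidates = set()
    for leaf in leaves:
        candidates |= parents.get(leaf, set())

    # Step 2: keep candidates with more than min_count occurrences in EVERY split.
    selected = set(candidates)
    for proteins in annotations_by_split.values():
        counts = Counter(go for gos in proteins for go in gos)
        selected = {go for go in selected if counts[go] > min_count}
    return sorted(selected)
```

Note that under this procedure a root such as GO:0003674 can be selected whenever one of its direct children is a leaf in the ontology snapshot being used, which is consistent with the observation above.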
Hi, according to the description of the ogbn-proteins dataset, the 112 kinds of labels correspond to Gene Ontology (GO) functions. Why were these 112 labels selected among all the functions defined in the GO? Thanks in advance!