xiaoyeye / CNNC

Convolutional neural network based coexpression analysis
MIT License

Confusion about how the KEGG edges were filtered #18

Open suzannejin opened 3 years ago

suzannejin commented 3 years ago

Hi!

In the paper, you mentioned:

KEGG contains 290 pathways, and Reactome contains 1,581 pathways. For both, we only select directed edges with either activation or inhibition edge types and filter out cyclic gene pairs where genes regulate each other mutually (to allow for a unique label for each pair). In total, we have 3,057 proteins with outgoing directed edges in KEGG, and the total number of directed edges is 33,127. For Reactome, the corresponding numbers are 2,519 and 33,641.

What I am not clear about is whether you removed all cyclic gene pairs (both A->B and B->A) or kept just one direction. For example, if you have A->B and B->A, did you keep A->B but not B->A, or did you remove both of them?

Also, if instead of predicting regulatory pairs I only want to predict pairs that are in the same pathway, regardless of causality, can I keep the other KEGG edge types that are not activation or inhibition?

Thanks in advance!!

xiaoyeye commented 3 years ago

Hi, you are welcome. The training and test data generation is very flexible and depends on what you want the model to learn. 1) In the original paper, we removed all cyclic gene pairs. 2) I think you can try it; just keep in mind that the training and test data should follow the same standard.
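
For readers following this thread, here is a minimal sketch of the filtering discussed above. It is not the code used for the paper. It assumes edges are given as (source, target, edge_type) tuples, that the edge-type filter is applied before the cyclic check, and that "removed all cyclic gene pairs" means dropping both A->B and B->A. The names `filter_edges`, `same_pathway_pairs`, and `REGULATORY_TYPES` are hypothetical and only serve this illustration.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

# Assumed edge representation: (source gene, target gene, edge type).
Edge = Tuple[str, str, str]

# Edge types kept for the regulatory task, as described in the quoted paper text.
REGULATORY_TYPES: FrozenSet[str] = frozenset({"activation", "inhibition"})


def filter_edges(
    edges: Iterable[Edge],
    allowed_types: Optional[FrozenSet[str]] = REGULATORY_TYPES,
) -> List[Edge]:
    """Keep allowed edge types, then drop every mutually regulating pair.

    Passing allowed_types=None keeps all edge types (question 2 in the thread).
    """
    # Step 1: filter by edge type (assumption: this happens before the cyclic check).
    kept = [e for e in edges if allowed_types is None or e[2] in allowed_types]
    # Step 2: collect the directed pairs that survived step 1.
    directed = {(src, dst) for src, dst, _ in kept}
    # Step 3: drop an edge whenever its reverse is also present, so both
    # A->B and B->A disappear. Self-loops (A->A) are dropped by the same test.
    return [(s, t, k) for s, t, k in kept if (t, s) not in directed]


def same_pathway_pairs(edges: Iterable[Edge]) -> Set[Tuple[str, str]]:
    """Undirected gene pairs, ignoring direction and edge type."""
    return {tuple(sorted((s, t))) for s, t, _ in edges if s != t}


edges = [
    ("A", "B", "activation"),
    ("B", "A", "inhibition"),   # cyclic with A->B: both are dropped
    ("A", "C", "activation"),
    ("C", "D", "binding"),      # not a regulatory edge type
    ("D", "C", "activation"),
]

regulatory = filter_edges(edges)
all_types = filter_edges(edges, allowed_types=None)
undirected = same_pathway_pairs(edges)
```

On this toy input, `regulatory` keeps A->C and D->C, because the C->D "binding" edge is filtered out before the cyclic check. With `allowed_types=None`, C->D and D->C become a cyclic pair and only A->C remains. Whichever variant is chosen, the same function should be applied to both the training and the test edges, in line with the advice in the reply above.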