The labels of result array?

xiaoyeye / CNNC

convolutional neural network based coexpression analysis

MIT License


The labels of result array? #16

Open JFF1594032292 opened 3 years ago

JFF1594032292 commented 3 years ago

Hello, I got the result from predict_no_y.py, and the resulting .npy file is an Nx3 array (the trained model has three labels: 1, 2, 0). I have two questions:

1) I don't know the label of each column. Are the column labels in the same order as in the previously trained model, or are they just sorted numerically, like [0, 1, 2] or [2, 1, 0]?
2) Most of the results are approximately equal to [0.33, 0.33, 0.33], which makes the prediction seem meaningless. Is this normal, and what can I do to improve the prediction?

I just want to build a TF-gene network for downstream analysis and know little about machine learning, but your CNNC model really attracted me! Thank you for your consideration!

xiaoyeye commented 3 years ago

1) I used the function "keras.utils.to_categorical", so I believe the column order is [0, 1, 2], though I am not sure whether that has changed in newer versions. The best way to check is to feed it a small sample, like [0, 1, 2], and see whether the result is [[1, 0, 0], [0, 1, 0], [0, 0, 1]].
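The small-sample check suggested above can be sketched without a trained model. This is a minimal sketch, assuming that one-hot encoding of integer labels behaves like an identity-matrix lookup; `one_hot` is a hypothetical NumPy stand-in written for illustration, not the Keras function itself, so it is still worth running the same three labels through the installed "keras.utils.to_categorical" to confirm.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Hypothetical NumPy stand-in for one-hot encoding of integer labels."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix has its single 1 in column i.
    return np.eye(num_classes, dtype=np.float32)[labels]


encoded = one_hot([0, 1, 2], num_classes=3)
print(encoded)

# For each row, find which column is "hot". If this equals the input labels,
# column i of the encoding (and of the model output) corresponds to label i.
column_of_label = encoded.argmax(axis=1)
print(column_of_label)
```

If the hot column matches the label for every row, the columns of the prediction array can be read in numeric label order.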

2) Being close to [0.33, 0.33, 0.33] is OK. There are several ways to proceed. One simple way is to select the label with the maximum probability, or to use each probability to plot a ROC curve (given that ground truth is available, of course). If you are focusing on interaction rather than direction, one possible way is to use the average/min/max of the probabilities of labels 1 and 2. BTW, I wonder what data you used for the prediction: the data I provided or your own data? If it is the latter, you had better train the model on your own data.
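The post-processing options mentioned above might look like the following. This is a sketch under stated assumptions: the `probs` values are made up for illustration (they are not real CNNC output), and the columns are assumed to be ordered as labels [0, 1, 2], which should be verified first.

```python
import numpy as np

# Made-up probabilities standing in for the Nx3 prediction array.
# Assumed column order: label 0, label 1, label 2.
probs = np.array([
    [0.34, 0.33, 0.33],
    [0.30, 0.38, 0.32],
    [0.31, 0.29, 0.40],
])

# Option A: pick the label with the highest probability for each pair.
predicted_label = probs.argmax(axis=1)

# Option B: ignore direction and summarize the label 1 and label 2 columns.
interaction_max = probs[:, 1:3].max(axis=1)
interaction_mean = probs[:, 1:3].mean(axis=1)

# Rank pairs from strongest to weakest interaction score.
ranking = np.argsort(-interaction_max)

print(predicted_label)
print(interaction_max)
print(ranking)
```

Even when every row is close to [0.33, 0.33, 0.33], the relative ordering of the scores can still be used for ranking pairs or for a ROC curve when ground truth is available.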

JFF1594032292 commented 3 years ago

Thanks for your reply! I will have a try!

JFF1594032292 commented 3 years ago

Hi, I noticed that train_with_labels_three_foldx_3fold_TF_two_labels.py should be used to predict TF-target gene relationships, so I used this script to train on my own data. The test accuracy is about 0.65 to 0.7, and the ROC performance seems to reach that of Fig. 2 in your paper. However, the results from predict_no_y.py are all set to 1: and here are my training command line and prediction command line: Both the training data and the prediction data were my own. I read some other issues about this problem but couldn't solve it.

xiaoyeye commented 3 years ago

Hi, I am happy to hear that you can reach the performance of Fig. 2. As for predict_no_y.py, it may have some errors; for example, if num_classes = 2, the activation function should be sigmoid rather than softmax. I am traveling now, so I may not be able to correct the code right away.

However, one very simple way is to predict on the new data using the same code that is used for model evaluation on the test dataset.
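On the sigmoid-versus-softmax point above, a small numeric check shows why a softmax output could come back as all 1s. This is a sketch that assumes the two-label model ends in a single output unit (the thread does not confirm the layer shape); the `softmax` and `sigmoid` helpers and the `logits` values are written here for illustration only.

```python
import numpy as np


def softmax(logits):
    """Softmax over the last axis, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sigmoid(logits):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-logits))


# Three samples, one output unit each (assumed shape for a two-label model).
logits = np.array([[-2.0], [0.0], [3.5]])

# Softmax normalizes across the units of a sample; with one unit it is always 1.
softmax_out = softmax(logits)

# Sigmoid maps each logit to a value in (0, 1) independently.
sigmoid_out = sigmoid(logits)

print(softmax_out.ravel())
print(sigmoid_out.ravel())
```

With only one unit to normalize over, softmax carries no information about the input, whereas sigmoid keeps the ordering of the logits, which would be consistent with predictions that are all 1.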