Open chq1155 opened 1 year ago
Hi, I have a similar question regarding the script "mic_classifier_training_prodecure.ipynb".
x_train = np.concatenate([mic_x_train, negatives_x_train])
y_train = np.concatenate([mic_y_train, np.zeros(len(negatives_x_train))])
At these lines, why do we combine the negative dataset (assumed negatives, retrieved from UniProt) with the inactive sequences of the MIC dataset? The resulting dataset will become highly imbalanced. Also, those assumed negative sequences from UniProt are easier to predict as negative than the measured inactive ones.
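To make the imbalance concrete, here is a minimal sketch with placeholder arrays. The sizes (3000 MIC samples, 10000 negatives), the sequence length of 25, and the random active/inactive split are assumptions for illustration only, not values taken from the notebook.

```python
import numpy as np

# Placeholder data: sizes, sequence length and label split are assumptions.
rng = np.random.default_rng(0)
mic_x_train = rng.integers(0, 21, size=(3000, 25))
mic_y_train = rng.integers(0, 2, size=3000).astype(float)  # 1 = active, 0 = inactive
negatives_x_train = rng.integers(0, 21, size=(10000, 25))

# Same concatenation as in the notebook lines quoted above.
x_train = np.concatenate([mic_x_train, negatives_x_train])
y_train = np.concatenate([mic_y_train, np.zeros(len(negatives_x_train))])

# Count samples per class after merging the two sources.
counts = np.bincount(y_train.astype(int), minlength=2)
n_neg, n_pos = int(counts[0]), int(counts[1])
pos_fraction = n_pos / len(y_train)

print(f"negative: {n_neg}, positive: {n_pos}, positive fraction: {pos_fraction:.2f}")
```

With these placeholder sizes, every UniProt sequence lands in class 0 together with the inactive MIC sequences, so the positive class ends up as a small minority of the merged training set.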
Thanks for your time. Sincerely, Zhenjiao
Hi, in the script mic_classifier_training_prodecure.ipynb, mic_x_train has about 3000 samples and negatives_x_train has about 10000.
So why does the training output say 'Train on 20457 samples, validate on 1312 samples'?
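As a quick sanity check of the arithmetic, here is a small sketch using the approximate sizes quoted above. The exact array sizes and the feature dimension of 25 are assumptions; only the reported figure of 20457 comes from the training output.

```python
import numpy as np

# Approximate sizes from this comment; the real arrays may differ.
mic_x_train = np.zeros((3000, 25))
negatives_x_train = np.zeros((10000, 25))

# Concatenating along axis 0 simply adds the sample counts.
x_train = np.concatenate([mic_x_train, negatives_x_train])

reported_train = 20457  # from 'Train on 20457 samples'
expected_train = len(x_train)
unexplained = reported_train - expected_train

print(expected_train)  # 13000
print(unexplained)     # 7457
```

With these rough numbers, the concatenation alone would account for about 13000 samples, which leaves roughly 7457 samples that I cannot explain from the two arrays.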
Thank you for your time