Open agitter opened 8 years ago
The authors are interested in predicting if an miRNA binds and regulates a gene. They generate 20 features based on complementary sequences, binding affinity/accessibility, and conservation scores for miRNA-mRNA pairs found in TargetScanS and TarBase datasets.
They implement a CNN with two convolutional layers, each with mean pooling and a kernel size of 3. They use constraint relaxation to overcome class imbalance (in this case, there are more experimentally validated positives than negatives). Their method defines four distinct datasets based on the type of evidence for each pair and the confidence in the miRNA-mRNA regulation, where one of the four datasets is the negative set. The CNN then takes the miRNA-mRNA features as input with the goal of classifying each input into one of the four datasets. They use an experimentally validated test set to evaluate performance.
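To make the architecture concrete, here is a minimal pure-Python sketch of two convolution layers with kernel size 3, each followed by mean pooling, applied to a 1D feature vector. Everything beyond "two conv layers, mean pooling, kernel size 3" is my assumption and not taken from the paper: the single filter per layer, no padding, a pooling width of 2, the ReLU activation, and all function names are hypothetical.

```python
def conv1d_valid(x, kernel):
    """Slide `kernel` over `x` without padding (output is shorter than input)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(x):
    """Assumed activation; the summary above does not say which one is used."""
    return [max(0.0, v) for v in x]


def mean_pool(x, width=2):
    """Average non-overlapping windows; a trailing partial window is dropped."""
    return [sum(x[i:i + width]) / width
            for i in range(0, len(x) - width + 1, width)]


def forward(features, kernel1, kernel2):
    """Two conv layers (kernel size 3), each followed by mean pooling."""
    hidden = mean_pool(relu(conv1d_valid(features, kernel1)))
    return mean_pool(relu(conv1d_valid(hidden, kernel2)))


# Hypothetical input: 20 features, matching the feature count in the paper.
features = [1.0] * 20
output = forward(features, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(len(output))  # lengths go 20 -> 18 -> 9 -> 7 -> 3
```

Under these assumptions, 20 input features shrink to 3 values before any dense layer, which is part of why the authors would need to resample to a larger input.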
It's hard to understand their input data from Section 2.4. As @gwaygenomics said, they try to resample the features to get 64, 196, 484, or 900 features. Figure 2 and the text suggest that they treat the 196 features as a 2D input (14x14), but when describing Figure 3d they say the features are a 1D array. This potentially makes the application of a CNN to unstructured data much worse than in #79. For #79, I ultimately think the CNN makes sense given how they constrained the CNN architecture.
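To illustrate why the 1D vs. 2D question matters, here is a small sketch of which features a size-3 kernel would see in each case. The row-major layout and the helper names are my assumptions; the paper does not make clear how (or whether) the resampled features are arranged into a grid.

```python
import math


def grid_side(n_features):
    """Side length if `n_features` is arranged as a square grid."""
    side = math.isqrt(n_features)
    if side * side != n_features:
        raise ValueError(f"{n_features} is not a perfect square")
    return side


def window_1d(start, k=3):
    """Feature indices covered by a 1D kernel of size k starting at `start`."""
    return [start + j for j in range(k)]


def window_2d(row, col, side, k=3):
    """Feature indices covered by a k x k kernel with top-left corner at
    (row, col), assuming a row-major reshape of the 1D feature vector."""
    return [(row + dr) * side + (col + dc)
            for dr in range(k) for dc in range(k)]


# All four resampled sizes are perfect squares, consistent with a 2D layout.
print([grid_side(n) for n in (64, 196, 484, 900)])  # [8, 14, 22, 30]

side = grid_side(196)
print(window_1d(0))           # [0, 1, 2]
print(window_2d(0, 0, side))  # [0, 1, 2, 14, 15, 16, 28, 29, 30]
```

In the 2D case the kernel mixes features that are 14 and 28 positions apart in the original ordering, so the learned filters depend on an arbitrary feature order and row width, with no spatial meaning behind the adjacency.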
http://doi.org/10.1109/TCBB.2015.2510002