kad-ecoli closed this issue 3 years ago.
Here is an example of how to calculate the MCC and F1 score for the secondary structure (SS) predicted by SPOT-RNA in CT format. The input is TS1_sequences/4p95-1-A, the SPOT-RNA output is 4p95-1-A.ct, and the ground truth label is TS1_labels/4p95-1-A. Run:
./evaluate_2d.py 4p95-1-A.ct TS1_labels/4p95-1-A
The output should be:

| BPpred | F1 | MCC | BPnat |
|---|---|---|---|
| 32 | 0.4228 | 0.4804 | 91 |
This means that SPOT-RNA predicted 32 base pairs, the native RNA structure has 91 base pairs, and the F1 score and MCC are 0.4228 and 0.4804, respectively. See the attachment for the script.
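evaluate_2d.py itself is only attached, so the following is not that script. It is a hypothetical Python sketch of the same two metrics: `read_ct_pairs` collects base pairs from a CT file, and `f1_and_mcc` compares predicted and native pairs. It assumes the native structure can also be turned into a set of (i, j) pairs (the format of the TS1_labels files is not described here), and it counts true negatives over all L*(L-1)/2 possible pairs, one common convention for MCC on RNA secondary structure. evaluate_2d.py may define MCC differently, so its numbers need not match this sketch exactly.

```python
import math
from typing import Set, Tuple

Pair = Tuple[int, int]


def read_ct_pairs(path: str) -> Tuple[int, Set[Pair]]:
    """Return (sequence length, base pairs as (i, j) with i < j) from a CT file.

    CT base lines have six columns: index, base, previous index, next index,
    partner index (0 if unpaired), natural numbering. Any other line, such as
    the header line, is skipped.
    """
    length = 0
    pairs: Set[Pair] = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6 or not all(fields[k].isdigit() for k in (0, 2, 3, 4)):
                continue  # header, blank or malformed line
            i, j = int(fields[0]), int(fields[4])
            length = max(length, i)
            if j > i:  # record each pair once; j == 0 means unpaired
                pairs.add((i, j))
    return length, pairs


def f1_and_mcc(pred: Set[Pair], nat: Set[Pair], length: int) -> Tuple[float, float]:
    """F1 and MCC of predicted vs. native base pairs for a sequence of `length`.

    Assumption: true negatives are all remaining (i, j) pairs with i < j,
    i.e. length*(length-1)/2 candidates in total.
    """
    tp = len(pred & nat)
    fp = len(pred - nat)
    fn = len(nat - pred)
    tn = length * (length - 1) // 2 - tp - fp - fn
    # F1 = 2PR/(P+R) simplifies to 2*TP / (BPpred + BPnat).
    f1 = 2 * tp / (len(pred) + len(nat)) if (pred or nat) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

For example, `L, pred = read_ct_pairs("4p95-1-A.ct")` followed by `f1_and_mcc(pred, nat, L)`, where `nat` holds the native pairs. Because F1 = 2*TP/(BPpred + BPnat), the reported F1 of 0.4228 with 32 predicted and 91 native pairs corresponds to 26 correctly predicted pairs (52/123 is about 0.4228).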
Communication format: questions, issues, and progress from Marc, as well as answers and suggestions from Chengxin, will be posted as GitHub issues at https://github.com/kad-ecoli/rna3db/issues. We will also try to meet weekly on Thursdays at 4:00 pm. Before each meeting, Marc should prepare some data and post it to the GitHub issues. This could range from the input and output files of a third-party program that Marc tests to more substantial data tables and/or figures, as shown at https://github.com/kad-ecoli/rna3db/issues/1. "Empty" communication, where we only talk without showing any data, should be avoided.
Plan for Marc before next week (2020-10-15): evaluate the F1 score and MCC of SPOT-RNA, e2efold, and MXfold2 on the PDB dataset (https://github.com/jaswindersingh2/SPOT-RNA#datasets, roughly 257 RNAs). The input FASTA sequences are in TR1_sequences, TS1_sequences, TS2_sequences, and VL1_sequences. The ground truth labels are in TR1_labels, TS1_labels, TS2_labels, and VL1_labels.
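This plan amounts to scoring three tools on four subsets. The following is a minimal Python sketch of that bookkeeping, assuming the directory layout used in the 4p95-1-A example (`<subset>_sequences/<id>` and `<subset>_labels/<id>`) and one CT prediction file per RNA under a hypothetical `pred_dir`; e2efold and MXfold2 outputs may need converting to CT first. `dataset_triplets`, `evaluate_subset`, and the `score` callback are illustrative names, not part of evaluate_2d.py.

```python
import os
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer: takes (prediction_path, label_path) and returns (F1, MCC),
# e.g. a wrapper around evaluate_2d.py or a CT parser plus a metric function.
Scorer = Callable[[str, str], Tuple[float, float]]


def dataset_triplets(
    root: str, subset: str, pred_dir: str
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Pair each RNA in <root>/<subset>_sequences with its label and prediction.

    Returns (sequence, label, prediction) path triplets for RNAs that have all
    three files, plus the ids of RNAs missing a label or a prediction.
    """
    seq_dir = os.path.join(root, f"{subset}_sequences")
    label_dir = os.path.join(root, f"{subset}_labels")
    found, missing = [], []
    for rna_id in sorted(os.listdir(seq_dir)):
        label = os.path.join(label_dir, rna_id)
        pred = os.path.join(pred_dir, f"{rna_id}.ct")  # assumed naming, as in 4p95-1-A.ct
        if os.path.isfile(label) and os.path.isfile(pred):
            found.append((os.path.join(seq_dir, rna_id), label, pred))
        else:
            missing.append(rna_id)
    return found, missing


def evaluate_subset(root: str, subset: str, pred_dir: str, score: Scorer) -> Dict[str, float]:
    """Score one tool on one subset and average F1 and MCC over the scored RNAs."""
    triplets, missing = dataset_triplets(root, subset, pred_dir)
    scores = [score(pred, label) for _, label, pred in triplets]
    return {
        "n_scored": len(scores),
        "n_missing": len(missing),  # report these rather than silently dropping them
        "F1": mean(f for f, _ in scores) if scores else float("nan"),
        "MCC": mean(m for _, m in scores) if scores else float("nan"),
    }
```

A driver would loop over the three tools and the four subsets, calling `evaluate_subset` once per pair and collecting the rows into a table. This sketch averages per RNA; pooling TP/FP/FN over a whole subset is an alternative, and the plan does not say which one to report.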
Plan for Marc before the week after next (2020-10-22): try to construct your own PDB dataset from the original PDB files in the PDB database (https://www.rcsb.org/). Instructions will follow once the previous task is finished.