kad-ecoli / rna3db

maintain local copy of RNA structure database

Marc Harary meeting 2020-10-08 minutes #2

Closed kad-ecoli closed 3 years ago

kad-ecoli commented 4 years ago
  1. Communication format: questions, issues, and progress reports from Marc, as well as answers and suggestions from Chengxin, will be posted as GitHub issues at https://github.com/kad-ecoli/rna3db/issues. We will also try to meet weekly at 4:00pm on Thursdays. Before each meeting, Marc should prepare some data and post it to the GitHub issues. This could range from simple input and output files of a third-party program that Marc tested to more substantial data tables and/or figures, as shown at https://github.com/kad-ecoli/rna3db/issues/1. "Empty" communications, where we just talk without showing any data, should be avoided.

  2. Plan for Marc before next week (2020-10-15): evaluate the F1 score and MCC of SPOT-RNA, e2efold, and MXfold2 on the PDB dataset (https://github.com/jaswindersingh2/SPOT-RNA#datasets, roughly 257 RNAs). The input FASTA sequences are in TR1_sequences, TS1_sequences, TS2_sequences, and VL1_sequences. The ground truth labels are in TR1_labels, TS1_labels, TS2_labels, and VL1_labels.

  3. Plan for Marc before the week after next (2020-10-22): try to construct your own PDB dataset from the original PDB files in the PDB database (https://www.rcsb.org/). Instructions will follow once the previous item is finished.

kad-ecoli commented 4 years ago

Here is an example of how to calculate the MCC and F1 score for a SPOT-RNA predicted secondary structure (SS) in CT format. The input is TS1_sequences/4p95-1-A. The SPOT-RNA output is 4p95-1-A.ct. The ground truth label is TS1_labels/4p95-1-A.

./evaluate_2d.py 4p95-1-A.ct TS1_labels/4p95-1-A
The output should be:
BPpred F1 MCC BPnat
32 0.4228 0.4804 91

This means that SPOT-RNA predicted 32 base pairs; the native RNA structure has 91 base pairs; F1 score and MCC are 0.4228 and 0.4804, respectively. See attachment for script.

4p95-1-A.zip
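
For reference, here is a minimal sketch of how the four output columns could be computed from two sets of base pairs. This is not the attached evaluate_2d.py. The function names `parse_ct_pairs` and `score_pairs`, the toy CT text, and the toy native pairs are all made up for illustration. It assumes the standard CT layout (first header field is the sequence length, column 5 of each base line is the pairing partner, 0 meaning unpaired) and assumes that MCC counts true negatives over all L(L-1)/2 possible pairs, which may differ from the definition used in the script. If F1 is defined as 2TP/(BPpred+BPnat), the reported 0.4228 corresponds to 26 correctly predicted pairs, since 2*26/(32+91) rounds to 0.4228.

```python
import math


def parse_ct_pairs(ct_text):
    """Return (length, pairs) from CT-format text; pairs are (i, j) with i < j."""
    lines = [ln for ln in ct_text.splitlines() if ln.strip()]
    length = int(lines[0].split()[0])  # first header field: number of bases
    pairs = set()
    for ln in lines[1:length + 1]:
        fields = ln.split()
        i, j = int(fields[0]), int(fields[4])  # base index, partner (0 = unpaired)
        if j > i:  # record each pair once
            pairs.add((i, j))
    return length, pairs


def score_pairs(pred, nat, length):
    """Return (BPpred, F1, MCC, BPnat) for predicted and native pair sets."""
    tp = len(pred & nat)   # pairs present in both
    fp = len(pred - nat)   # predicted but not native
    fn = len(nat - pred)   # native but not predicted
    # Assumption: every other possible (i, j) pair is a true negative.
    tn = length * (length - 1) // 2 - tp - fp - fn
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return len(pred), f1, mcc, len(nat)


# Hypothetical 6-nt prediction in CT format, for illustration only.
TOY_CT = """6 toy
1 G 0 2 6 1
2 C 1 3 5 2
3 A 2 4 0 3
4 A 3 5 0 4
5 G 4 6 2 5
6 C 5 0 1 6
"""

if __name__ == "__main__":
    length, pred = parse_ct_pairs(TOY_CT)
    nat = {(1, 6), (2, 4)}  # hypothetical native pairs
    bp_pred, f1, mcc, bp_nat = score_pairs(pred, nat, length)
    print("BPpred F1 MCC BPnat")
    print(f"{bp_pred} {f1:.4f} {mcc:.4f} {bp_nat}")
```

The ground truth label files would need their own parser, because their format is not described in this thread. Once both structures are available as sets of (i, j) pairs, `score_pairs` applies unchanged.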