iRNA-COSI / APAeval

Community effort to evaluate computational methods for the detection and quantification of poly(A) sites and estimating their differential usage across RNA-seq samples
MIT License

OpenEBench summary workflow: Remove duplicates in assessment ids #154

Closed: AsierGonzalez closed this issue 3 years ago

AsierGonzalez commented 3 years ago

The summary workflow creates three assessments (n_matched_sites, n_unmatched_sites, correlation), but the first two share the same _id, which causes problems when the results are added to the database. The _id values for those assessments are defined on lines 70 and 74 of compute_metrics.py:

[Line 70] data_id_1 = community + ":" + challenge + "_runtime_" + participant + "_A"
[Line 74] data_id_2 = community + ":" + challenge + "_runtime_" + participant + "_A"
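
To make the collision concrete, here is a minimal sketch. The values assigned to community, challenge and participant are hypothetical placeholders (the real ones come from the workflow inputs and are not shown in this thread); only the two id expressions are taken from the lines quoted above.

```python
# Hypothetical placeholder values, not taken from the real workflow.
community = "EXAMPLE_COMMUNITY"
challenge = "example_challenge"
participant = "example_tool"

# The two expressions quoted from lines 70 and 74 of compute_metrics.py.
data_id_1 = community + ":" + challenge + "_runtime_" + participant + "_A"
data_id_2 = community + ":" + challenge + "_runtime_" + participant + "_A"

# Both assessments end up with the same _id, so a store keyed on _id
# cannot tell them apart.
ids = [data_id_1, data_id_2]
has_duplicates = len(ids) != len(set(ids))
print(data_id_1)
print(has_duplicates)
```

Because the two expressions are character-for-character identical, the ids collide for any input values, not just the placeholders used here.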

Replacing "runtime" with something related to the name of each metric (e.g. "matched" and "unmatched") would solve this issue. Similarly, the id of the correlation assessment could include "correlation" instead of "memory" (see line 78):

data_id_3 = community + ":" + challenge + "_memory_" + participant + "_A"

Also, the trailing "_A" suffixes are unnecessary and can be dropped.

AsierGonzalez commented 3 years ago

I forgot to mention that we also need the community variable to be removed from the ids because of issues downstream. If you could replace it with a hardcoded 'APAeval' that would solve the problem. This is the same approach I have suggested in issue 153.
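
Taken together, the suggestions in this thread (metric-specific names, no trailing "_A", a hardcoded 'APAeval' prefix) could look roughly like the sketch below. This is only an illustration: the helper build_assessment_id, the constant COMMUNITY_PREFIX, the example challenge and participant values, and keeping ":" as the separator are all assumptions, and the actual change in compute_metrics.py may differ.

```python
# Hardcoded prefix replacing the community variable, as suggested above.
COMMUNITY_PREFIX = "APAeval"


def build_assessment_id(challenge, metric, participant):
    """Build an assessment _id that includes the metric name.

    Hypothetical helper: the ":" separator mirrors the original ids,
    and the trailing "_A" is dropped.
    """
    return COMMUNITY_PREFIX + ":" + challenge + "_" + metric + "_" + participant


# Hypothetical example values for illustration only.
challenge = "example_challenge"
participant = "example_tool"

data_id_1 = build_assessment_id(challenge, "matched", participant)
data_id_2 = build_assessment_id(challenge, "unmatched", participant)
data_id_3 = build_assessment_id(challenge, "correlation", participant)

# Guard against reintroducing duplicate ids.
ids = [data_id_1, data_id_2, data_id_3]
assert len(ids) == len(set(ids)), "assessment ids must be unique"
print(ids)
```

Deriving every id from one helper keeps the format in a single place, and the uniqueness check fails early if two assessments are ever given the same metric name.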

yuukiiwa commented 3 years ago

Thanks, Asier! I have updated the code with the changes you suggested: https://github.com/yuukiiwa/APAeval-summary-workflow/commit/faf1e20860183f330a5695db5ce0654542902fad

AsierGonzalez commented 3 years ago

Fantastic, thank you @yuukiiwa!

AsierGonzalez commented 3 years ago

The changes had the desired effect - I'm closing the ticket. Thank you for the great work @yuukiiwa!