baseline solutions - Githubissues

jvanheld / IBIS_2024

Participation in the IBIS benchmarking for motif discovery approaches

GNU General Public License v3.0


baseline solutions #11

Open jvanheld opened 2 months ago

jvanheld commented 2 months ago

I am not sure I understand the meaning of this statement in the IBIS technical details.

https://ibis.autosome.org/docs/technical_details

3 Baseline Solutions

During the Leaderboard stage, the baseline for evaluating PWMs and AAAs is built from predictions of oversimplified PWMs assembled from putative consensus sequences recognized by respective TFs, i.e. in each position the PFMs will have 1 for preferred nucleotides and 0 for the others. For evaluating AAAs at the Final stage, a stronger baseline is defined by the best of the submitted PWMs.

brunocontrerasmoreira commented 2 months ago

It seems that for PWMs the baseline is a single sequence, i.e.

A 1 0 0 ...
C 0 0 1 ...
T 0 0 0 ...
G 0 1 0 ...

For AAAs, which are supposed to be more advanced than PWMs, the baseline would be the best PWM found. So AAAs can only be evaluated once PWMs have been processed, right?
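To make the 0/1 matrix above concrete, here is a minimal sketch of how such a baseline could be built from a consensus string. The helper name `consensus_to_pfm` and the example consensus `AGC` (which reproduces the three columns shown above) are assumptions for illustration, not the IBIS implementation. Degenerate IUPAC symbols, which would give several "preferred" nucleotides per position, are not handled here.

```python
# Hypothetical helper, not part of IBIS: build a 0/1 count matrix
# from a strict consensus made only of A, C, G and T.
ALPHABET = "ACGT"


def consensus_to_pfm(consensus):
    """Return {nucleotide: [0 or 1 per position]} for a strict consensus."""
    consensus = consensus.upper()
    for base in consensus:
        if base not in ALPHABET:
            raise ValueError(f"unsupported symbol: {base!r}")
    # One row per nucleotide: 1 where it matches the consensus, 0 elsewhere.
    return {nt: [1 if base == nt else 0 for base in consensus] for nt in ALPHABET}


pfm = consensus_to_pfm("AGC")  # assumed example consensus
for nt in ALPHABET:
    print(nt, *pfm[nt])
```

Each column contains exactly one 1, so the matrix carries no more information than the consensus string itself.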

jvanheld commented 2 months ago

Yes, but what do they do with this baseline? Does it mean that if we submit a PFM with different values, they convert it to a "baseline" matrix that mimics a strict consensus with a single nucleotide per position? This would be terribly reductionist relative to the way a PFM should be used for evaluation.

brunocontrerasmoreira commented 2 months ago

In the Telegram chat, someone asked "what is 'baseline consensus' in the leaderboard for AAA models?" and the reply was:

"Regarding your other question, the "baseline consensus" is a very rough PWM model with only zeroes and ones as weights, where ones reflect the curated 'consensus' sequence bound by the respective TF.

** for AAAs, the "baseline consensus" is the result of the PWM scan using the "baseline consensus" PWM model, although we do not guarantee the scores to be identical due to minor technical differences between the PWM scanning procedures used in the benchmark and in preparing the consensus baseline solution."

So I don't think they use our data after all.
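As a rough illustration of the "PWM scan" mentioned in the reply, the sketch below slides a 0/1 matrix along a sequence and keeps the best window score. The function name `best_match_score`, the example matrix, and the choice to scan only the forward strand are assumptions. The actual IBIS scanning procedure is not described in this thread, and the reply itself notes that its scores may differ slightly between procedures.

```python
# Hypothetical 0/1 baseline matrix for the assumed consensus "AGC".
pfm = {
    "A": [1, 0, 0],
    "C": [0, 0, 1],
    "G": [0, 1, 0],
    "T": [0, 0, 0],
}


def best_match_score(sequence, pfm):
    """Return the highest window score of a 0/1 matrix on the forward strand."""
    width = len(pfm["A"])
    sequence = sequence.upper()
    best = 0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # With 0/1 weights the score is the number of consensus matches.
        # Symbols outside A/C/G/T (e.g. N) contribute 0.
        score = sum(pfm.get(base, [0] * width)[i] for i, base in enumerate(window))
        best = max(best, score)
    return best


print(best_match_score("TTAGCTT", pfm))  # contains a perfect "AGC" match
print(best_match_score("TTAGTTT", pfm))  # best window "AGT" matches 2 of 3
```

With such a matrix the score can only take integer values between 0 and the motif width, which is one reason it makes a weak but well-defined baseline.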

jvanheld commented 2 months ago

Thanks for the info (I have no access to Telegram). This explains how they build it (and corresponds to our understanding of the doc), but I still don't understand what they use it for. Do you understand?

brunocontrerasmoreira commented 2 months ago

I am guessing the baseline will be used as a control to score the submissions? If a predictor is not better than the baseline, then it is useless.
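If that guess is right, the comparison could look like the sketch below: score the same positive and negative sequences with the baseline and with a submission, compute one performance metric for each, and check whether the submission comes out ahead. Everything here is an assumption for illustration: the use of AUROC as the metric, the function names `auroc` and `beats_baseline`, and the made-up scores. The thread does not say which metrics IBIS actually uses.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def beats_baseline(submission_metric, baseline_metric):
    """A submission is only informative if it strictly exceeds the control."""
    return submission_metric > baseline_metric


# Made-up scores: integer match counts for the 0/1 baseline,
# continuous scores for a hypothetical submitted model.
baseline_auc = auroc([3, 2, 3], [2, 1, 3])
submission_auc = auroc([0.9, 0.8, 0.7], [0.1, 0.75, 0.2])

print(round(baseline_auc, 3), round(submission_auc, 3))
print(beats_baseline(submission_auc, baseline_auc))
```

A rank-based metric like this lets the integer baseline scores and the continuous submission scores be compared on the same scale, without converting the submitted matrix itself.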