Open jvanheld opened 2 months ago
It seems that for PWMs the baseline is a single sequence, i.e.
A 1 0 0 ...
C 0 0 1 ...
T 0 0 0 ...
G 0 1 0 ...
For AAAs, which are supposed to be more advanced than PWMs, the baseline would be the best PWM found. So AAAs can only be evaluated once PWMs have been processed, right?
Yes, but what do they do with this baseline? Does it mean that if we submit a PFM with different values, they convert it to a "baseline" matrix that mimics a strict consensus with a single nucleotide per position? This would be terribly reductionist relative to the way a PFM should be used for evaluation.
In the Telegram chat someone asked "what is "baseline consensus" in the leaderboard for AAA models?" and the reply was:
"Regarding your other question, the "baseline consensus" is a very rough PWM model with only zeroes and ones as weights, where ones reflect the curated 'consensus' sequence bound by the respective TF.
** for AAAs, the "baseline consensus" is the result of the PWM scan using the "baseline consensus" PWM model, although we do not guarantee the scores to be identical due to minor technical differences between the PWM scanning procedures used in the benchmark and in preparing the consensus baseline solution."
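To make that description concrete, here is a minimal sketch of what such a zero/one matrix and a scan with it could look like. The consensus `AGC` is simply read off the example matrix at the top of this thread. The helper names, the single-strand scan and the "best window" score are my own assumptions for illustration, not the organisers' actual procedure.

```python
# Illustrative sketch only: helper names and the scoring rule are assumptions,
# not the procedure used by the IBIS benchmark.
ROWS = "ACTG"  # row order used in the example matrix in this thread


def consensus_to_pwm(consensus):
    """Build a 0/1 matrix: weight 1 for the consensus base at each position."""
    return {base: [1 if c == base else 0 for c in consensus] for base in ROWS}


def best_hit_score(pwm, sequence):
    """Slide the matrix along one strand and return the highest window score."""
    width = len(pwm["A"])
    zeros = [0] * width  # any symbol outside ACTG (e.g. N) scores 0
    scores = [
        sum(pwm.get(sequence[i + j], zeros)[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores) if scores else 0


pwm = consensus_to_pwm("AGC")
for base in ROWS:
    print(base, *pwm[base])

print(best_hit_score(pwm, "TTAGCTT"))  # contains the exact consensus
print(best_hit_score(pwm, "TTAGTTT"))  # best window has one mismatch
```

With weights restricted to zeroes and ones, the score of a window is just the number of positions matching the consensus, so any graded information a real PFM would carry is absent from this baseline.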
So I don't think they use our data after all.
Thanks for the info (I have no access to Telegram). This explains how they build it (and matches our understanding of the doc), but I still don't understand what they use it for. Do you understand?
I am guessing the baseline will be used as a control to score the submissions? If a predictor is not better than the baseline, then it is useless.
I am not sure I understand the meaning of this statement in the IBIS technical details:
https://ibis.autosome.org/docs/technical_details