jvanheld / IBIS_2024

Participation in the IBIS benchmarking of motif discovery approaches
GNU General Public License v3.0

Datasets with no good motifs #1

Open jvanheld opened 4 months ago

jvanheld commented 4 months ago

For some datasets, oligo-analysis returns highly significant k-mers, but

For this type of peak set, we have to evaluate whether it is better to submit motifs (PSSMs) or to use a radically different approach. I think that for this data type, one possibility would be to apply supervised classification based on a table of k-mer counts in each peak. If we have time (which is far from certain), we could test this.
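To make the k-mer table idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the function names, the `"bound"`/`"unbound"` labels and the toy sequences are hypothetical, and the nearest-centroid rule is just a simple stand-in for whatever classifier we would actually test. It counts k-mers on a single strand and skips words containing non-ACGT characters.

```python
from itertools import product


def kmer_count_table(seqs, k):
    """Return (kmers, table): one row of k-mer counts per sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: j for j, w in enumerate(kmers)}
    table = []
    for s in seqs:
        s = s.upper()
        row = [0] * len(kmers)
        for i in range(len(s) - k + 1):
            j = index.get(s[i:i + k])  # None for words with N, etc.
            if j is not None:
                row[j] += 1
        table.append(row)
    return kmers, table


def normalize(row):
    """Convert counts to relative frequencies (corrects for peak length)."""
    total = sum(row)
    return [x / total for x in row] if total else [0.0] * len(row)


def fit_centroids(table, labels):
    """Mean k-mer frequency profile per class (nearest-centroid training)."""
    sums, sizes = {}, {}
    for row, y in zip(table, labels):
        freq = normalize(row)
        if y not in sums:
            sums[y] = [0.0] * len(freq)
            sizes[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], freq)]
        sizes[y] += 1
    return {y: [v / sizes[y] for v in sums[y]] for y in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    freq = normalize(row)

    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(freq, center))

    return min(centroids, key=lambda y: sqdist(centroids[y]))


# Toy data, purely hypothetical: real input would be peak sequences
# plus some set of negative (control) sequences.
train_seqs = ["ACACACACAC", "ACACACACGT", "GTGTGTGTGT", "TTGTGTGTGT"]
train_labels = ["bound", "bound", "unbound", "unbound"]

kmers, train_table = kmer_count_table(train_seqs, k=2)
centroids = fit_centroids(train_table, train_labels)

_, query_table = kmer_count_table(["CACACACA"], k=2)
prediction = predict(centroids, query_table[0])
```

The choice of negative sequences would matter a lot here, and the table grows as 4^k columns, so in practice some feature selection or regularization would probably be needed.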

brunocontrerasmoreira commented 4 months ago

Hi Jacques, are you using background frequencies of masked ref genomes?

jvanheld commented 4 months ago

Hi Bruno,

I am estimating the k-mer prior probabilities based on a Markov model of order k-2 (or lower if the peak set size is too small).
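As a rough illustration of what this means (a sketch in plain Python, not the actual oligo-analysis code), the expected probability of a word under a Markov model of order m trained on the peaks themselves is the frequency of its first m letters multiplied by the transition probabilities of each following letter given the m preceding ones. The function names below are hypothetical, counting is done on a single strand, and no pseudo-counts are applied, all of which are simplifying assumptions.

```python
from collections import Counter


def count_words(seqs, size):
    """Count all overlapping words of a given size (ACGT only)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - size + 1):
            w = s[i:i + size]
            if set(w) <= set("ACGT"):
                counts[w] += 1
    return counts


def markov_expected_prob(word, seqs, order):
    """Prior probability of `word` under a Markov model of the given
    order, estimated from `seqs` (for oligo-analysis-like use one would
    take order = len(word) - 2)."""
    k = len(word)
    if not 0 <= order < k:
        raise ValueError("order must satisfy 0 <= order < len(word)")

    if order == 0:
        # Independent residues: product of single-letter frequencies.
        base = count_words(seqs, 1)
        total = sum(base.values())
        p = 1.0
        for letter in word:
            p *= base[letter] / total
        return p

    prefix_counts = count_words(seqs, order)
    trans_counts = count_words(seqs, order + 1)

    # Probability of the first `order` letters.
    p = prefix_counts[word[:order]] / sum(prefix_counts.values())

    # Transition probabilities P(letter | preceding `order` letters).
    for i in range(order, k):
        context = word[i - order:i]
        denom = sum(trans_counts[context + b] for b in "ACGT")
        if denom == 0:
            return 0.0  # context never observed; no pseudo-count here
        p *= trans_counts[context + word[i]] / denom
    return p


# Toy example with a hypothetical sequence set.
peaks = ["ACGTACGTACGT"]
p_order1 = markov_expected_prob("ACG", peaks, order=1)  # k=3, order k-2
p_order0 = markov_expected_prob("AC", peaks, order=0)
```

Lowering the order for small peak sets makes sense in this picture: an order-m model needs reliable counts for all 4^(m+1) transition words, which a small peak set cannot provide.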

It would be interesting to evaluate the impact of the background model by comparing the motifs discovered with the different alternatives.

A masked reference genome might be interesting, but it will be a mixture of different sequence types, most of which might have a composition different from that of the peaks. In my experience, using the peaks themselves gives better results for oligo-analysis, but we could test the alternatives.

However, this will only modify the results of oligo-analysis. The fact that position-analysis returns only weakly significant motifs suggests that there is a more fundamental problem with these peaks, and that PWMs may be suboptimal for classifying peaks as regulated or not by the same TF.