Motif clustering as a potential feature.

Many motifs are quite similar, so a given sequence often produces many redundant hits. In addition, many motif databases aggregate data from multiple sources, so the same TF often ends up with numerous motifs (see #26).

One way to handle this would be to cluster motifs by their similarity to each other (several methods exist for this) and use the best (or most common) representative of each cluster for the actual scanning. I'm unsure how to implement this, but these papers/tools address it to a degree:

- STAMP: http://www.benoslab.pitt.edu/stamp/
- GMACS: http://goldenlab.org/projects/gmacs/index.html
- GMACS paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384390/
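To make the idea concrete, here is a rough sketch of what the clustering step could look like. It is not how STAMP or GMACS work and not an existing part of this project; everything in it is an assumption for illustration: motifs are plain position probability matrices (rows are positions, columns are A/C/G/T), similarity is the best Pearson correlation over ungapped forward-strand alignments, clusters are formed by single linkage at a hypothetical threshold of 0.9, and the representative is the member most similar to the rest of its cluster. The names `motif_similarity`, `cluster_motifs`, and `representative` are made up, as are the toy motifs.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def motif_similarity(a, b, min_overlap=4):
    """Best correlation over all ungapped offsets of b against a.

    Assumption: forward strand only; reverse complements are ignored here.
    Returns -1.0 if the motifs cannot overlap by at least min_overlap positions.
    """
    best = -1.0
    for offset in range(-(len(b) - min_overlap), len(a) - min_overlap + 1):
        start_a, start_b = max(0, offset), max(0, -offset)
        n = min(len(a) - start_a, len(b) - start_b)
        if n < min_overlap:
            continue
        xs = [p for row in a[start_a:start_a + n] for p in row]
        ys = [p for row in b[start_b:start_b + n] for p in row]
        best = max(best, pearson(xs, ys))
    return best


def cluster_motifs(motifs, threshold=0.9):
    """Single-linkage clustering: join any two motifs with similarity >= threshold."""
    names = list(motifs)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    sim = {}
    for x, y in combinations(names, 2):
        s = motif_similarity(motifs[x], motifs[y])
        sim[(x, y)] = sim[(y, x)] = s
        if s >= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values()), sim


def representative(members, sim):
    """Pick the member with the highest total similarity to the other members."""
    if len(members) == 1:
        return members[0]
    return max(members, key=lambda m: sum(sim[(m, o)] for o in members if o != m))


# Toy data (made up): two near-identical motifs for one TF, one unrelated motif.
motifs = {
    "TF1_db1": [[0.90, 0.03, 0.03, 0.04], [0.05, 0.05, 0.85, 0.05],
                [0.05, 0.05, 0.05, 0.85], [0.80, 0.10, 0.05, 0.05]],
    "TF1_db2": [[0.85, 0.05, 0.05, 0.05], [0.10, 0.05, 0.80, 0.05],
                [0.05, 0.10, 0.05, 0.80], [0.75, 0.10, 0.10, 0.05]],
    "TF2_db1": [[0.05, 0.85, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05],
                [0.85, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.85]],
}

clusters, sim = cluster_motifs(motifs)
for members in clusters:
    print(representative(members, sim), "represents", members)
```

Only the representatives would then be passed to the scanner. Single linkage is the simplest choice but can chain loosely related motifs together, so average linkage or a stricter threshold may be worth trying, and a real implementation would also need to compare reverse complements.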

j-andrews7 / VAMPIRE

Motif clustering as a potential feature. #28