j-andrews7 / VAMPIRE

Variant and Epigenetic anNotation for Underlying Significance and Regulation
MIT License

Motif clustering as a potential feature. #28

Open j-andrews7 opened 7 years ago

j-andrews7 commented 7 years ago

Many motifs are quite similar, so a single sequence often produces many redundant hits. In addition, many motif databases aggregate data from multiple sources, which results in numerous motifs for the same TF (see #26).

One way to handle this would be to cluster motifs by their similarity to each other (several different methods exist for this) and use only the best (or most common) representative of each cluster for the actual scanning. I'm unsure how to implement it, but these papers/tools address it to a degree:

http://www.benoslab.pitt.edu/stamp/
http://goldenlab.org/projects/gmacs/index.html
GMACS Paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384390/
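
To make the idea concrete, here is a rough sketch of what clustering plus representative selection could look like. It is not how STAMP or GMACS work and it is not existing VAMPIRE code. Everything in it is an assumption for illustration: motifs are taken to be position probability matrices with columns ordered A, C, G, T, similarity is the best mean per-column Pearson correlation over ungapped alignments on both strands, clustering is a simple greedy pass, and the "best" representative is taken to be the member with the highest information content. All function names, the `0.8` threshold, and the toy motifs are hypothetical.

```python
import math


def from_consensus(seq, p=0.85):
    """Build a toy probability matrix from a consensus string (illustration only)."""
    rest = (1 - p) / 3
    return [[p if b == base else rest for b in "ACGT"] for base in seq]


def reverse_complement(motif):
    """Reverse column order; reversing each A,C,G,T column swaps A<->T and C<->G."""
    return [col[::-1] for col in reversed(motif)]


def column_similarity(col_a, col_b):
    """Pearson correlation between two 4-element probability columns."""
    mean_a = sum(col_a) / 4
    mean_b = sum(col_b) / 4
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a uniform column carries no signal
    return cov / math.sqrt(var_a * var_b)


def motif_similarity(m1, m2, min_overlap=4):
    """Best mean column correlation over ungapped offsets and both strands."""
    min_overlap = min(min_overlap, len(m1), len(m2))
    best = -1.0
    for other in (m2, reverse_complement(m2)):
        for offset in range(-(len(other) - 1), len(m1)):
            pairs = [
                (m1[i], other[i - offset])
                for i in range(len(m1))
                if 0 <= i - offset < len(other)
            ]
            if len(pairs) < min_overlap:
                continue
            score = sum(column_similarity(a, b) for a, b in pairs) / len(pairs)
            best = max(best, score)
    return best


def information_content(motif):
    """Total information content in bits, assuming a uniform background."""
    return sum(
        2.0 + sum(p * math.log2(p) for p in col if p > 0) for col in motif
    )


def cluster_motifs(motifs, threshold=0.8):
    """Greedy clustering of a {name: matrix} dict.

    Motifs are visited from highest to lowest information content, so the
    first name in each cluster is its highest-information member and serves
    as the representative to scan with.
    """
    order = sorted(motifs, key=lambda n: information_content(motifs[n]), reverse=True)
    clusters = []
    for name in order:
        for cluster in clusters:
            representative = motifs[cluster[0]]
            if motif_similarity(representative, motifs[name]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Toy input: two versions of the same (made-up) TF from different databases,
# plus one unrelated motif.
motifs = {
    "TF1_dbA": from_consensus("TGACTCA", 0.9),
    "TF1_dbB": from_consensus("TGACTCA", 0.8),
    "TF2": from_consensus("CACGTG", 0.9),
}

for cluster in cluster_motifs(motifs):
    print("representative:", cluster[0], "| members:", cluster)
```

With this toy input the two TF1 versions should fall into one cluster and TF2 into its own, so only two motifs would be used for scanning instead of three. A greedy pass is order-dependent and crude; proper hierarchical clustering or a tool such as the ones linked above would likely be the better choice for real databases.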