rajanil / msCentipede

A hierarchical multiscale model for inferring transcription factor binding from chromatin accessibility data.
MIT License

motif instances #2

Closed igordot closed 9 years ago

igordot commented 9 years ago

This is not really an issue (maybe a documentation issue). The tool takes two inputs. One is a BAM file, which makes sense. The other is a file of motif instances. Do you have any suggestions on where to get or how to generate that file?

rajanil commented 9 years ago

Given a position weight matrix (PWM) for a transcription factor, the list of motif instances is typically generated by scanning the genome for sequences that have a higher likelihood under the PWM model than under a background model. Since PWM file formats tend to vary and users might want to include only a subset of all motif instances, we decided to let this be an input that the user generates separately. Thanks for the suggestion; I will update the documentation with this description.
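As a rough illustration of the scanning described above, here is a minimal Python sketch. It is not part of msCentipede. The toy `PWM`, the uniform `BACKGROUND`, the `scan` function, and the threshold are all hypothetical and chosen only for this example. It scores every window on both strands by the log2 ratio of PWM likelihood to background likelihood and keeps windows that exceed the threshold.

```python
import math

# Hypothetical 4-position PWM (consensus GATA): one dict of base
# probabilities per motif position. Real PWMs come from a motif database.
PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
# Assumed uniform background model; a genome-specific one may be preferable.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_ratio(site):
    """log2 of P(site | PWM) / P(site | background)."""
    return sum(math.log2(col[b] / BACKGROUND[b]) for col, b in zip(PWM, site))


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def scan(seq, threshold=0.0):
    """Return (start, stop, strand, score) for windows scoring above threshold.

    Coordinates are 0-based and half-open, relative to `seq`.
    """
    width = len(PWM)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BACKGROUND for b in window):
            continue  # skip windows containing N or other ambiguous bases
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = log_ratio(site)
            if score > threshold:
                hits.append((start, start + width, strand, score))
    return hits


for hit in scan("CCGATACC", threshold=4.0):
    print(hit)
```

In practice the input would be each chromosome sequence rather than a short string, and the hits would be written out in whatever column layout the msCentipede documentation specifies for the motif file. Dedicated motif scanners do the same job much faster on whole genomes.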

igordot commented 9 years ago

I know there is at least one database (http://compbio.mit.edu/encode-motifs/), but it's human only. Any additional resources or tools would be very helpful.

igordot commented 9 years ago

Sorry. Do you have any updates on this topic?

rajanil commented 9 years ago

The ENCODE database is quite useful. Another database of PWM models for human and mouse factors comes from this paper: http://www.sciencedirect.com/science/article/pii/S0092867412014961

Two other commonly used databases of PWM models are TRANSFAC (http://www.gene-regulation.com/pub/databases.html) and JASPAR (http://jaspardev.genereg.net/), which have models for factors in other eukaryotes.

Hope this helps!