j-andrews7 / VAMPIRE

Variant and Epigenetic anNotation for Underlying Significance and Regulation
MIT License

Many motif databases provide PFMs rather than count matrices #42

Open j-andrews7 opened 7 years ago

j-andrews7 commented 7 years ago

Some of the motif databases (ENCODE, HOMER) provide position frequency matrices (PFMs) rather than count matrices. This isn't a big issue in itself, but we will need to build in checks for the modules that expect count matrices as input. Those modules convert the counts into PFMs and then into position weight matrices / position-specific scoring matrices (PWMs/PSSMs, which are the same thing).
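As a rough illustration of the kind of check described above, here is a minimal sketch. It is not VAMPIRE's actual code: the function names, the row-per-position layout (each row holding A, C, G, T values), the column-sum heuristic, and the default pseudocount and background are all assumptions made for the example.

```python
import math


def is_frequency_matrix(matrix, tol=1e-2):
    """Heuristic (assumption): treat the matrix as a PFM if every
    position's values sum to ~1. A count matrix built from a single
    sequence would also pass, so this check is not foolproof."""
    return all(abs(sum(position) - 1.0) <= tol for position in matrix)


def to_frequencies(matrix, pseudocount=0.1, tol=1e-2):
    """Return per-position frequencies from either counts or a PFM."""
    if is_frequency_matrix(matrix, tol):
        # Already frequencies: only renormalize to absorb rounding error.
        return [[v / sum(position) for v in position] for position in matrix]
    freqs = []
    for position in matrix:
        # Counts: add a pseudocount to each base before normalizing.
        total = sum(position) + pseudocount * len(position)
        freqs.append([(v + pseudocount) / total for v in position])
    return freqs


def to_pwm(freqs, background=(0.25, 0.25, 0.25, 0.25)):
    """Log2-odds score of each frequency against a background model."""
    return [
        [math.log2(f / b) if f > 0 else float("-inf")
         for f, b in zip(position, background)]
        for position in freqs
    ]


# Hypothetical two-position motifs, one as counts and one as a PFM.
counts = [[8, 0, 1, 1], [0, 10, 0, 0]]
pfm = [[0.8, 0.0, 0.1, 0.1], [0.0, 1.0, 0.0, 0.0]]

pwm_from_counts = to_pwm(to_frequencies(counts))
pwm_from_pfm = to_pwm(to_frequencies(pfm))
```

Note that a PFM with zero entries yields `-inf` scores in this sketch, since no pseudocount is applied to inputs that are already frequencies; how to handle that case is a separate design decision.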

In particular, summarize.py and motifs.py do this. Options include:

I'm leaning towards the former currently, as it's likely a cleaner solution.