jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

Add TOMTOM matrix file-writing, and add report `-t` flag for it #33

Closed. bytewife closed this 1 year ago.

bytewife commented 1 year ago

$ ls -R report/
report/:
MA0002.2.png  MA0511.2.png  MA0852.2.png  MA1121.1.png  MA1138.1.png  MA1638.1.png  MA1989.1.png  trimmed_logos
MA0090.3.png  MA0704.1.png  MA0884.2.png  MA1123.2.png  MA1144.1.png  MA1643.1.png  motifs.html
MA0119.1.png  MA0808.1.png  MA1103.2.png  MA1135.1.png  MA1528.1.png  MA1930.1.png  tomtom

report/tomtom:
pos_patterns.pattern_0.tomtom.tsv  pos_patterns.pattern_3.tomtom.tsv  pos_patterns.pattern_6.tomtom.tsv
pos_patterns.pattern_1.tomtom.tsv  pos_patterns.pattern_4.tomtom.tsv
pos_patterns.pattern_2.tomtom.tsv  pos_patterns.pattern_5.tomtom.tsv

report/trimmed_logos:
pos_patterns.pattern_0.cwm.fwd.png  pos_patterns.pattern_2.cwm.rev.png  pos_patterns.pattern_5.cwm.fwd.png
pos_patterns.pattern_0.cwm.rev.png  pos_patterns.pattern_3.cwm.fwd.png  pos_patterns.pattern_5.cwm.rev.png
pos_patterns.pattern_1.cwm.fwd.png  pos_patterns.pattern_3.cwm.rev.png  pos_patterns.pattern_6.cwm.fwd.png
pos_patterns.pattern_1.cwm.rev.png  pos_patterns.pattern_4.cwm.fwd.png  pos_patterns.pattern_6.cwm.rev.png
pos_patterns.pattern_2.cwm.fwd.png  pos_patterns.pattern_4.cwm.rev.png
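The listing suggests one Tomtom TSV per pattern under `report/tomtom`, named like `pos_patterns.pattern_1.tomtom.tsv`. As a rough sketch of how a downstream script might index those files, the Python below assumes that naming scheme holds for every file; `index_tomtom_files` and the regular expression are hypothetical helpers for illustration, not part of tfmodisco-lite.

```python
import re
import tempfile
from pathlib import Path

# Assumed filename shape, inferred from the listing: <group>.pattern_<index>.tomtom.tsv
TOMTOM_NAME = re.compile(r"^(?P<group>\w+?)\.pattern_(?P<index>\d+)\.tomtom\.tsv$")


def index_tomtom_files(report_dir):
    """Map (group, pattern index) to the TSV path found under <report_dir>/tomtom."""
    found = {}
    for path in sorted(Path(report_dir, "tomtom").glob("*.tomtom.tsv")):
        match = TOMTOM_NAME.match(path.name)
        if match:  # ignore files that do not follow the assumed naming scheme
            found[(match["group"], int(match["index"]))] = path
    return found


# Small demonstration on a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    tomtom_dir = Path(tmp, "tomtom")
    tomtom_dir.mkdir()
    for i in (0, 1, 2):
        (tomtom_dir / f"pos_patterns.pattern_{i}.tomtom.tsv").touch()
    (tomtom_dir / "notes.txt").touch()  # unrelated file, should be skipped
    demo_index = index_tomtom_files(tmp)
    demo_keys = sorted(demo_index)
```

Keying on the integer index rather than the raw filename keeps `pattern_10` sorted after `pattern_2`, which a plain string sort would not do.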

$ cat report/tomtom/pos_patterns.pattern_1.tomtom.tsv
1 MA0644.2 1 0.00231648 1.94816 0.166599 7 TAATTAAAACCAGATG CTAATTAG -
1 MA1577.1 1 0.00247392 2.08056 0.166599 7 TAATTAAAACCAGATG CTAATTAG -
1 MA0091.1 -8 0.00252822 2.12623 0.166599 8 TAATTAAAACCAGATG AACAGATGGTCG -
1 MA0630.1 1 0.00264122 2.22126 0.166599 7 TAATTAAAACCAGATG TTAATTGG +
1 MA0618.1 1 0.00264122 2.22126 0.166599 7 TAATTAAAACCAGATG CTAATTAA -
1 MA0674.1 1 0.00300473 2.52698 0.166599 7 TAATTAAAACCAGATG GTAATTAA +
1 MA0725.1 1 0.00300473 2.52698 0.166599 7 TAATTAAAACCAGATG ATAATTAG -
1 MA0634.1 2 0.00307404 2.58527 0.166599 8 TAATTAAAACCAGATG TTTAATTAGA -
1 MA0716.1 1 0.00320079 2.69187 0.166599 7 TAATTAAAACCAGATG CTAATTAA +
1 MA0151.1 -2 0.00372656 3.13404 0.166599 6 TAATTAAAACCAGATG ATTAAA +
1 MA0678.1 -8 0.00376362 3.16521 0.166599 8 TAATTAAAACCAGATG ACCATATGGT +
1 MA0818.2 -8 0.00376362 3.16521 0.166599 8 TAATTAAAACCAGATG ACCATATGGT +
1 MA0827.1 -8 0.00376362 3.16521 0.166599 8 TAATTAAAACCAGATG ACCATATGTT +
1 MA0726.1 1 0.00411327 3.45926 0.166599 7 TAATTAAAACCAGATG CTAATTAG -
1 MA0826.1 -8 0.00415835 3.49717 0.166599 8 TAATTAAAACCAGATG AACATATGTT -
1 MA1618.1 -7 0.00431666 3.63031 0.166599 9 TAATTAAAACCAGATG AAACAGATGTTTA +
1 MA0717.1 1 0.00437269 3.67743 0.166599 7 TAATTAAAACCAGATG TTAATTGG -
1 MA0720.1 1 0.00437269 3.67743 0.166599 7 TAATTAAAACCAGATG TTAATTAG -
1 MA0773.1 -2 0.00447953 3.76728 0.166599 12 TAATTAAAACCAGATG ACTATAAATAGA +
1 MA1960.1 0 0.00447953 3.76728 0.166599 12 TAATTAAAACCAGATG TCATTAACACCT -
1 MA0607.2 -8 0.00459049 3.86061 0.166599 8 TAATTAAAACCAGATG ACCATATGGT -
1 MA1468.1 -8 0.00482146 4.05485 0.166599 8 TAATTAAAACCAGATG AACATATGTC +
1 MA0654.1 1 0.00492296 4.14021 0.166599 7 TAATTAAAACCAGATG TTAATTAG -
1 MA0817.1 -7 0.00509582 4.28559 0.166599 9 TAATTAAAACCAGATG AAACATATGTTT -
1 MA1467.2 -8 0.00515855 4.33834 0.166599 8 TAATTAAAACCAGATG GACAGATGGCA +
1 MA1642.1 -6 0.00548479 4.61271 0.170113 10 TAATTAAAACCAGATG GGAACAGATGGCA +
1 MA0461.2 -8 0.0058531 4.92246 0.175419 8 TAATTAAAACCAGATG AACATATGTT +
1 MA0892.1 2 0.00614013 5.16385 0.175419 8 TAATTAAAACCAGATG CCTAATTAAA +
1 MA0675.1 1 0.00689676 5.80017 0.183561 7 TAATTAAAACCAGATG CTAATTAA +
1 MA0092.1 -6 0.00707987 5.95417 0.183561 10 TAATTAAAACCAGATG ATGCCAGACC -
1 MA0132.2 1 0.00767925 6.45825 0.183561 7 TAATTAAAACCAGATG GTAATTAG +
1 MA1525.2 -1 0.0077748 6.53861 0.183561 10 TAATTAAAACCAGATG AATGGAAAAT +
1 MA0665.1 -8 0.0077748 6.53861 0.183561 8 TAATTAAAACCAGATG AACAGCTGTT +
1 MA0893.2 1 0.00809498 6.80788 0.183561 7 TAATTAAAACCAGATG GTAATTAG +
1 MA1570.1 -8 0.00814427 6.84933 0.183561 8 TAATTAAAACCAGATG AACATATGTT -
1 MA0601.1 1 0.00836184 7.0323 0.183561 10 TAATTAAAACCAGATG TTAATTAATAT -
1 MA0068.2 1 0.0085283 7.1723 0.183561 7 TAATTAAAACCAGATG CTAATTAG +
1 MA1500.1 1 0.0085283 7.1723 0.183561 7 TAATTAAAACCAGATG GTAATTAA -
1 MA0899.1 2 0.00872873 7.34086 0.183561 9 TAATTAAAACCAGATG GGTAATAAAAA +
1 MA1125.1 -2 0.00875386 7.362 0.183561 12 TAATTAAAACCAGATG TTTAAAAAAAAA +
1 MA0820.1 -8 0.00893115 7.51109 0.183561 8 TAATTAAAACCAGATG AACAGGTGGT -
1 MA0705.1 1 0.0089806 7.55268 0.183561 7 TAATTAAAACCAGATG CTAATTAG +
1 MA0912.2 1 0.0089806 7.55268 0.183561 7 TAATTAAAACCAGATG GTAATTAG -
1 MA0052.4 -1 0.00942557 7.9269 0.18578 15 TAATTAAAACCAGATG TTCTAAAAATAGAAA +
1 MA0723.2 1 0.00945273 7.94975 0.18578 7 TAATTAAAACCAGATG CTAATTAC -
1 MA1507.1 1 0.00994509 8.36382 0.19057 7 TAATTAAAACCAGATG ATCATTAG +
1 MA0668.2 -5 0.0104608 8.79757 0.193574 11 TAATTAAAACCAGATG GGGAACAGATGGTCA +
1 MA1989.1 -4 0.0104807 8.81426 0.193574 12 TAATTAAAACCAGATG AAAAACCACAAGAA +
1 MA0666.2 1 0.0115577 9.72003 0.19675 7 TAATTAAAACCAGATG CCAATTAA +
1 MA1504.1 1 0.0115577 9.72003 0.19675 7 TAATTAAAACCAGATG GTCATTAA +
1 MA1983.1 5 0.011831 9.94988 0.19675 16 TAATTAAAACCAGATG TCTGTTACTTGCAGCCAAAAG +

Tomtom (Motif Comparison Tool): Version 5.5.2 compiled on Apr 19 2023 at 11:45:02

The format of this file is described at https://meme-suite.org/meme/doc/tomtom-output-format.html.

tomtom -no-ssc -oc . --verbosity 1 -text -min-overlap 5 -mi 1 -dist pearson -evalue -thresh 10.0 /tmp/tmpd51k_spg JASPAR2022_CORE_vertebrates_non-redundant_pfms_meme.txt
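For readers who want to load one of these files programmatically, here is a minimal Python sketch. It rests on a few assumptions: fields are tab-separated (the listing above only shows whitespace), the ten columns follow the order given on the Tomtom output-format page linked in the output, and the trailing version/format/command lines can be skipped because they do not split into ten fields. `parse_tomtom_tsv` is a hypothetical helper name, not a tfmodisco-lite function.

```python
import csv
import io

# Column order as documented on the Tomtom output-format page.
TOMTOM_COLUMNS = [
    "Query_ID", "Target_ID", "Optimal_offset", "p-value", "E-value",
    "q-value", "Overlap", "Query_consensus", "Target_consensus", "Orientation",
]


def parse_tomtom_tsv(handle):
    """Yield one dict per match row; skip blanks, comments, headers, and footer text."""
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#") or fields[0] == "Query_ID":
            continue
        if len(fields) != len(TOMTOM_COLUMNS):
            continue  # footer lines (version, format URL, command) land here
        row = dict(zip(TOMTOM_COLUMNS, fields))
        for name in ("Optimal_offset", "Overlap"):
            row[name] = int(row[name])
        for name in ("p-value", "E-value", "q-value"):
            row[name] = float(row[name])
        yield row


# Two rows copied from the output above, plus one footer line, joined with tabs.
SAMPLE = (
    "1\tMA0644.2\t1\t0.00231648\t1.94816\t0.166599\t7\tTAATTAAAACCAGATG\tCTAATTAG\t-\n"
    "1\tMA0091.1\t-8\t0.00252822\t2.12623\t0.166599\t8\tTAATTAAAACCAGATG\tAACAGATGGTCG\t-\n"
    "\n"
    "Tomtom (Motif Comparison Tool): Version 5.5.2 compiled on Apr 19 2023 at 11:45:02\n"
)

rows = list(parse_tomtom_tsv(io.StringIO(SAMPLE)))
best = min(rows, key=lambda r: r["E-value"])  # lowest E-value is the strongest match
```

To read a real file, something like `with open(path, newline="") as fh: rows = list(parse_tomtom_tsv(fh))` should work under the same assumptions. Filtering on field count rather than on a leading `#` keeps the parser tolerant of footers with or without a comment marker.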