jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

(WIP) Implement MEME output subcommand `meme` #27

Closed bytewife closed 1 year ago

bytewife commented 1 year ago

@jmschrei It appears that tfmodisco implements subclustering of seqlet clusters. How should we handle these w.r.t. writing motifs to the MEME files?

bytewife commented 1 year ago

Example (modisco_results.CWM.meme):

MEME version 5

ALPHABET= ACGT

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF pattern_0
letter-probability matrix: alength= 4 w= 30 nsites= 1
0.000966 0.001605 0.001199 0.000856
0.000863 0.001477 0.000871 0.001402
0.001114 0.000545 0.001213 0.000839
0.001036 0.001990 0.001227 0.000744
0.002566 0.000326 0.000082 0.000209
-0.000404 0.000548 -0.000335 0.003904
0.000359 0.000382 0.001446 0.000812
0.000310 0.002486 0.000013 0.000915
-0.001650 0.000611 -0.000274 -0.000931
-0.002431 -0.001015 -0.001942 -0.000258
-0.001104 -0.000323 -0.000156 -0.001906
-0.000923 0.000257 0.000545 -0.001862
-0.002585 -0.001050 0.000504 -0.002165
-0.006278 -0.002906 -0.001575 -0.004529
-0.006525 -0.005682 0.000230 -0.003893
0.010032 -0.002446 0.000890 -0.004159
-0.000209 -0.000327 -0.000078 0.061095
-0.000307 0.000133 0.038049 -0.002317
0.046923 -0.000388 -0.000595 -0.000854
-0.006760 -0.001430 0.002318 -0.004597
-0.000112 0.000047 -0.000221 0.058326
-0.000075 0.061763 0.000000 0.000003
0.059826 -0.000072 -0.000006 -0.000103
-0.002650 -0.001453 -0.006538 0.008228
-0.003757 -0.000018 -0.004384 -0.007134
-0.000943 0.003998 -0.000030 -0.000349
-0.003023 0.001721 -0.001096 -0.001723
-0.001865 0.001419 0.000822 -0.001262
-0.002786 -0.001106 -0.000543 0.000054
0.000729 0.000193 0.001019 -0.000507

MOTIF pattern_1
letter-probability matrix: alength= 4 w= 30 nsites= 1
0.001049 0.001619 0.000427 0.000511
-0.000200 0.001497 -0.000001 0.001158
-0.000498 0.000407 0.000163 -0.000873
0.000389 0.000782 0.001914 0.000335
-0.004773 -0.001553 -0.003430 -0.004258
-0.004601 -0.001412 -0.000612 -0.003126
-0.005268 -0.002778 -0.003182 -0.001171
-0.002875 -0.001393 -0.001984 0.028612
0.027435 -0.002664 0.001364 -0.002392
0.065485 -0.000058 -0.000566 -0.000410
-0.003637 -0.002039 -0.003270 0.037624
-0.005678 -0.004789 -0.007113 0.000488
0.006520 -0.002398 -0.000639 -0.002607
0.001827 -0.003912 -0.004117 -0.003664
0.010983 -0.004025 -0.004508 -0.001473
0.007936 -0.003902 -0.003507 -0.001718
-0.001166 0.009161 -0.004876 -0.001414
-0.001216 0.089566 0.000095 0.000006
0.063222 0.000122 0.000036 -0.000758
-0.001174 -0.003339 0.028237 0.000797
0.024157 -0.000106 -0.001504 -0.000790
-0.002020 -0.000069 -0.000733 0.048505
-0.000895 -0.001108 0.059248 -0.003064
-0.001797 -0.002617 0.000276 0.005361
-0.005389 -0.001219 -0.004015 -0.004279
-0.002335 0.000955 -0.000330 -0.001823
-0.002292 0.000080 -0.000658 -0.000764
-0.002291 0.000086 -0.000120 -0.001799
-0.001238 0.002040 0.000819 -0.000516
0.000059 0.000320 -0.000112 -0.000813

MOTIF pattern_2
letter-probability matrix: alength= 4 w= 30 nsites= 1
0.000898 0.002598 0.000942 0.000500
0.001459 0.002100 0.000154 0.000675
0.000583 0.001926 0.001299 0.000154
0.000785 0.000994 0.001907 -0.000990
-0.000691 0.000540 0.000071 -0.001234
0.000047 0.000679 -0.000569 -0.001471
-0.000454 -0.000009 0.000828 -0.000837
-0.001950 0.000279 0.003098 -0.002683
-0.001444 -0.001653 -0.000006 -0.002776
-0.004442 0.002268 -0.002209 -0.005088
0.004961 -0.002721 0.008938 -0.001852
-0.002452 -0.002147 -0.001695 0.011936
0.000027 0.045844 -0.000629 0.000031
0.000057 0.000000 0.000000 0.043926
0.000000 0.000000 0.076969 -0.000147
-0.000493 -0.003102 0.009632 -0.001143
-0.000913 -0.003879 -0.002797 0.001645
-0.000913 -0.002719 -0.003337 0.005109
0.000368 -0.002550 -0.001215 0.007616
0.000958 -0.003784 -0.001084 0.000913
-0.000987 -0.001050 -0.001777 0.005295
0.006877 -0.004218 0.003535 -0.004070
-0.000120 -0.000042 0.000193 0.056402
-0.000554 0.041489 0.000058 0.000563
0.023777 -0.000540 -0.000619 -0.001303
0.000133 -0.001554 0.000041 -0.001371
-0.002286 -0.001094 -0.002247 -0.002995
-0.002717 -0.000872 -0.001313 -0.002747
-0.000462 0.001027 0.000507 0.000987
0.000069 0.000760 0.001237 -0.000024

bytewife commented 1 year ago

@jmschrei Could you also verify that the probability matrices calculated for each datatype are as expected?
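
One way to make this check concrete is a small validator. The following is a minimal sketch, not code from tfmodisco-lite: `check_probability_rows` and its `tol` parameter are hypothetical names, and it assumes the matrix is available as plain lists of floats. It relies on the MEME convention that each row of a letter-probability matrix is non-negative and sums to 1, which the CWM rows in the example above do not satisfy.

```python
import math


def check_probability_rows(rows, tol=1e-3):
    """Return (row_index, reason) for every row that is not a valid
    probability distribution. Hypothetical helper, for illustration only."""
    problems = []
    for i, row in enumerate(rows):
        if any(v < 0 for v in row):
            problems.append((i, "negative entry"))
        elif not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append((i, "row does not sum to 1"))
    return problems


# Two rows copied from pattern_0 in the example above.
cwm_rows = [
    [0.000966, 0.001605, 0.001199, 0.000856],
    [-0.000404, 0.000548, -0.000335, 0.003904],
]
# A uniform row, which is a valid distribution over ACGT.
uniform_rows = [[0.25, 0.25, 0.25, 0.25]]

print(check_probability_rows(cwm_rows))
print(check_probability_rows(uniform_rows))
```

Running a check like this on each output file would show at a glance which datatypes produce rows that a MEME parser may reject or misinterpret.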

jmschrei commented 1 year ago

Let's ignore subpatterns for now; we can add support for them afterwards. It would probably just be a matter of writing the patterns out first and then the subpatterns.
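
The ordering suggested here could look roughly like the sketch below. It is an illustration under stated assumptions, not the implementation in this PR: `format_motif` and `format_meme` are hypothetical helpers, motifs are assumed to be `(name, rows)` tuples, the subpattern naming scheme is invented for the example, and the header lines and `nsites= 1` simply mirror the example file posted above.

```python
MEME_HEADER = (
    "MEME version 5\n\n"
    "ALPHABET= ACGT\n\n"
    "Background letter frequencies\n"
    "A 0.25 C 0.25 G 0.25 T 0.25\n"
)


def format_motif(name, rows):
    """Format one motif block in the layout used by the example file."""
    lines = [
        f"MOTIF {name}",
        # alength is the alphabet size, w is the motif width (row count).
        f"letter-probability matrix: alength= {len(rows[0])} w= {len(rows)} nsites= 1",
    ]
    lines += [" ".join(f"{v:.6f}" for v in row) for row in rows]
    return "\n".join(lines)


def format_meme(patterns, subpatterns=()):
    """Write all patterns first, then all subpatterns (hypothetical ordering)."""
    blocks = [format_motif(name, rows) for name, rows in patterns]
    blocks += [format_motif(name, rows) for name, rows in subpatterns]
    return MEME_HEADER + "\n" + "\n\n".join(blocks) + "\n"


# Toy one-position motifs; the subpattern name is an assumption.
patterns = [
    ("pattern_0", [[0.25, 0.25, 0.25, 0.25]]),
    ("pattern_1", [[0.70, 0.10, 0.10, 0.10]]),
]
subpatterns = [("pattern_0.subpattern_0", [[0.10, 0.10, 0.10, 0.70]])]

text = format_meme(patterns, subpatterns)
print(text)
```

Keeping subpatterns as a separate, optional argument means the current behavior (patterns only) is unchanged when no subpatterns are passed.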

jmschrei commented 1 year ago

🥳