hillerlab / TFforge

Identifying transcription factors involved in phenotypic differences between species
MIT License

How to get a motif file in WTMX format #3

Open aaannaw opened 1 month ago

aaannaw commented 1 month ago

Dear author, I downloaded motif files from the JASPAR database, and the format looks like this:

>MA0002.3       Runx1
A  [   123     57      0     87      0     17     10    131    500 ]
C  [  1072      0     75    127      0     42    400    463    158 ]
G  [   149      7   1872     70   1987   1848    251     81    289 ]
T  [   656   1936     53   1716     13     93   1339   1325   1053 ]

I understand that this is a 9 bp motif named Runx1, so the header of the motif in WTMX format would be: Runx1_primary 9 pseudo_count. However, I don't understand how to obtain "pseudo_count". I noticed you mentioned a link "[https://fil.email/ZZpEQaE0]" to a script for processing the motif files. However, the link is invalid.

Could you give me any suggestions? Looking forward to your reply. Best wishes, Na Wan

MichaelHiller commented 1 month ago

Hi, I think you can use the Jaspar matrices as they are. Where did you get the link to the script and where is this documented? I couldn't find anything about this on our github or the paper.

@bjorn.langer@crg.eu do you know?

aaannaw commented 1 month ago

Hello, I converted the JASPAR matrices to WTMX matrices with a script I wrote. The link comes from an image in [https://github.com/hillerlab/REforge/issues/4]. However, the output looks abnormal: why do some CNE sequences not get a Stubb score? [image]

MichaelHiller commented 1 month ago

I'll ask Bjoern to have a look.

bjlang commented 1 month ago

The matrix input format is described in the README (https://github.com/hillerlab/TFforge?tab=readme-ov-file#input-data). In short, your JASPAR matrix needs to be transposed (one row per motif position instead of one row per base) and the counts converted into frequencies. A pseudocount is not necessary, i.e., your header is fine. Unfortunately, we don't provide a script, as there are too many motif formats out there, but the MEME Suite (https://meme-suite.org/meme/doc/motif_conversion.html) might help. Its motif letter-probability matrix format is very similar to Stubb's WTMX format.
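The two steps described above (transpose, then turn counts into per-position frequencies) can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch, not a TFforge tool: the function names are made up for illustration, the parser assumes the single-motif JASPAR layout quoted in the first comment, and the output header (`>NAME LENGTH`, no pseudocount) plus the closing `<` line are assumptions about Stubb's WTMX layout that should be checked against the TFforge README's input-data section before use.

```python
"""Hypothetical sketch: convert one JASPAR count matrix into a WTMX-like
frequency matrix (one row per motif position, columns A C G T).
ASSUMPTION: header '>NAME LENGTH' and terminator '<'; verify against the TFforge README."""

BASES = ("A", "C", "G", "T")


def parse_jaspar(text):
    """Parse one JASPAR block into (name, {base: [counts per position]})."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip(">").split()
    # JASPAR headers look like '>MA0002.3  Runx1'; prefer the TF name if present.
    name = header[1] if len(header) > 1 else header[0]
    counts = {}
    for ln in lines[1:]:
        base, rest = ln.split(None, 1)
        # Drop the square brackets and read the whitespace-separated counts.
        counts[base] = [float(x) for x in rest.replace("[", " ").replace("]", " ").split()]
    return name, counts


def to_frequencies(counts):
    """Transpose base-major counts into position-major rows that each sum to 1."""
    length = len(counts["A"])
    rows = []
    for i in range(length):
        column = [counts[b][i] for b in BASES]
        total = sum(column)
        if total == 0:
            raise ValueError(f"position {i + 1} has no counts")
        rows.append([c / total for c in column])
    return rows


def to_wtmx(name, rows, digits=4):
    """Format rows as text; the header/terminator layout is an ASSUMPTION."""
    out = [f">{name} {len(rows)}"]  # no pseudocount, as noted in the thread
    for row in rows:
        out.append("\t".join(f"{f:.{digits}f}" for f in row))
    out.append("<")
    return "\n".join(out)


if __name__ == "__main__":
    jaspar = """>MA0002.3       Runx1
A  [   123     57      0     87      0     17     10    131    500 ]
C  [  1072      0     75    127      0     42    400    463    158 ]
G  [   149      7   1872     70   1987   1848    251     81    289 ]
T  [   656   1936     53   1716     13     93   1339   1325   1053 ]"""
    name, counts = parse_jaspar(jaspar)
    print(to_wtmx(name, to_frequencies(counts)))
```

For the Runx1 example every column sums to 2000 sites, so the first position becomes roughly 0.06 / 0.54 / 0.07 / 0.33 for A / C / G / T. Files with several motifs would need to be split into one block per motif before calling `parse_jaspar`.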

aaannaw commented 1 month ago

Hi, thanks for your response. However, I found that for some CNE sequences and some branches, the Stubb score cannot be calculated. Why is that? Could you explain this output? [image]