loosolab / Datenanalyse-2021


Interface definition: WP6<-WP5 #6

Closed oKoch closed 2 years ago

oKoch commented 2 years ago

Interface definition between WP6 and WP5:

Requirements of WP6 for WP5:

  1. A path to the files you are preparing for us.
  2. Each new motif you find has to be converted to a probability matrix.
  3. Save each new motif in its own file and give the motif and its file an ID/name (e.g. NEW001, NEW002).
  4. The file type and content should follow HOCOMOCO or JASPAR. See the following descriptions:

Both file types, HOCOMOCO and JASPAR, contain known motifs together with their matrices (probabilities in the JASPAR example, counts in the HOCOMOCO example).

JASPAR file structure (.meme, https://jaspar.genereg.net/downloads/):

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF MA0006.1 Ahr::Arnt
letter-probability matrix: alength= 4 w= 6 nsites= 24 E= 0
 0.125000  0.333333  0.083333  0.458333
 0.000000  0.000000  0.958333  0.041667
 0.000000  0.958333  0.000000  0.041667
 0.000000  0.000000  0.958333  0.041667
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  1.000000  0.000000
URL http://jaspar.genereg.net/matrix/MA0006.1

MOTIF MA0854.1 Alx1
letter-probability matrix: alength= 4 w= 17 nsites= 100 E= 0
 0.190000  0.440000  0.170000  0.200000
 0.070000  0.210000  0.650000  0.070000
 0.380000  0.330000  0.140000  0.150000
 0.430000  0.090000  0.310000  0.170000
 0.050000  0.400000  0.020000  0.530000
 0.010000  0.070000  0.000000  0.920000
 0.989899  0.000000  0.010101  0.000000
 0.980000  0.000000  0.010000  0.010000
 0.010000  0.010000  0.000000  0.980000
 0.000000  0.010101  0.000000  0.989899
 0.920000  0.000000  0.070000  0.010000
 0.530000  0.020000  0.400000  0.050000
 0.150000  0.210000  0.110000  0.530000
 0.230000  0.320000  0.160000  0.290000
 0.393939  0.313131  0.121212  0.171717
 0.240000  0.290000  0.230000  0.240000
 0.150000  0.400000  0.140000  0.310000
URL http://jaspar.genereg.net/matrix/MA0854.1

Or the HOCOMOCO file type (.txt, https://hocomoco11.autosome.ru/):

>AHR_HUMAN.H11MO.0.B    AHR
41  11  22  3   1   3   0   0   43
18  12  44  1   150 1   3   0   67
56  35  21  146 1   149 1   154 16
39  96  67  4   2   1   150 0   28
>AIRE_HUMAN.H11MO.0.C   AIRE
16  8   6   2   0   13  16  15  14  21  16  9   0   0   9   3   18  17
11  8   6   0   0   2   4   6   3   4   2   6   1   0   4   8   1   11
5   6   8   36  33  1   1   6   6   6   3   3   36  40  10  8   7   4
9   19  21  3   8   25  20  14  18  10  20  23  4   1   18  22  15  9
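Requirements 2 to 4 (one probability matrix per motif, one file per motif, IDs such as NEW001) could be met with something like the following. This is only a minimal sketch: the function names `format_meme` and `write_motifs`, the `.meme` extension for the output files, and the uniform background are assumptions for illustration, not part of either work package's code.

```python
from pathlib import Path


def format_meme(motif_id, rows, nsites, name=None):
    """Return minimal MEME version 4 text for a single motif.

    rows: one [A, C, G, T] probability row per motif position.
    """
    header = (
        "MEME version 4\n\n"
        "ALPHABET= ACGT\n\n"
        "strands: + -\n\n"
        "Background letter frequencies\n"
        "A 0.25 C 0.25 G 0.25 T 0.25\n\n"  # assumed uniform background
    )
    lines = [
        f"MOTIF {motif_id} {name or motif_id}",
        f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= {nsites} E= 0",
    ]
    for row in rows:
        lines.append(" " + "  ".join(f"{p:.6f}" for p in row))
    return header + "\n".join(lines) + "\n"


def write_motifs(motifs, out_dir):
    """Write each (rows, nsites) pair to its own file: NEW001.meme, NEW002.meme, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (rows, nsites) in enumerate(motifs, start=1):
        motif_id = f"NEW{index:03d}"
        path = out_dir / f"{motif_id}.meme"
        path.write_text(format_meme(motif_id, rows, nsites))
        paths.append(path)
    return paths
```

The returned paths can be handed over directly as the "path to the files" from requirement 1.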
oKoch commented 2 years ago

@gnnpl Do we need a probability matrix, or is another type possible? Does WP5 have counts rather than probabilities? Please add an example file for testing (merging new motifs into the known motifs).

@oKoch Joining motif files works with TOBIAS (subtool: FormatMotifs). New motifs with gene co-occurrences?
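If WP5 only has counts, turning them into probabilities is a per-position normalization. The sketch below assumes the layout of the HOCOMOCO example above, that is, four rows in the order A, C, G, T with one column per motif position (the columns of the AHR example each sum to 154, which fits that reading). The function name and the optional `pseudocount` parameter are illustrative assumptions.

```python
def counts_to_probabilities(count_rows, pseudocount=0.0):
    """Convert a count matrix to a probability matrix.

    count_rows: four equally long lists (A, C, G, T), one count per position.
    Returns one [A, C, G, T] probability row per position.
    """
    if len(count_rows) != 4:
        raise ValueError("expected four rows: A, C, G, T")
    if len({len(row) for row in count_rows}) != 1:
        raise ValueError("all rows must have the same length")
    prob_rows = []
    for column in zip(*count_rows):
        total = sum(column) + 4 * pseudocount
        if total == 0:
            raise ValueError("position has no counts")
        prob_rows.append([(count + pseudocount) / total for count in column])
    return prob_rows


# Counts of the AHR_HUMAN.H11MO.0.B example from this issue.
ahr_counts = [
    [41, 11, 22, 3, 1, 3, 0, 0, 43],
    [18, 12, 44, 1, 150, 1, 3, 0, 67],
    [56, 35, 21, 146, 1, 149, 1, 154, 16],
    [39, 96, 67, 4, 2, 1, 150, 0, 28],
]
ahr_probs = counts_to_probabilities(ahr_counts)
```

Each row of `ahr_probs` then corresponds to one line of a MEME letter-probability matrix.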

gnnpl commented 2 years ago

Here you can find two example motifs. They are provided in MEME format. The matrix in those files is a position frequency matrix, which can be translated to a position probability matrix.

For documentation purposes, you can also find the file contents below:

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF motif_10 motif_10
letter-probability matrix: alength= 4 w= 10 nsites= 1396 E= 0
 0.000000  0.939112  0.000000  0.060888
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.210602  0.000000  0.787249  0.002149
 0.000000  1.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.842407  0.157593  0.000000  0.000000
 0.037966  0.000000  0.962034  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  0.995702  0.004298

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF motif_29 motif_29
letter-probability matrix: alength= 4 w= 10 nsites= 197 E= 0
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.974619  0.000000  0.025381
 0.000000  0.000000  0.000000  1.000000
 0.035533  0.964467  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.050761  0.000000  0.000000  0.949239
 0.959391  0.000000  0.000000  0.040609
 0.000000  0.000000  0.974619  0.025381
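For testing on the receiving side, example files like these can be read back and sanity-checked. The following is a rough sketch, not the WP6 implementation: `parse_meme` is a hypothetical helper that assumes one matrix row per line (as in the JASPAR example) and ignores everything except `MOTIF` lines and four-column numeric rows.

```python
def parse_meme(text):
    """Return {motif_id: [[A, C, G, T], ...]} from minimal MEME text."""
    motifs = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            current = fields[1]
            motifs[current] = []
        elif current is not None and len(fields) == 4:
            try:
                row = [float(value) for value in fields]
            except ValueError:
                continue  # not a matrix row
            motifs[current].append(row)
    return motifs


# Content of the motif_29 example file from this issue.
example = """MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF motif_29 motif_29
letter-probability matrix: alength= 4 w= 10 nsites= 197 E= 0
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.974619  0.000000  0.025381
 0.000000  0.000000  0.000000  1.000000
 0.035533  0.964467  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.050761  0.000000  0.000000  0.949239
 0.959391  0.000000  0.000000  0.040609
 0.000000  0.000000  0.974619  0.025381
"""

motifs = parse_meme(example)
# Every row of a letter-probability matrix should sum to about 1.
rows_ok = all(abs(sum(row) - 1.0) < 1e-4 for row in motifs["motif_29"])
```

A check like `rows_ok` also helps distinguish a probability matrix from a count matrix before the new motifs are merged with the known ones.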

oKoch commented 2 years ago

Thank you for the example files. I am going to look at them next.

oKoch commented 2 years ago

Hi, I made an example implementation for the analysis, and it works fine. You can create the files the same way you did in the examples.