Open chrisclarkson opened 5 years ago
Hi Chris,
do you have zeros in your normalized PWM? We recommend adding small pseudo counts to your frequency matrix.
Best, Matthias
On 10.01.2019 at 11:36, Chris Clarkson (notifications@github.com) wrote:
Hi, I am interested in using your package to calculate the affinity of predicted binding sites and, subsequently, the significance of the affinity values.
My pipeline (using your instructions from https://rdrr.io/github/matthuska/tRap/man/) is as follows:
pfm
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
A  2  3  5  1  0 13  1  3 18  0  6  1  0  1  1  6  1  2  6  2
C  8  3  2 17 19  0 12 10  1  0  0  1  0  1 18  0 12  7  6  5
G  2 10 12  1  0  5  7  1  1 20 14 14 20 18  0 14  5  3  7  7
T  8  4  2  1  0  2  0  6  0  0  0  5  0  1  1  0  1  8  1  6
pwm <- toPWM(pfm)
pwm=PWMatrix(ID="Unknown", name=tf, matrixClass="Unknown", strand="+", bg=c(A=0.25, C=0.25, G=0.25, T=0.25), tags=list(), profileMatrix=pwm, pseudocounts=numeric())
peaks = searchSeq(pwm, seq, min.score = "80%",mc.cores=10L)
peaks_bed = as(peaks, "GRanges")
head(as.data.frame(peaks_bed$siteSeqs)$x)
[1] "AGCCCACTAGGGTGCAGTCC" "ATACCAGAAGAAGGCATCAG" "ACACCAGAAGAGGGCGTCAG"
[4] "ATGCCACGAGGTGGAGATAA" "GACTCACTAGAGGGCACAGG" "TCTACAGCAGGTGGCAACAC"
af=affinity(normalize.pwm(pwm@profileMatrix), as.data.frame(peaks_bed$siteSeqs)$x)
However, this results in many NaN values:
sum(af=='NaN')/length(af)
[1] 0.4785195
I was advised that tRap is meant for long sequences rather than short ones, so I extended the sequences:
start(peaks_bed) <- start(peaks_bed) - 30
end(peaks_bed) <- end(peaks_bed) + 30
extended_seqs <- getSeq(Mmusculus, peaks_bed)
af_ext=affinity(normalize.pwm(pwm@profileMatrix),as.data.frame(extended_seqs)$x)
However this results in 100% NaNs....
I'm wondering if I am doing something wrong?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/matthuska/tRap/issues/3, or mute the thread https://github.com/notifications/unsubscribe-auth/AIHrHuHLhvR5OL5vxXhUlIcxqhTEVLXQks5vBxfAgaJpZM4Z5OGv.
Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDirig'in Petra Steiner-Hoffmann Stellv.Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler Registergericht: Amtsgericht Muenchen HRB 6466 USt-IdNr: DE 129521671
Hi Matthias, Thank you for your quick reply: There are not many zeros in the normalised matrix:
normalize.pwm(pwm@profileMatrix)
1 2 3 4 5 6
A 1.0626965 0.9503723 0.04448988 0.4578354 0.3879632 -0.2923233
C -0.5626965 0.9503723 0.85949647 -0.3735062 -0.1638895 1.0223918
G 1.0626965 -1.3188119 -0.76348283 0.4578354 0.3879632 0.0000000
T -0.5626965 0.4180672 0.85949647 0.4578354 0.3879632 0.2699315
7 8 9 10 11 12
A 0.41349136 0.4404797 -0.2536981 0.3870730 -0.03296475 0.71519479
C -0.24047408 -0.6112445 0.2969491 0.3870730 0.61061995 0.71519479
G -0.09176563 1.3303426 0.2969491 -0.1612191 -0.18827516 -0.45258182
T 0.91874835 -0.1595778 0.6597998 0.3870730 0.61061995 0.02219224
13 14 15 16 17 18
A 0.3870730 0.4538871 0.2969491 -0.03296475 0.75263253 1.5229892
C 0.3870730 0.4538871 -0.2536981 0.61061995 -0.47909620 -0.5761614
G -0.1612191 -0.3616612 0.6597998 -0.18827516 -0.02616885 0.8595932
T 0.3870730 0.4538871 0.2969491 0.61061995 0.75263253 -0.8064209
19 20
A -0.2228909 2.3968502
C -0.2228909 0.0000000
G -0.4123795 -0.9067515
T 1.8581614 -0.4900988
when I try the following, I still get NaNs:
affinity(normalize.pwm(pwm@profileMatrix)+0.0000000000000001, as.data.frame(tmp)$x)
[1] NaN
Warning messages:
1: In log(maxCG/p[3]) : NaNs produced
2: In log(maxAT/p) : NaNs produced
The PWM you pass to the affinity function should be a probability matrix, with all values in [0, 1].
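To make the distinction concrete, here is a minimal base-R sketch (not tRap's own code). It turns a count matrix into column-wise probabilities after adding a pseudo count, and shows why log-odds scores with negative entries end up as NaN once a logarithm is applied. The helper name `counts_to_prob` and the pseudo count value are illustrative assumptions; the counts are the first six columns of the PFM from the original post.

```r
# Hypothetical helper (not part of tRap): convert a count matrix (PFM)
# into a column-wise probability matrix after adding a pseudo count.
counts_to_prob <- function(pfm, pseudocount = 0.25) {
  stopifnot(is.matrix(pfm), all(pfm >= 0))
  smoothed <- pfm + pseudocount
  # Divide every column by its own total so each position sums to 1.
  sweep(smoothed, 2, colSums(smoothed), "/")
}

# First six columns of the PFM from the original post.
pfm <- matrix(
  c(2,  3,  5,  1,  0, 13,
    8,  3,  2, 17, 19,  0,
    2, 10, 12,  1,  0,  5,
    8,  4,  2,  1,  0,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

prob <- counts_to_prob(pfm)
range(prob)     # all entries lie strictly between 0 and 1
colSums(prob)   # every column sums to 1

# Log-odds scores (such as those in the matrix printed earlier in this
# thread) can be negative, and the logarithm of a negative number is NaN:
suppressWarnings(log(-0.56))   # NaN
```

A matrix like `prob` satisfies the [0, 1] requirement, whereas the log-odds matrix shown earlier does not, which is consistent with the `NaNs produced` warnings from `log()`.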
Hi, sorry for the delay.
Hmm, very strange... I just got the PFM from the JASPAR 2018 database and then ran the TFBSTools::toPWM command, which results in a matrix like the one shown above. Can you recommend a package or command that performs the conversion (PFM > PWM) in the way your package needs?
Hi Chris,
I would recommend using normalize.pwm from the tRap package directly on your PFM (after adding a pseudo count of 0.25):
pwm.for.trap <- normalize.pwm(pfm + 0.25)
affinity(pwm.for.trap)
Best, Matthias
Hi again Matthias,
Thank you for this fantastic help. It works now. Just 2 last questions:
1. I would also like to apply this analysis to more than one transcription factor. Suppose I download a list of PFMs from JASPAR:
ARNT
[,1] [,2] [,3] [,4] [,5] [,6]
A 4 19 0 0 0 0
C 16 0 20 0 0 0
G 0 1 0 20 0 20
T 0 0 0 0 20 0
AHR
[,1] [,2] [,3] [,4] [,5] [,6]
A 3 0 0 0 0 0
C 8 0 23 0 0 0
G 2 23 0 23 0 24
T 11 1 1 1 24 0
Ddit3::Cebpa
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
A 14 11 18 0 0 4 38 36 0 14 4 0
C 7 7 3 1 0 33 1 2 6 17 23 26
G 12 14 15 0 38 0 0 1 0 5 9 6
T 6 7 3 38 1 2 0 0 33 3 3 7
...
Can I take it as a suitable strategy to add 0.25 to all of these PFMs and then implement your analysis on them?
2. As for the command local.paffinity, I tried it on the calculated affinity values:
seqs=as.data.frame(extended_seqs)$x
head(seqs)
A DNAStringSet instance of length 6
width seq
[1] 79 TACGTAAGTACACTGTAGCTGTCTTCAGACACAC...TCAGATCTCATTATGGGTAGTTGTGAGCTACCA
[2] 79 TTTTACTTTCTCTCTCCCTCTTATTGCTAGATGC...ATAAACAGCTTGCTTCTGCCATGTTCTGCAGAA
[3] 79 GACATCTGAGTACCTTCCCTGTAAGAGAGCTTGC...CTGAGCACTGAAACTCAGAGGAGAGAATCTGTC
head(af_ext)
[1] 10.586463 12.458601 10.153033 7.571788 9.838501 10.966423
af_ext=affinity(normalize.pwm(pwm.for.trap@profileMatrix),seqs)
for(i in c(1:length(af_ext))){
print(local.paffinity(af_ext[i],pwm.for.trap,seqs[i]))
}
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
........
The p-value is 0.01612903 in every case....
Can I take these values as correct or am I not applying this function correctly?
Hi Chris,
re1: I looked at the code again (I should have done this first!). Actually, you can also pass the unnormalized PWM (counts >= 0) to the affinity function. The pseudo count is an argument of the affinity function, so there is no need to call normalize.pwm or add pseudo counts yourself.
re2: the local.paffinity function takes arguments:
Best, Matthias
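Regarding re1 and the question about several transcription factors, here is a minimal sketch of looping over a named list of raw count matrices. It assumes, as stated in re1, that affinity() accepts raw counts, and, as in the calls earlier in this thread, that it takes the matrix and the sequences as its first two arguments. The scoring function is passed in as a parameter so the loop can be tried without tRap installed; `score_all`, `score_fun` and `toy_score` are hypothetical names, not tRap functions.

```r
# Hypothetical wrapper: apply one scoring function to every PFM in a named list.
# With tRap loaded, score_fun would be affinity(); here it is a parameter so
# that the sketch runs on its own.
score_all <- function(pfms, seqs, score_fun) {
  stopifnot(is.list(pfms), !is.null(names(pfms)))
  lapply(pfms, function(pfm) {
    # Raw counts must be non-negative; a log-odds PWM would fail this check.
    stopifnot(is.matrix(pfm), nrow(pfm) == 4, all(pfm >= 0))
    score_fun(pfm, seqs)
  })
}

# Two of the JASPAR count matrices quoted in this thread.
pfms <- list(
  ARNT = matrix(c( 4, 19,  0,  0,  0,  0,
                  16,  0, 20,  0,  0,  0,
                   0,  1,  0, 20,  0, 20,
                   0,  0,  0,  0, 20,  0),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "C", "G", "T"), NULL)),
  AHR  = matrix(c( 3,  0,  0,  0,  0,  0,
                   8,  0, 23,  0,  0,  0,
                   2, 23,  0, 23,  0, 24,
                  11,  1,  1,  1, 24,  0),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "C", "G", "T"), NULL))
)

seqs <- c("AGCCCACTAGGGTGCAGTCC", "ATACCAGAAGAAGGCATCAG")

# Toy stand-in for affinity(): returns the motif width once per sequence,
# only to show the shape of the result.
toy_score <- function(pfm, seqs) rep(ncol(pfm), length(seqs))

res <- score_all(pfms, seqs, toy_score)
str(res)   # a named list with one numeric vector per transcription factor

# With tRap installed (assumption): res <- score_all(pfms, seqs, affinity)
```

Keeping the non-negativity check inside the loop means a log-odds matrix from toPWM is rejected early instead of silently producing NaN affinities.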
Do you mean that I should use the unnormalised PFMs? Not the PWMs as they have values < 0 (as shown above)...