soedinglab / ffindex_soedinglab


Zero secondary structure scores when searching both PDB and Pfam #3

Closed. jamespjh closed this 7 years ago

jamespjh commented 7 years ago

Hi,

When searching both the PDB and Pfam databases in one run:

hhsearch -d /home/ucgajhe/levine/databases/pdb70/pdb70 -ssm 4 -cpu 12 -o /home/ucgajhe/Scratch/Levine/results/test_YPR199C/YPR199C.0.ssw11.hhr -i /home/ucgajhe/Scratch/Levine/results/test_YPR199C/YPR199C.0.ss.a3m -v 2 -p 0 -cov 50 -ssw 0.11 -Z 5000 -d /home/ucgajhe/levine/databases/pfamA_30/pfam

we observe zero secondary structure (SS) scores for all Pfam matches and for most PDB matches:

No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 1gd2_E Transcription factor PA  98.7 3.4E-10 6.3E-15   83.8   8.4   68   11-81      1-68  (70)
  2 1sse_B AP-1 like transcription  98.7 2.3E-11 4.2E-16   97.0   0.0   60  235-294    26-85  (86)
  3 1gu4_A CAAT/enhancer binding p  98.1 2.5E-07 4.5E-12   69.3   9.7   61   14-77     11-71  (78)
  4 PF08601.7 ; PAP1 ; Transcripti  97.9   3E-08 5.5E-13   97.1   0.0   56  236-291   298-355 (356)
  5 1hjb_A Ccaat/enhancer binding   97.7   1E-07 1.9E-12   73.4   0.0   61   15-78     12-72  (87)
  6 1ci6_A Transcription factor AT  97.5 2.9E-07 5.3E-12   65.6   0.0   57   18-77      2-58  (63)
  7 PF00170.18 ; bZIP_1 ; bZIP tra  97.5 3.5E-07 6.5E-12   64.3   0.0   58   18-78      5-62  (64)
  8 2dgc_A Protein (GCN4); basic d  97.5 4.4E-07   8E-12   64.4   0.0   57   15-74      6-62  (63)
  9 2wt7_A Proto-oncogene protein   97.4 6.9E-07 1.3E-11   63.0   0.0   57   18-77      2-58  (63)
 10 1jnm_A Proto-oncogene C-JUN; B  97.3 1.2E-06 2.2E-11   61.0   0.0   57   19-78      2-58  (62)
 11 1dh3_A Transcription factor CR  97.2 1.7E-06 3.1E-11   59.9   0.0   52   19-73      2-53  (55)
 12 PF07716.12 ; bZIP_2 ; Basic re  97.1 2.7E-06 4.9E-11   58.3   0.0   51   17-70      4-54  (55)
 13 1t2k_D Cyclic-AMP-dependent tr  97.1 2.8E-06 5.2E-11   59.0   0.0   55   19-76      2-56  (61)
 14 3a5t_A Transcription factor MA  97.1   4E-06 7.2E-11   68.2   0.0   62   18-82     37-98  (107)
 15 2wt7_B Transcription factor MA  97.0 6.5E-06 1.2E-10   64.3   0.0   60   18-80     27-86  (90)
 16 PF03131.14 ; bZIP_Maf ; bZIP M  96.9 9.6E-06 1.8E-10   62.1   0.0   58   18-78     30-87  (90)
 17 5apu_A General control protein  96.5 0.00019 3.5E-09   59.1   3.2   48   19-73     46-93  (95)
 18 2oxj_A Hybrid alpha/beta pepti  94.6   0.029 5.3E-07   38.0   4.4   32   39-73      1-32  (34)
 19 2r2v_A GCN4 leucine zipper; co  94.4   0.036 6.6E-07   38.0   4.2   32   39-73      1-32  (34)
 20 PF16689.2 ; APC_N_CC ; Coiled-  94.2  0.0059 1.1E-07   44.5   0.0   43   39-84      1-43  (52)
 21 4c46_A General control protein  94.0  0.0079 1.4E-07   47.9   0.0   51   18-72     26-76  (76)
 22 1kd8_B GABH BLL, GCN4 acid bas  93.7   0.011   2E-07   41.0   0.0   35   39-76      1-35  (36)
 23 3w92_A Thioester coiled coil p  93.3   0.015 2.8E-07   39.4   0.0   31   40-73      1-31  (32)
 24 1deb_A APC protein, adenomatou  93.3   0.015 2.8E-07   43.2   0.0   43   38-83      2-44  (54)
 25 2wq1_A General control protein  92.6   0.025 4.5E-07   38.6   0.0   31   40-73      1-31  (33)
 26 3c3g_A Alpha/beta peptide with  92.3    0.03 5.4E-07   38.2   0.0   31   40-73      1-31  (33)

but when searching the PDB alone, all matches get nonzero SS scores.

I note that the downloaded PDB database includes SS data, but the Pfam database does not. That would explain zero SS scores for the Pfam hits, but not zero SS scores for the PDB hits when Pfam is also searched. For example:


No 10
>2dgc_A Protein (GCN4); basic domain, leucine zipper, DNA binding, eukaryotic regulatory protein, transcription/DNA complex; HET: DNA; 2.20A {Saccharomyces cerevisiae} SCOP: h.1.3.1 PDB: 1dgc_A* 1ld4_E 1ysa_C* 3p8m_D
Probab=97.47  E-value=4.4e-07  Score=64.37  Aligned_cols=57  Identities=26%  Similarity=0.323  Sum_probs=45.1  Template_Neff=9.500

Q ss_pred             CCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Q YPR199C          15 LTPPKNKRAAQLRASQNAFRKRKLERLEELEKKEAQLTVTNDQIHILKKENELLHFMLRS   74 (294)
Q Consensus        15 ~~~~k~KRKaQNRaAQkAFRERKE~rlkeLE~kl~ele~~~~~~~~L~~EnE~Lr~~n~e   74 (294)
                      ....+.+|+.+||.||+.+|+||..++.+||.++..|+..+   ..|..+++.|+..+..
T Consensus         6 ~~~~~~~kr~rnr~~~~~~R~rk~~~~~~le~~v~~l~~~~---~~l~~~~~~l~~~~~~   62 (63)
T 2dgc_A            6 SSDPAALKRARNTEAARRSRARKLQRMKQLEDKVEELLSKN---YHLENEVARLKKLVGE   62 (63)
T ss_dssp             -----CHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHC---
T ss_pred             cccHHHHHHHHhHHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHh
Confidence            34566778899999999999999999999999999999888   7888888888876654

No 11
>PF00170.19 ; bZIP_1 ; bZIP transcription factor
Probab=97.45  E-value=4.9e-07  Score=63.79  Aligned_cols=58  Identities=28%  Similarity=0.290  Sum_probs=51.1  Template_Neff=9.800

Q ss_pred             HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Q YPR199C          18 PKNKRAAQLRASQNAFRKRKLERLEELEKKEAQLTVTNDQIHILKKENELLHFMLRSLLTE   78 (294)
Q Consensus        18 ~k~KRKaQNRaAQkAFRERKE~rlkeLE~kl~ele~~~~~~~~L~~EnE~Lr~~n~el~~e   78 (294)
                      ++.+|+.+||.||+.||+||..++.+||.++..|+...   ..|..+++.|+..+..|..+
T Consensus         5 k~~rr~~~nr~~~~~~R~rk~~~~~~Le~~~~~L~~~~---~~l~~~~~~l~~e~~~L~~~   62 (64)
T GBF1_ARATH/220    5 KRQKRKQSNRESARRSRLRKQAECEQLQQRVESLSNEN---QSLRDELQRLSSECDKLKSE   62 (64)
Confidence            56788999999999999999999999999999999888   77888888888887776554

The web-search tool does find matches with nonzero secondary structure scores, but the downloadable hh-pfam database contains no SS information at all.
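One way to confirm which database entries carry SS data is to look for the SS records that HH-suite keeps in A3M alignments. The sketch below assumes the `>ss_pred` (predicted) and `>ss_dssp` (DSSP-assigned) header lines that correspond to the `ss_pred`/`ss_dssp` rows in the alignments above; `SSRecords` and `scanA3mForSS` are hypothetical names, not part of HH-suite or ffindex. It expects a single A3M entry as text (ffindex data files separate entries with NUL bytes, so extract one entry first).

```cpp
#include <istream>
#include <sstream>
#include <string>

// Which secondary-structure records were found in one A3M entry.
struct SSRecords {
    bool has_pred = false;  // saw a ">ss_pred" header (predicted SS)
    bool has_dssp = false;  // saw a ">ss_dssp" header (SS assigned by DSSP)
};

// Hypothetical helper: scan one A3M entry line by line and record
// which SS header lines it contains. Other lines are ignored.
SSRecords scanA3mForSS(std::istream& in) {
    SSRecords found;
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind(">ss_pred", 0) == 0) {
            found.has_pred = true;   // line starts with ">ss_pred"
        } else if (line.rfind(">ss_dssp", 0) == 0) {
            found.has_dssp = true;   // line starts with ">ss_dssp"
        }
    }
    return found;
}

// Convenience overload for an entry already held in memory.
SSRecords scanA3mForSS(const std::string& entry) {
    std::istringstream in(entry);
    return scanA3mForSS(in);
}
```

Judging by the alignments above, an entry like 2dgc_A should report both records, while PF00170.19 would report neither.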

Most confusing of all, though, is why PDB matches get a zero SS score when Pfam is also searched.

I suspect the cause is this code in https://github.com/soedinglab/hh-suite/blob/master/src/hhviterbirunner.cpp:

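    // SS scoring mode for the query against the first template in t_hmm_simd
    int ss_hmm_mode = HMM::computeScoreSSMode(q_simd->GetHMM(0), t_hmm_simd->GetHMM(0));
    // Keep the smallest mode over templates 1 .. maxres-1
    for(size_t i = 1; i < maxres; i++){
        ss_hmm_mode = std::min(ss_hmm_mode,
                               HMM::computeScoreSSMode(q_simd->GetHMM(0), t_hmm_simd->GetHMM(i)));
    }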

and this function in https://github.com/soedinglab/hh-suite/blob/master/src/hhhmm.cpp:

int HMM::computeScoreSSMode( HMM *q,  HMM *t){
    // nss_pred / nss_dssp index the predicted / DSSP SS sequence; negative means absent
    int returnMode = HMM::NO_SS_INFORMATION;
    if      (q->nss_pred>=0 && t->nss_dssp>=0) returnMode=HMM::PRED_DSSP;
    else if (q->nss_dssp>=0 && t->nss_pred>=0) returnMode=HMM::DSSP_PRED;
    else if (q->nss_pred>=0 && t->nss_pred>=0) returnMode=HMM::PRED_PRED;
    return returnMode;
}

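Together, these choose one SS mode for a whole batch of templates by taking the minimum over all of them. If NO_SS_INFORMATION is the lowest mode, a single Pfam template without SS data in a batch would switch off SS scoring for every PDB template scored alongside it, giving zero SS scores for PDB hits whenever Pfam is present. It would also explain why a few PDB hits (e.g. 1gd2_E, 1gu4_A) still get nonzero SS scores: presumably they landed in batches containing only PDB templates.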
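To check the reasoning outside HH-suite, here is a small self-contained C++ sketch of the suspected effect. `MiniHMM`, `batchSSMode` and `perTemplateSSModes` are hypothetical stand-ins, and the numeric mode values are assumed; the argument only needs NO_SS_INFORMATION to be the smallest. The query carries predicted SS (as the `.ss.a3m` input does), a PDB template carries both DSSP and predicted SS, and a Pfam template carries neither.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed mode values: only the ordering matters here, with
// NO_SS_INFORMATION the smallest (check hhhmm.h for the real constants).
enum SSMode { NO_SS_INFORMATION = 0, PRED_PRED = 1, PRED_DSSP = 2, DSSP_PRED = 3 };

// Hypothetical stand-in for HMM: only the two SS indices matter.
struct MiniHMM {
    int nss_pred = -1;  // index of the predicted-SS sequence, -1 if absent
    int nss_dssp = -1;  // index of the DSSP sequence, -1 if absent
};

// Same branch logic as HMM::computeScoreSSMode quoted above.
int computeScoreSSMode(const MiniHMM& q, const MiniHMM& t) {
    int mode = NO_SS_INFORMATION;
    if      (q.nss_pred >= 0 && t.nss_dssp >= 0) mode = PRED_DSSP;
    else if (q.nss_dssp >= 0 && t.nss_pred >= 0) mode = DSSP_PRED;
    else if (q.nss_pred >= 0 && t.nss_pred >= 0) mode = PRED_PRED;
    return mode;
}

// Same reduction as the hhviterbirunner.cpp loop: one mode for the whole
// batch, taken as the minimum over all templates (batch must be non-empty).
int batchSSMode(const MiniHMM& q, const std::vector<MiniHMM>& batch) {
    int mode = computeScoreSSMode(q, batch.at(0));
    for (std::size_t i = 1; i < batch.size(); i++) {
        mode = std::min(mode, computeScoreSSMode(q, batch[i]));
    }
    return mode;
}

// Hypothetical alternative: keep a separate mode for each template.
std::vector<int> perTemplateSSModes(const MiniHMM& q, const std::vector<MiniHMM>& batch) {
    std::vector<int> modes;
    modes.reserve(batch.size());
    for (const MiniHMM& t : batch) {
        modes.push_back(computeScoreSSMode(q, t));
    }
    return modes;
}
```

With a batch of two PDB templates, `batchSSMode` returns PRED_DSSP; putting one Pfam template in the batch drops it to NO_SS_INFORMATION, while `perTemplateSSModes` still gives PRED_DSSP for the PDB template. Computing the mode per template is only one possible direction for a fix, not necessarily how hh-suite resolves it.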

Any thoughts?

jamespjh commented 7 years ago

Logged against the wrong repo. Moving to hh-suite proper.