matsengrp / linearham

A Bayesian Phylo-HMM for B cell receptor sequence analysis
http://matsengrp.github.io/linearham

Branch lengths almost all much larger than hamming distance #90

Open psathyrella opened 2 years ago

psathyrella commented 2 years ago

There are a million ways to quantify this, but here are a few:

        warning tree depth and mfreq differ by more than 25% for 259/289 nodes
            mean values:  tree depth 0.064  mfreq 0.027  diff 0.037  abs(frac diff) 211%  ratio 3.1
  10 with highest abs(diff):
    tree depth   mfreq      ratio    diff
      0.2668    0.0871       3.1     0.1797     040921-P5-C7-HC-igh
      0.2073    0.0606       3.4     0.1467     040921-P5-F1-HC-igh
      0.2027    0.0568       3.6     0.1459     040921-P5-F2-HC-igh
      0.2018    0.0568       3.6     0.1449     bil
      0.2313    0.0871       2.7     0.1442     041321-P1-D7-HC-igh
      0.2006    0.0644       3.1     0.1362     040921-P5-H8-HC-igh
      0.1872    0.0606       3.1     0.1266     jqr
      0.2073    0.0833       2.5     0.1240     040921-P1-D8-HC-igh
      0.1844    0.0606       3.0     0.1238     cho
      0.2032    0.0833       2.4     0.1199     040921-P2-B3-HC-igh
  10 with highest+lowest ratios:
      0.1016    0.0076      13.4     0.0941     033021-P1-C9-HC-igh
      0.0910    0.0076      12.0     0.0834     cwy
      0.0871    0.0076      11.5     0.0795     den
      0.0849    0.0076      11.2     0.0773     bjk
      0.0835    0.0076      11.0     0.0759     gkw
      0.0776    0.0606       1.3     0.0170     lrv
      0.0436    0.0341       1.3     0.0095     041321-P1-C3-HC-igh
      0.0434    0.0341       1.3     0.0093     cny
      0.0675    0.0530       1.3     0.0144     041321-P1-F12-HC-igh
      0.0048    0.0038       1.3     0.0010     fmz

The output above compares the depth of each node to its mfreq; by default I also check each branch length, and those are similarly off.
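For anyone reading along, here is a rough sketch of the kind of per-node comparison the table reports. It is not the actual checking code: the function name, the 25% threshold argument, and the definition of fractional difference as `(tree depth - mfreq) / mfreq` are all assumptions made for illustration. The two example rows are copied from the output above.

```python
def compare_depth_to_mfreq(tree_depth, mfreq, max_frac_diff=0.25):
    """Compare a node's tree depth to its mutation frequency.

    Hypothetical helper: the fractional difference is assumed to be
    (tree_depth - mfreq) / mfreq, which may not match the real check.
    """
    diff = tree_depth - mfreq
    if mfreq > 0:
        ratio = tree_depth / mfreq
        frac_diff = diff / mfreq
    elif diff == 0:
        # Both values are zero, so treat the node as agreeing.
        ratio, frac_diff = 1.0, 0.0
    else:
        # Nonzero depth with zero observed mutations: unbounded ratio.
        ratio, frac_diff = float("inf"), float("inf")
    return {
        "diff": diff,
        "ratio": ratio,
        "frac_diff": frac_diff,
        "flagged": abs(frac_diff) > max_frac_diff,
    }


# (name, tree depth, mfreq) taken from two rows of the output above
rows = [
    ("040921-P5-C7-HC-igh", 0.2668, 0.0871),
    ("fmz", 0.0048, 0.0038),
]
for name, depth, mfreq in rows:
    res = compare_depth_to_mfreq(depth, mfreq)
    print("%8.4f  %8.4f  %6.1f  %8.4f  %s  %s" % (
        depth, mfreq, res["ratio"], res["diff"], res["flagged"], name))
```

Under these assumptions, even the best-agreeing rows in the table (ratio 1.3) still exceed a 25% threshold, which fits with 259/289 nodes being flagged.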

The data is not public, but it is here: /fh/fast/matsen_e/processed-data/partis/parul-vrc01gh/no-dups/d1/linearham/work/igh/iclust-2/cluster-2/