veg / hyphy-analyses

HyPhy standalone analyses
MIT License

Raw counts in MG94 output #18

Open ekopania opened 2 years ago

ekopania commented 2 years ago

Hello, I have a question regarding the output for the FitMG94 model with separate dN/dS calculations for each branch (--type local). Would it be possible to output the raw counts for # synonymous sites, # synonymous substitutions, # nonsynonymous sites, and # nonsynonymous substitutions for each alignment and branch?

I am trying to calculate an average dS for each branch across many gene alignments, and would like to do so by calculating (# synonymous substitutions across all alignments) / (# synonymous sites across all alignments) for each branch. Same for dN and dN/dS.

Thank you! Emily

stevenweaver commented 2 years ago

Dear Emily,

Before delving into this issue further, did you happen to come across this issue thread yet?

https://github.com/veg/hyphy/issues/1273#issuecomment-767181739

With release 2.5.27 and an update to FitMG94.bf, you will now see dS and dN estimates in the output JSON.

Best, Steven

spond commented 2 years ago

Dear Emily,

You can get these quantities from the output of SLAC. The resulting JSON will contain something like the excerpt below, so you can pull out ES and S (columns 0 and 2; EN and N are columns 1 and 3) from the json["MLE"]["content"]["0"]["by-branch"]["AVERAGED"][xx] table, where xx is the index of the branch (use the NAMES entry to map indices to names).

Best, Sergei

"MLE":{
   "headers":    [
    ["ES", "Expected synonymous sites"],
    ["EN", "Expected non-synonymous sites"],
    ["S", "Inferred synonymous substitutions"],
    ["N", "Inferred non-synonymous substitutions"],
    ["P[S]", "Expected proportion of synonymous sites"],
    ["dS", "Inferred synonymous susbsitution rate"],
    ["dN", "Inferred non-synonymous susbsitution rate"],
    ["dN-dS", "Scaled by the length of the tested branches"],
    ["P [dN/dS > 1]", "Binomial probability that S is no greater than the observed value, with P<sub>s</sub> probability of success"],
    ["P [dN/dS < 1]", "Binomial probability that S is no less than the observed value, with P<sub>s</sub> probability of success"],
    ["Total branch length", "The total length of branches contributing to inference at this site, and used to scale dN-dS"] 
    ],
   "content":{
     "0":{
       "by-branch":{
         "AVERAGED":          [
          [131.3168044428377, 387.3226301460821, 18.83333333333334, 62.16666666666666, 0.2531947933093847, 0.1434190651626122, 0.1605035746122553, 0.00992078191838351, 0.3899963120436746, 0.7047685726152259, 0.1902540471972952],
          [129.561236545261, 390.2822794214606, 20.16666666666666, 84.83333333333334, 0.2492312254858537, 0.1556535519759541, 0.2173640408657216, 0.03583458478317615, 0.1058743790554714, 0.9314784716730345, 0.2522218225267418],
          [133.5384417395946, 389.1083669125233, 9, 31, 0.2555041751502972, 0.06739632335646366, 0.07966932257452385, 0.007126792202360227, 0.4081337126529876, 0.7271272709575386, 0.1004391926327223],
          [136.2754101961045, 388.2927197040365, 20.66666666666666, 73.33333333333334, 0.2597859123123749, 0.15165367425368, 0.1888609536362909, 0.02160584742683141, 0.2259173468355006, 0.8400143616880725, 0.212636878745021],
          [133.7972276866482, 389.2600981649001, 23, 92, 0.2557984011959369, 0.1719019175334916, 0.2363458274652815, 0.03742185154836802, 0.1008314010665072, 0.9335295689162557, 0.2703513223995095],
          [140.2046156142308, 396.7823526713004, 7.5, 18.5, 0.2610950058282979, 0.0534932460471633, 0.04662505747912039, -0.00398827962598081, 0.7092369108736087, 0.4501563363877576, 0.0672975934705809],
          [134.751454631459, 392.5139932178286, 0, 3, 0.2555666319139805, 0, 0.007643039616004536, 0.004438226888951172, 0.4125508577792068, 1, 0.003838196760252208],
          [136.0881867826784, 397.163888281234, 1, 0, 0.2552042329443682, 0.007348176382105211, 0, -0.004267003135182485, 1, 0.2552042329443682, 0.001708013303307051],
          [136.5164955204506, 396.8393878577912, 6, 8, 0.2559576068717268, 0.04395073267245702, 0.02015928923584276, -0.01381542282814194, 0.9568582632702024, 0.1224548551448086, 0.02610813928399086],
          [0, 0, 0, 0, null, null, null, null, 1, 1, 0],
          [136.570310137287, 396.6299533706336, 0, 1, 0.2561332382673481, 0, 0.002521241755701549, 0.001464056647079198, 0.743866761732652, 1, 0.001848757843820416],
          [136.9113838166697, 396.4145310157983, 4.5, 4.5, 0.2567124154461485, 0.03286797543457522, 0.01135175339932395, -0.01249422742563463, 0.9670961780216861, 0.1164171049315259, 0.01814302496171654],
          [138.8302359271815, 397.7942542778725, 9, 39, 0.2587102125624791, 0.0648273766870254, 0.09804063176024963, 0.01928656256431083, 0.1684519685777126, 0.9055793519965954, 0.107969146861863],
          [143.7819090576958, 396.4599645760192, 42.50000000000001, 70.5, 0.2661435850771913, 0.2955865607748044, 0.1778237559885621, -0.06838353233524548, 0.9959931935958849, 0.007077414287111505, 0.2814556919165797],
          [145.8516833705621, 394.8039017195687, 10, 23, 0.2697681988178253, 0.06856280139457305, 0.05825676975284054, -0.005984596318843896, 0.7403138091754105, 0.3964379183315833, 0.06779898779960684],
          [145.2407394759065, 395.7881415173817, 21, 35, 0.2684528397250354, 0.1445875315409257, 0.08843114870954999, -0.03260937804725617, 0.9712055026164764, 0.05300488459460351, 0.1200222199260738] 
          ],
         "NAMES":          [
          ["PIG"],
          ["COW"],
          ["Node3"],
          ["HORSE"],
          ["CAT"],
          ["Node2"],
          ["RHMONKEY"],
          ["BABOON"],
          ["Node9"],
          ["HUMAN"],
          ["CHIMP"],
          ["Node12"],
          ["Node8"],
          ["Node1"],
          ["RAT"],
          ["MOUSE"] 
          ],
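
The lookup described above, together with the pooling Emily asks about (summed substitutions divided by summed sites, per branch), could be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `load_results`, `branch_counts` and `pooled_rates` are hypothetical and not part of HyPhy, the nesting and column order are taken from the excerpt above, and a single partition keyed "0" is assumed. Note that internal node labels such as Node3 are only comparable across alignments if every gene was fitted on the same labelled tree.

```python
import json
from collections import defaultdict

# Column positions in each "AVERAGED" row, following the "headers" list above.
ES, EN, S, N = 0, 1, 2, 3


def load_results(paths):
    """Parse a list of SLAC JSON files (hypothetical helper)."""
    results = []
    for path in paths:
        with open(path) as handle:
            results.append(json.load(handle))
    return results


def branch_counts(slac, partition="0"):
    """Map branch name -> (ES, EN, S, N) for one parsed SLAC result."""
    by_branch = slac["MLE"]["content"][partition]["by-branch"]
    names = [entry[0] for entry in by_branch["NAMES"]]  # each name is a 1-item list
    rows = by_branch["AVERAGED"]
    return {
        name: (row[ES], row[EN], row[S], row[N])
        for name, row in zip(names, rows)
    }


def pooled_rates(results):
    """Sum the four counts per branch over all results, then take ratios of sums."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for slac in results:
        for name, counts in branch_counts(slac).items():
            for i, value in enumerate(counts):
                totals[name][i] += value

    pooled = {}
    for name, (es, en, s, n) in totals.items():
        pooled[name] = {
            # Branches with no expected sites (e.g. an all-zero row) give None.
            "S/ES": s / es if es > 0 else None,
            "N/EN": n / en if en > 0 else None,
        }
    return pooled
```

With one JSON file per alignment, `pooled_rates(load_results(paths))` would return, for each branch name, the pooled synonymous and non-synonymous ratios. For a single alignment these ratios reproduce the dS and dN columns of the table (for PIG, 18.83 / 131.32 ≈ 0.1434).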

ekopania commented 2 years ago

Yes, I did. I am interested in getting the raw numbers that went into calculating dS and dN, so the total # of sites and # of substitutions (both synonymous and nonsynonymous). Thank you!

ekopania commented 2 years ago

Dear Steven and Sergei, Thank you for the quick and helpful responses! I will try running SLAC. Emily