veg / hyphy-analyses

HyPhy standalone analyses
MIT License

Raw counts in MG94 output #18

Open ekopania opened 2 years ago

ekopania commented 2 years ago

Hello, I have a question regarding the output for the FitMG94 model with separate dN/dS calculations for each branch (--type local). Would it be possible to output the raw counts for # synonymous sites, # synonymous substitutions, # nonsynonymous sites, and # nonsynonymous substitutions for each alignment and branch?

I am trying to calculate an average dS for each branch across many gene alignments, and would like to do so by calculating (# synonymous substitutions across all alignments) / (# synonymous sites across all alignments) for each branch. Same for dN and dN/dS.
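The pooled calculation described here can be sketched in Python as follows. This is only an illustration, not part of HyPhy: the function name `pooled_rates` and the input layout (one dict per alignment, mapping a branch name to a tuple of synonymous sites, nonsynonymous sites, synonymous substitutions, nonsynonymous substitutions) are assumptions made for the example, and the counts in the usage lines are made up.

```python
from collections import defaultdict

def pooled_rates(per_alignment):
    """Pool per-branch counts over alignments; return branch -> (dS, dN, dN/dS).

    per_alignment: iterable of dicts, each mapping a branch name to
    (syn_sites, nonsyn_sites, syn_subs, nonsyn_subs). Hypothetical layout.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for counts in per_alignment:
        for branch, values in counts.items():
            for i, value in enumerate(values):
                totals[branch][i] += value

    rates = {}
    for branch, (es, en, s, n) in totals.items():
        # A rate is undefined (None) when its pooled site count is zero.
        ds = s / es if es > 0 else None
        dn = n / en if en > 0 else None
        # dN/dS is undefined when dS is missing or zero.
        omega = dn / ds if ds and dn is not None else None
        rates[branch] = (ds, dn, omega)
    return rates

# Made-up counts for two alignments that share one branch.
example = [
    {"branchA": (100.0, 300.0, 10.0, 15.0)},
    {"branchA": (100.0, 300.0, 30.0, 15.0)},
]
print(pooled_rates(example)["branchA"])
```

Pooling this way weights each alignment by its number of sites, and it only makes sense if the same branch carries the same name in every alignment.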

Thank you! Emily

stevenweaver commented 2 years ago

Dear Emily,

Before delving into this issue further, did you happen to come across this issue thread yet?

With release 2.5.27 and an update to you will now see dS and dN estimates in the output JSON

Best, Steven

spond commented 2 years ago

Dear Emily,

You can get these quantities from the output of SLAC. The resulting JSON will contain something like what follows, so you can pull out ES and S (columns 0 and 2) from the json["MLES"]["content"]["0"]["by-branch"]["AVERAGED"][xx] table, where xx is the index for the branch (use the NAMES entry to map indices to names).
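A minimal Python sketch of that lookup, assuming the key path given above and the column order of the quoted "headers" list. The function names are hypothetical, the placement of NAMES next to AVERAGED is inferred from the excerpt, and the top-level key is tried under both "MLES" (as written above) and "MLE", since the exact spelling in a given file is not confirmed here.

```python
import json

# Column positions taken from the "headers" list quoted in this thread.
ES, EN, S, N = 0, 1, 2, 3

def branch_counts(result, partition="0"):
    """Return branch name -> (ES, EN, S, N) from a parsed SLAC JSON dict."""
    # The thread spells the top-level key "MLES"; "MLE" is a fallback (assumption).
    section = result["MLES"] if "MLES" in result else result["MLE"]
    by_branch = section["content"][partition]["by-branch"]
    rows = by_branch["AVERAGED"]
    names = by_branch["NAMES"]  # assumed: one name per row, in the same order
    counts = {}
    for name, row in zip(names, rows):
        counts[name] = (row[ES], row[EN], row[S], row[N])
    return counts

def load_branch_counts(path):
    """Read a SLAC result file (path supplied by the caller) and extract the counts."""
    with open(path) as handle:
        return branch_counts(json.load(handle))
```

Calling `load_branch_counts` once per alignment yields one dict per gene, which can then be summed branch by branch.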

Best, Sergei

   "headers":    [
    ["ES", "Expected synonymous sites"],
    ["EN", "Expected non-synonymous sites"],
    ["S", "Inferred synonymous substitutions"],
    ["N", "Inferred non-synonymous substitutions"],
    ["P[S]", "Expected proportion of synonymous sites"],
    ["dS", "Inferred synonymous susbsitution rate"],
    ["dN", "Inferred non-synonymous susbsitution rate"],
    ["dN-dS", "Scaled by the length of the tested branches"],
    ["P [dN/dS > 1]", "Binomial probability that S is no greater than the observed value, with P<sub>s</sub> probability of success"],
    ["P [dN/dS < 1]", "Binomial probability that S is no less than the observed value, with P<sub>s</sub> probability of success"],
    ["Total branch length", "The total length of branches contributing to inference at this site, and used to scale dN-dS"] 
         "AVERAGED":          [
          [131.3168044428377, 387.3226301460821, 18.83333333333334, 62.16666666666666, 0.2531947933093847, 0.1434190651626122, 0.1605035746122553, 0.00992078191838351, 0.3899963120436746, 0.7047685726152259, 0.1902540471972952],
          [129.561236545261, 390.2822794214606, 20.16666666666666, 84.83333333333334, 0.2492312254858537, 0.1556535519759541, 0.2173640408657216, 0.03583458478317615, 0.1058743790554714, 0.9314784716730345, 0.2522218225267418],
          [133.5384417395946, 389.1083669125233, 9, 31, 0.2555041751502972, 0.06739632335646366, 0.07966932257452385, 0.007126792202360227, 0.4081337126529876, 0.7271272709575386, 0.1004391926327223],
          [136.2754101961045, 388.2927197040365, 20.66666666666666, 73.33333333333334, 0.2597859123123749, 0.15165367425368, 0.1888609536362909, 0.02160584742683141, 0.2259173468355006, 0.8400143616880725, 0.212636878745021],
          [133.7972276866482, 389.2600981649001, 23, 92, 0.2557984011959369, 0.1719019175334916, 0.2363458274652815, 0.03742185154836802, 0.1008314010665072, 0.9335295689162557, 0.2703513223995095],
          [140.2046156142308, 396.7823526713004, 7.5, 18.5, 0.2610950058282979, 0.0534932460471633, 0.04662505747912039, -0.00398827962598081, 0.7092369108736087, 0.4501563363877576, 0.0672975934705809],
          [134.751454631459, 392.5139932178286, 0, 3, 0.2555666319139805, 0, 0.007643039616004536, 0.004438226888951172, 0.4125508577792068, 1, 0.003838196760252208],
          [136.0881867826784, 397.163888281234, 1, 0, 0.2552042329443682, 0.007348176382105211, 0, -0.004267003135182485, 1, 0.2552042329443682, 0.001708013303307051],
          [136.5164955204506, 396.8393878577912, 6, 8, 0.2559576068717268, 0.04395073267245702, 0.02015928923584276, -0.01381542282814194, 0.9568582632702024, 0.1224548551448086, 0.02610813928399086],
          [0, 0, 0, 0, null, null, null, null, 1, 1, 0],
          [136.570310137287, 396.6299533706336, 0, 1, 0.2561332382673481, 0, 0.002521241755701549, 0.001464056647079198, 0.743866761732652, 1, 0.001848757843820416],
          [136.9113838166697, 396.4145310157983, 4.5, 4.5, 0.2567124154461485, 0.03286797543457522, 0.01135175339932395, -0.01249422742563463, 0.9670961780216861, 0.1164171049315259, 0.01814302496171654],
          [138.8302359271815, 397.7942542778725, 9, 39, 0.2587102125624791, 0.0648273766870254, 0.09804063176024963, 0.01928656256431083, 0.1684519685777126, 0.9055793519965954, 0.107969146861863],
          [143.7819090576958, 396.4599645760192, 42.50000000000001, 70.5, 0.2661435850771913, 0.2955865607748044, 0.1778237559885621, -0.06838353233524548, 0.9959931935958849, 0.007077414287111505, 0.2814556919165797],
          [145.8516833705621, 394.8039017195687, 10, 23, 0.2697681988178253, 0.06856280139457305, 0.05825676975284054, -0.005984596318843896, 0.7403138091754105, 0.3964379183315833, 0.06779898779960684],
          [145.2407394759065, 395.7881415173817, 21, 35, 0.2684528397250354, 0.1445875315409257, 0.08843114870954999, -0.03260937804725617, 0.9712055026164764, 0.05300488459460351, 0.1200222199260738] 
         "NAMES":          [

ekopania commented 2 years ago

Yes, I did. I am interested in getting the raw numbers that went into calculating dS and dN, so the total # of sites and # of substitutions (both synonymous and nonsynonymous). Thank you!
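As a quick check of how the raw numbers relate to the rates, the short Python sketch below recomputes dS and dN as substitutions divided by sites for two rows of the AVERAGED table quoted above. That the table's dS and dN columns are exactly S/ES and N/EN is an observation from those rows, not a documented guarantee, and the helper name `rates_from_counts` is made up for the example.

```python
import math

# (ES, EN, S, N, dS, dN) copied from the first and third AVERAGED rows quoted above.
rows = [
    (131.3168044428377, 387.3226301460821, 18.83333333333334, 62.16666666666666,
     0.1434190651626122, 0.1605035746122553),
    (133.5384417395946, 389.1083669125233, 9, 31,
     0.06739632335646366, 0.07966932257452385),
]

def rates_from_counts(es, en, s, n):
    """Return (dS, dN) computed as substitutions per site."""
    return s / es, n / en

for es, en, s, n, ds, dn in rows:
    ds_calc, dn_calc = rates_from_counts(es, en, s, n)
    # Compare the recomputed rates with the reported columns.
    print(math.isclose(ds_calc, ds, rel_tol=1e-6),
          math.isclose(dn_calc, dn, rel_tol=1e-6))
```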

ekopania commented 2 years ago

Dear Steven and Sergei, Thank you for the quick and helpful responses! I will try running SLAC. Emily