Closed WRran-Hydrogen closed 5 months ago
Dear @WRran-Hydrogen,
Last month I updated aBSREL
to version 2.5, which includes a lot of detailed information (see http://vision.hyphy.org/absrel), including at site-level, e.g.
Regarding JSON, I generally recommend using jq
, which is a superb utility for extracting and processing these types of files. Can you be more specific about the desired export content/format?
Best, Sergei
Dear @spond ,
OK, I just want to get this tree file (newick file).
After that I can use tq
or JSON's
python module to extract (thanks to your recommendation of tq
) the corresponding information for the tree annotation. One question is that I found the tree file in the json
file, but it doesn't have the information about the branch lengths, I'm curious how you mapped the tree file to the Nucleotide GTR.
Another small question is that 703 alignments containing 1309 codonsite took a month to run hyphy
on 32 core compute node, this is too high time consuming, do you have any good idea?
Dear @WRran-Hydrogen,
The JSON
files for aBSREL
store branch length information in a separate object, since there's more than one model which can be used to determine branch lengths.
Specifically, you will find them in the "branch attributes"
object with keys corresponding to the names in the tree topology.
$jq '.["branch attributes"]' tests/data/bglobin.nex.ABSREL.json
{
"0": {
"BUSHBABY": {
"Baseline MG94xREV": 0.05190629582419842,
"Baseline MG94xREV omega ratio": 0.4276372944939755,
"Corrected P-value": 1,
"Full adaptive model": 0.06425072518686313,
"Full adaptive model (non-synonymous subs/site)": 0.03732139152551418,
"Full adaptive model (synonymous subs/site)": 0.02692933366134916,
"LRT": 0.5206747157799327,
"Nucleotide GTR": 0.04517814106887804,
"Rate Distributions": [
[
0.1757084634912426,
0.9788521349090655
],
[
11.52862779940965,
0.02114786509093447
]
],
"Rate classes": 2,
"Uncorrected P-value": 0.3253482546661789,
"original name": "BUSHBABY",
"posterior": [
[
0.9924805556484324,
0.9958656268001885,
0.7578008011181845,
0.8240155208190809,
or
jq -r '.["branch attributes"]["0"] | to_entries| map ([.key,.value["Nucleotide GTR"]]) [] | @csv ' tests/data/bglobin.nex.ABSREL.json
"BUSHBABY",0.04517814106887804
"CHICKEN",0.02523869181803026
"COW",0.02091027373492859
"DUCK",0.0438131515651915
"ELEPHSEAL",0.06849401632550524
"HAMSTER",0.09971426134210891
"HARE",0.005990205406934171
"HUMAN",0.0552201187847943
"MARSUPIAL",0.1948510592760009
"MOUSE",0.05851217515600016
"Node1",0.03043291644580744
"Node10",0.01072910735191889
"Node12",0.05315275138034392
"Node15",0.01631764913672979
"Node18",0.07053120745486645
"Node2",0.01641731439369391
"Node20",0.03005650409217756
"Node23",0.07795825025286905
"Node25",0.03555698487352726
"Node26",0.1442280763306011
"Node29",0.2891510149729472
"Node4",0.01293281296355458
"Node6",0.04462090787982554
"Node9",0.01980447658515092
"PIG",0.09712620791989478
"RABBIT",0.02253173750743446
"RAT",0.04117743614339072
"SHEEP",0.03036672099624838
"TARSIER",0.06592118593336037
"XENLAEV",0.1227158754464014
"XENTROP",0.1925856260944267
Best, Sergei
Dear @WRran-Hydrogen,
Regarding long run times. Are you saying that a single gene took a month to run? That's unusual. Can you please share the input data and the command you've used to run aBSREL
?
Best, Sergei
Dear @spond,
Command: hyphy absrel --alignment Domain.msa --tree maxcc.contree --output out.json ENV="TOLERATE_NUMERICAL_ERRORS=1;"
Tree file:
maxcc_contree.txt
Alignment file:
Domain.txt
Output file:
out.json
Version:
HYPHY 2.5.57(MP) for Linux on x86_64
Appreciate your help!
Dear @WRran-Hydrogen,
Sorry for the delay in responding. The run time (on my M1 max MacBook) is ~1 day (that's 4-6 cores).
However, there seem to be some issues with the alignment/analysis, which is probably due to the level of sequence divergence.
The nucleotide model returns branch lengths of >400 subs/site on the entire tree. That's effectively saturated/non-homologous, so that downstream analyses will either struggle to converge or won't make much sense.
* 1 partition. Total tree length by partition (subs/site) 422.232
During the fitting of individual branches, the longest ones are done first and many of those are effectively infitine as well.
Branch | Length | Rates | Max. dN/dS | Log(L) | AIC-c |Best AIC-c so far|
|-----------------------------------|----------|----------|--------------------|---------------|---------------|-----------------|
|IMGVR_UViG_3300024548_000656_330...|1463.01...| 2 | 0.11 (86.69%) | -620953.97 | 1247191.07 | 1247191.07 |
|IMGVR_UViG_3300024548_000656_330...|1463.01...| 3 | 0.67 (77.93%) | -620953.33 | 1247193.80 | 1247191.07 |
|IMGVR_UViG_3300005281_000426_330...|1462.71...| 2 | 0.08 (78.74%) | -620946.70 | 1247180.54 | 1247180.54 |
|IMGVR_UViG_3300005281_000426_330...|1462.71...| 3 | 0.08 (78.78%) | -620946.70 | 1247184.57 | 1247180.54 |
|IMGVR_UViG_3300002460_002005_330...|1462.69...| 2 | 0.23 (72.88%) | -620940.85 | 1247172.87 | 1247172.87 |
|IMGVR_UViG_3300002460_002005_330...|1462.69...| 3 | 0.25 (72.88%) | -620940.85 | 1247176.90 | 1247172.87 |
|IMGVR_UViG_3300003401_000205_330...|1462.68...| 2 | 0.06 (84.64%) | -620935.49 | 1247166.19 | 1247166.19 |
|IMGVR_UViG_3300003401_000205_330...|1462.68...| 3 | 0.06 (84.64%) | -620935.49 | 1247170.21 | 1247166.19 |
| Node659 |1462.66...| 2 | 0.06 (83.35%) | -620929.65 | 1247158.52 | 1247158.52 |
| Node659 |1462.66...| 3 | 0.26 (62.06%) | -620928.94 | 1247161.13 | 1247158.52 |
|IMGVR_UViG_3300006025_000592_330...|1462.61...| 2 | 0.07 (78.48%) | -620919.15 | 1247141.55 | 1247141.55 |
|IMGVR_UViG_3300006025_000592_330...|1462.61...| 3 | 0.08 (72.01%) | -620919.06 | 1247145.38 | 1247141.55 |
|IMGVR_UViG_3300018950_000246_330...|1462.60...| 2 | 0.06 (79.02%) | -620914.64 | 1247136.54 | 1247136.54 |
|IMGVR_UViG_3300018950_000246_330...|1462.60...| 3 | 0.18 (45.69%) | -620914.43 | 1247140.15 | 1247136.54 |
|IMGVR_UViG_GVMAG_S_3300000558_93...|1462.59...| 2 | 0.06 (80.40%) | -620909.06 | 1247129.42 | 1247129.42 |
|IMGVR_UViG_GVMAG_S_3300000558_93...|1462.59...| 3 | 0.06 (80.37%) | -620909.06 | 1247133.44 | 1247129.42 |
|IMGVR_UViG_3300017288_000012_330...|1462.58...| 2 | 0.05 (81.10%) | -620901.03 | 1247117.37 | 1247117.37 |
|IMGVR_UViG_3300017288_000012_330...|1462.58...| 3 | 0.07 (73.06%) | -620900.93 | 1247121.19 | 1247117.37 |
|IMGVR_UViG_3300009173_001818_330...|1462.56...| 2 | 1.00 (84.66%) | -620895.92 | 1247111.18 | 1247111.18 |
|IMGVR_UViG_3300009173_001818_330...|1462.56...| 3 | 28.04 (73.80%) | -620893.15 | 1247109.65 | 1247109.65 |
|IMGVR_UViG_3300009173_001818_330...|1462.56...| 4 | 21.78 (73.91%) | -620893.14 | 1247113.66 | 1247109.65 |
|IMGVR_UViG_GVMAG_M_3300017724_5_...|1462.54...| 2 | 0.09 (73.10%) | -620871.10 | 1247069.59 | 1247069.59 |
I would suggest that you take another look at the sequences, especially those branches and clades which have infinite branch lengths (see a part of the tree where such branches are apparent below)
Best, Sergei
Dear @spond,
Sorry for the delay in my reply. Thanks for your patient assistance, I recheck my data and realized that the quality of the alignment is too poor due to the high diversity of the sequence (3+ subs./site). There does not seem to be a good way to deal with this problem at this moment, maybe I need to cluster (hclust
or mmseqs2
) the data before considering selection pressure or foucus only on the upstream and downstream sequences of conserved motifs. Mang thanks again for your help and keep in touch🫡!
Cheers, Yao
Dear @spond ,
Thanks for your previous help, I got the results for the aBSREL model. but strangely it does not have the select pressure results for the site, probably because of the larger dataset. After that, I want to visualize the results with ggtree in R. I would like to ask hyphy if they have a function to convert NEXUS files from JSON? (For convenience I have attached my results file.🫡)
Best, Yao out.json