Closed shenglin-liu closed 1 year ago
Dear @shenglin-liu,
Unfortunately, there is no reliable way to diagnose, from a single run, whether or not a complex mixture model like RELAX converged. From the example you show above, 8/10 runs basically gave you the same answer (relaxed selection with a p-value on the order of 0.0001), so that's pretty good.
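Since a single run cannot tell you whether it converged, one practical option is to run each gene a few times and flag genes whose replicates disagree. Below is a minimal sketch of that comparison. The JSON key names, the significance threshold `alpha`, and the tolerance `max_log_k_spread` are assumptions for illustration, not part of RELAX; check the keys against your own output files and choose thresholds that suit your data.

```python
import json
import math

# Assumed key names in the RELAX JSON output; verify them against your files.
TEST_KEY = "test results"
P_KEY = "p-value"
K_KEY = "relaxation or intensification parameter"


def extract(result):
    """Return (p_value, k) from one parsed RELAX JSON document."""
    test = result[TEST_KEY]
    return float(test[P_KEY]), float(test[K_KEY])


def load_result(path):
    """Read one RELAX JSON file and return (p_value, k)."""
    with open(path) as handle:
        return extract(json.load(handle))


def is_consistent(replicates, alpha=0.05, max_log_k_spread=0.5):
    """True if all replicate runs agree on significance and on k.

    replicates is a list of (p_value, k) pairs from repeated runs of one gene.
    Both thresholds are arbitrary illustrative defaults.
    """
    calls = {p < alpha for p, _ in replicates}
    # Compare k on a log scale; guard against k reported as exactly 0.
    log_k = [math.log(max(k, 1e-12)) for _, k in replicates]
    return len(calls) == 1 and (max(log_k) - min(log_k)) <= max_log_k_spread
```

Genes for which `is_consistent` returns `False` are the ones to rerun with more starting points or inspect by hand.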
RELAX does some internal checks during a run, and if it detects convergence issues, it will attempt to resolve them during the same run (transparently to the user). However, because these checks are heuristic, success is not guaranteed.
--starting-points N does not perform N distinct RELAX runs. What it does instead is perform N quick and dirty optimizations at the start of the run, pick the best result, and run the full model using it as the starting point. The benefit is that, as you noticed, runtimes are not strongly affected.
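To make the idea concrete, here is a toy illustration of the same strategy on a one-dimensional function with two minima. This is not HyPhy's implementation; the objective, the gradient-descent optimizer, and all parameter values are made up for the example.

```python
def objective(x):
    """Toy function: local minimum near x = 1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy objective."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, steps, rate=0.05):
    """Plain gradient descent for a fixed number of steps."""
    for _ in range(steps):
        x -= rate * gradient(x)
    return x


def multi_start(n_starts, quick_steps=5, full_steps=500, low=-2.0, high=2.0):
    """Run n_starts cheap optimizations, then refine only the best one.

    Requires n_starts >= 2 so that the starting grid spans [low, high].
    """
    spacing = (high - low) / (n_starts - 1)
    starts = [low + i * spacing for i in range(n_starts)]
    quick = [descend(x0, quick_steps) for x0 in starts]  # quick and dirty
    best = min(quick, key=objective)                     # pick the best result
    return descend(best, full_steps)                     # one full optimization
```

With 10 starting points this costs 10 × 5 quick steps plus one 500-step run, compared with 10 × 500 steps for ten independent full runs, which is why the total runtime barely changes. A single full run started at x = 1.5 settles in the local minimum at positive x, while `multi_start(10)` ends in the global one at negative x.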
If all you want is the standard RELAX test, and you wish to reduce (but not eliminate) issues due to "misconvergence", I suggest the following options:
hyphy CPU=1 relax --starting-points 100 --grid-size 2000 --models Minimal --alignment gene.fa --tree tree.nh --test test --reference ref --output out.json > out.log
Setting --grid-size to a higher value makes the runs a bit slower, but improves the ability of the earlier optimization runs to find a good starting point. Setting --models Minimal avoids fitting even MORE complex models (like the General Descriptive).
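For a large batch of genes, the command above can be wrapped in a small script. The following is a rough sketch that assumes `hyphy` is on the PATH and that the branch labels are `test` and `ref`, as in the command above. The function names and the one-JSON-plus-one-log-per-gene output layout are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path


def relax_command(alignment, tree, output, starting_points=100,
                  grid_size=2000, test="test", reference="ref"):
    """Build the argument list for the RELAX command suggested above."""
    return [
        "hyphy", "CPU=1", "relax",
        "--starting-points", str(starting_points),
        "--grid-size", str(grid_size),
        "--models", "Minimal",
        "--alignment", str(alignment),
        "--tree", str(tree),
        "--test", test,
        "--reference", reference,
        "--output", str(output),
    ]


def run_relax(alignment, tree, out_dir):
    """Run RELAX on one gene; write <gene>.json and <gene>.log to out_dir.

    Returns the process exit code so failed runs can be collected and retried.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(alignment).stem
    command = relax_command(alignment, tree, out_dir / f"{stem}.json")
    with open(out_dir / f"{stem}.log", "w") as log:
        return subprocess.run(command, stdout=log, check=False).returncode
```

Calling `run_relax("gene.fa", "tree.nh", "results")` would then be equivalent to the command line shown above, with the standard output redirected to `results/gene.log`.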
HTH, Sergei
PS If your alignments are smaller, and if you see Collapsed rate classes in the output, consider reducing the number of rate classes to 2, using --rates 2.
Selection mode | dN/dS | Proportion, % | Notes |
---|---|---|---|
Negative selection | 0.000 | 47.395 | |
Negative selection | 0.000 | 5.352 | Collapsed rate class |
Diversifying selection | 5.138 | 47.253 | |
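When screening many genes, rows like the second one in the table above can be picked out of the saved log automatically. This is a small sketch that assumes the log contains the rate table as pipe-separated rows with the note text "Collapsed rate class", as in the excerpt above. The function name is hypothetical.

```python
def collapsed_rate_classes(log_text):
    """Return the table rows whose Notes column reports a collapsed rate class.

    Each returned row is a list of cell strings:
    [selection mode, dN/dS, proportion, notes].
    """
    rows = []
    for line in log_text.splitlines():
        if "collapsed rate class" in line.lower():
            # Drop any leading/trailing pipes, then split into cells.
            cells = line.strip().strip("|").split("|")
            rows.append([cell.strip() for cell in cells])
    return rows
```

A gene whose log yields a non-empty list is a candidate for a rerun with --rates 2.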
Dear authors of RELAX,
I have been experiencing some convergence issues with RELAX. For some genes, RELAX can yield highly different p or k values when I run them multiple times. I am aware that the same issue has been raised before (e.g., #1551, #1237, #1161 and #730), and I have adjusted my command line according to those pages. For example, I used one of the most recent versions of hyphy (2.5.46; installed using conda), I limited the number of CPUs to one, and I used multiple starting points. But the convergence issue still exists. I need to run RELAX on over 15000 genes. Is there a way to know which genes are having convergence issues by looking at their JSON outputs? Then I could pay special attention to them and treat them separately.
Here is an example where I ran the same gene 10 times.
Here is a template of the command line I used:
hyphy CPU=1 relax --starting-points 10 --alignment gene.fa --tree tree.nh --test test --reference ref --output out.json > out.log
Also, I was using --starting-points 10, but I did not see any increase in my run time. Is that normal?
Thank you very much!
Best regards, Shenglin