jjmccollum closed this issue 1 year ago
I think that's what you fixed in the patch
Yeah, it is. Do you have to change the path to your `beast` executable?
You might already be doing this, but you should be able to follow the commands in the `beast.yml` workflow for cloning the necessary repos, building the BEAST executable, installing the necessary packages with `packagemanager`, and running `beast` with the XML output, right?
I worked it out. I had to delete the Beast packages directory in my user directory. I think beast copies libraries there the first time it runs. The tipdates code is running for me. I'll try to get it going in the CI.
@rbturnbull All right, I've switched over to a Jinja2 template for BEAST XML output, and I've also switched from a strict clock to a `UCRelaxedClockModel`. Test coverage is still at 100%, and the BEAST test is running end-to-end. Following the lead of the phylopaul example inputs, I've set the `rateCategories` input to the clock model to an `IntegerParameter` with dimension `2 * |taxa| - 2`. It looks like this so far:
```xml
<branchRateModel spec="UCRelaxedClockModel" id="clock" tree="@tree">
    <distr spec="LogNormalDistributionModel" M="0.0" S="1.0"/>
    <parameter spec="parameter.RealParameter" id="clock.rate" name="clock.rate" lower="0.0" upper="100.0" value="1.0"/>
    <rateCategories spec="parameter.IntegerParameter" id="clock.rateCategories" dimension="{{ clock_rate_categories }}" value="1"/>
</branchRateModel>
```
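For reference, the value of the `clock_rate_categories` template variable follows from the tree shape: a rooted binary tree on `n` taxa has `2n - 2` branches, one rate category per branch. A minimal sketch (the function name here is illustrative, not part of the codebase):

```python
# Illustrative sketch: the rateCategories dimension for a rooted binary
# tree with n taxa is the number of branches, 2n - 2.
def clock_rate_categories(n_taxa: int) -> int:
    """Return the dimension for the rateCategories parameter."""
    return 2 * n_taxa - 2

# e.g., 38 taxa yields 74 branch rate categories
print(clock_rate_categories(38))  # 74
```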
Comparing this to your examples, I suspect that the `clock.rate` parameter is superfluous (the `M` and `S` inputs to the `LogNormalDistributionModel` distribution should be sufficient, and I can make both of them parameters to be estimated by the model).
@rbturnbull Okay, I've adjusted the clock model to look more like what you have in the phylopaul examples:
```xml
<branchRateModel spec="UCRelaxedClockModel" id="clock" tree="@tree" clock.rate="@clock.mean" rateCategories="@clock.rateCategories">
    <distr spec="LogNormalDistributionModel" meanInRealSpace="true" S="@clock.stdDev">
        <M spec="parameter.RealParameter" id="clock.distr.M" lower="0.0" upper="1.0" value="1.0" estimate="false"/>
    </distr>
</branchRateModel>
...
<stateNode spec="parameter.RealParameter" id="clock.mean" lower="0.0" upper="Infinity" value="1.0"/>
<stateNode spec="parameter.RealParameter" id="clock.stdDev" lower="0.0" upper="Infinity" value="0.000001" estimate="false"/>
<stateNode spec="parameter.IntegerParameter" id="clock.rateCategories" dimension="{{ clock_rate_categories }}" value="1"/>
```
We need to do some work on the clock models. The value here for `clock.stdDev` is fixed to be very low. I'm not sure if we want that to be the default.
Should we also make it a parameter to be estimated by the model, or should we just default it to 1.0?
I think it's best for the model to estimate the parameter. I'm not sure if there are going to be scale issues when setting the prior. We need to do a bit of experimentation to work out the best way to do it.
I nearly have this ready (along with the code and unit tests for parsing root date ranges), but I had some questions about the clock model parameters:

1. Should the `clock.mean` and `clock.stdDev` parameters have their own `prior` elements in the BEAST XML template?
2. Should `clock.stdDev` also have an associated operator, and if so, what type of operator should it be? We currently have a couple of operators that affect the `clock.mean` parameter:

```xml
<operator id="tree_updown" spec="UpDownOperator" scaleFactor=".75" weight="10">
    <up idref="tree"/>
    <down idref="clock.mean"/>
</operator>
<operator id="clock.rate_Scaler" spec="ScaleOperator" scaleFactor=".75" weight="1" parameter="@clock.mean"/>
```
If we're estimating the standard deviation then we'll need a prior and an operator.
I've gone with identical log-normal priors for both `clock.mean` and `clock.stdDev` for now. Their priors probably shouldn't be exactly equal, but we can discuss that later. I also added a `ScaleOperator` for `clock.stdDev` with identical parameters to the `ScaleOperator` for `clock.mean`; I did not add an `UpDownOperator` relating `clock.stdDev` to `tree` (referring to the tree height, I assume?) like `clock.mean` has, but that can be easily changed.
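Concretely, since it mirrors the `ScaleOperator` for `clock.mean` shown above, the added operator would look something like this (the `id` is illustrative):

```xml
<operator id="clock.stdDev_Scaler" spec="ScaleOperator" scaleFactor=".75" weight="1" parameter="@clock.stdDev"/>
```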
@rbturnbull All right, the `--clock` command-line option (for MrBayes and BEAST outputs) is now implemented; the permitted options are `strict` (the default option), `uncorrelated` (which corresponds to the `UCRelaxedClockModel` in BEAST and the Independent Gamma Rate model in MrBayes), and `local` (BEAST only). Unit tests are in place for this code, coverage is at 100%, and the `mrbayes.yml` and `beast.yml` workflows have been updated to test all valid options.
@rbturnbull Per Remco's suggestion on https://github.com/CompEvol/beast2/issues/1075, I'd like to add priors on the transcriptional rate parameters to see if this prevents ill-conditioned substitution matrices from arising. If I did this, I could even tabulate the occurrences of different transcriptional change categories (marked with the `@ana` attribute on a `rdg` element) in ignored readings (e.g., defective readings and subreadings) and use their relative proportions to inform the priors.
I imagine that a LogNormal distribution with an offset of 1, a mean somewhere above 1 (maybe 2 by default, but a number that could be initialized by a tabulation like the one described above), and a standard deviation of 1 would be a good place to start. If we did this, then would we want to make the mean of the distribution a separate parameter and replace the operators for the transcriptional rate parameters with operators for the means of their prior distributions?
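As a sketch of the tabulation idea (the function name, the assumed `type` values, and the toy TEI fragment below are illustrative, not the actual implementation):

```python
# Illustrative sketch: tabulate transcriptional change categories (@ana
# values on rdg elements) over ignored reading types in a TEI XML file.
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
IGNORED_TYPES = {"defective", "subreading"}  # assumed type values

def tabulate_categories(tei_xml: str) -> Counter:
    """Count @ana category references on rdg elements of ignored types."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for rdg in root.iter(f"{TEI_NS}rdg"):
        if rdg.get("type") in IGNORED_TYPES:
            # @ana may hold multiple space-separated pointers like "#Orth #Def"
            for cat in rdg.get("ana", "").split():
                counts[cat.lstrip("#")] += 1
    return counts

# Toy example input (hypothetical readings)
example = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <app><rdg type="subreading" ana="#Orth">a</rdg>
  <rdg type="defective" ana="#Orth #Def">b</rdg></app>
</TEI>"""
print(tabulate_categories(example))
```

The relative proportions of these counts could then seed the means of the prior distributions.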
Hi @jjmccollum - I like the idea of a log-normal distribution as the prior. I'm not sure about the offset. I think the best thing to do is to start simple and try out various options. Do you want to catch up on zoom to discuss?
@rbturnbull That would be good. I won't be available after Tuesday this week, but if you're available to chat before that, then that would work for me. Thanks!
@rbturnbull Okay, I implemented log-normal priors on the transcriptional rates (with `M=1.0` and `S=1.0` to start) in the most recent two commits today, and the BEAST runs for all three clock models with a chain length of 20,000,000 completed without matrix errors in about 3 hours and 45 minutes. So we should be good using the `ComplexSubstitutionModel` again. The next step will be to determine more intuitive prior distributions on the transcriptional rates in general, but perhaps we can discuss that in person in a couple of days.
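For illustration, a prior of this shape on a single transcriptional rate parameter might look like the following in the template (the `id`s and parameter name are hypothetical):

```xml
<prior id="rate_Orth_prior" name="distribution" x="@rate_Orth">
    <LogNormal id="rate_Orth_prior_distr" name="distr" M="1.0" S="1.0"/>
</prior>
```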
great work @jjmccollum
@rbturnbull I'm nearly done writing up the new documentation for the features in this branch, so we're just about ready to do a pull request for this branch. There is one thing I noticed that doesn't seem right, though: in the `ubs_ephesians.xml` input, I specify that witness 18 has a fixed date:

```xml
<witness n="18">
    <origDate when="1364"/>
</witness>
```
As expected, this is mapped directly in the `trait` element under the `tree` element as `18=1364` in the output BEAST 2.7 XML file, and since there is no date range, no `MRCAPrior` is included for this witness. Thus, I would expect the tip for this witness to have no date range in the output DensiTree, but it seems to have a date range nevertheless:

[DensiTree screenshot omitted]
It may be good to figure out why this is happening before we merge in this branch.
That's quite strange. I don't understand why that is happening. Do you have the .trees file in the output that you used to create the densitree? Can you please send that to me?
Chrysostom also looks to have a variable tip date even though it is set to 407 in the XML.
This issue may have something to do with the BDSKY model playing around with the tip dates; see https://github.com/CompEvol/beast2/issues/981.
I've just merged `beast-xml` into `main`; I'm pleased to say we can now close this issue!
For use cases with BEAST (2) as the target phylogenetic software, conversion to NEXUS followed by a second conversion through BEAUti is presently supported. Direct conversion to a BEAST XML input file, however, would allow additional features to be mapped, most notably variation unit-specific substitution models and additional parameters to be incorporated into these models.
Because of the extensive nature of BEAST XML files, the conversion process will involve starting with a template file and adding new elements for witnesses (including fields for their sequences and date calibrations), as well as root frequencies and substitution models for each variation unit.
This feature will probably take extra effort to implement, so it should be undertaken on a dedicated branch.