Closed by Anaphory 6 years ago
I don't remember the details, but as-is the resulting analysis will not run. I do know that it is possible to sample tip locations, so in principle an analysis including languages missing location data is perfectly possible. If we can get it working in time, I'd be perfectly happy to include that ability in 1.4. This might actually be very easy; I've simply never looked into it because it hasn't yet been an itch for me. If it's causing you problems, I fully encourage you to look into it.
I probably have some XML files lying around somewhere which do geographic tip sampling. If I can find them, I'll share the details with you.
I had to change the bit where BEASTling tries to convert "?" into floating-point numbers, but otherwise the analysis seems to run perfectly fine with missing entries.
I haven't tried sampling tips.
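For illustration, here is a minimal sketch of the kind of change described above: treating "?" as a missing coordinate instead of passing it to `float()`. This is not BEASTling's actual code; the function names `parse_coordinate` and `parse_location` are hypothetical, and I am assuming coordinates arrive as strings.

```python
def parse_coordinate(value):
    """Return the coordinate as a float, or None if the entry is missing.

    Hypothetical helper: "?" and empty strings are treated as missing data.
    """
    text = str(value).strip()
    if text in ("", "?"):
        return None
    # Anything else must be a valid number; float() raises ValueError otherwise.
    return float(text)


def parse_location(latitude, longitude):
    """Return a (latitude, longitude) tuple, or None if either part is missing."""
    lat = parse_coordinate(latitude)
    lon = parse_coordinate(longitude)
    if lat is None or lon is None:
        return None
    return (lat, lon)
```

With something like this, a language whose location is `None` can be passed on as "missing" rather than crashing the conversion step.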
Huh, that's a pleasant surprise! Not sure why I got the idea that it didn't "just work". Feel free to push that change.
I guess we should add an option to drop languages with missing locations in case somebody really wants that?
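A rough sketch of what such an option could look like, assuming locations are held in a dict that maps language names to a `(latitude, longitude)` tuple or `None`. The function name and the `drop_missing` flag are made up for this example and are not existing BEASTling settings.

```python
def select_languages(languages, locations, drop_missing=False):
    """Return the languages to keep in the geographic analysis.

    languages:    iterable of language names
    locations:    dict mapping language name -> (lat, lon) tuple or None
    drop_missing: hypothetical option; if True, languages without a
                  location are excluded, otherwise they are kept so that
                  their locations can be sampled.
    """
    if not drop_missing:
        return list(languages)
    # A language absent from the dict counts as missing, just like None.
    return [lang for lang in languages if locations.get(lang) is not None]
```

Keeping the default at `False` would match the behaviour discussed here, where languages with missing locations stay in the analysis.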
I have inspected the geography nexus file generated by BEAST and found, to my surprise, that it lists geolocations for all nodes, both tips and internal nodes, and that the coordinates for the one tip language I checked stayed constant over the first few steps I looked at. I have not yet looked at which location that language gets assigned.
Yes, those internal node locations are not sampled but they are estimates of the mean location under some kind of quick-and-dirty approximation to the diffusion process. There is an option to do a much better job of estimating them using some kind of particle filter, but it slows things down substantially and is not exposed by BEASTling.
I'm curious as to whether or not the location assigned to the tip with the missing data happens to be the exact location of some other node. I think that's a distinct possibility judging from my recently gained understanding of how TraitSets work...
Yes, that's what I wanted to check as well.
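One way to run that check, sketched under the assumption that the node locations from a single logged tree have already been extracted into a dict of `(latitude, longitude)` tuples. Reading the nexus file itself is not shown, and the function name is hypothetical.

```python
def nodes_sharing_location(target, locations, tolerance=0.0):
    """Return the names of all other nodes located where `target` is.

    target:    name of the tip whose location data was missing
    locations: dict mapping node name -> (lat, lon) for one logged tree
    tolerance: maximum absolute difference per coordinate; the default of
               0.0 only matches exact copies of the coordinates
    """
    lat, lon = locations[target]
    return sorted(
        name
        for name, (other_lat, other_lon) in locations.items()
        if name != target
        and abs(other_lat - lat) <= tolerance
        and abs(other_lon - lon) <= tolerance
    )
```

An empty result would suggest the tip did not simply inherit the coordinates of another node in that tree.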
These languages now have their locations sampled, rather than being excluded.
What goes wrong when we don't exclude languages without location data?
https://github.com/lmaurits/BEASTling/blob/bd7c6c155ae482f9693534b12b5a16f44007166c/beastling/configuration.py#L571