lmaurits / BEASTling

A linguistics-focussed command line tool for generating BEAST XML files.
BSD 2-Clause "Simplified" License
20 stars 6 forks source link

Pseudodollo #212

Closed Anaphory closed 5 years ago

Anaphory commented 5 years ago

Here comes the third block of changes for you to look at (and in this case probably ask for heavy modifications): An implementation of the pseudo-Dollo covarion model.

I can assure you it runs, but I guess I'll have to prove that with some tests :-)

lmaurits commented 5 years ago

Is this just the PDolloCovarion model or is the "regular" PDollo implemented in this branch too? Has that already been pulled into dev? Sorry, I'm behind on all of this. But I now have a boatload of practical experience getting the regular PDollo to actually converge on large, complex datasets (which it doesn't do reliably with the Beauti default priors/operators), so once this is into BEASTling I will probably hack on it enthusiastically.

Anaphory commented 5 years ago

I think I just implemented the PDolloCovarion, but the difference between these two is much less than most other stuff, so it should not be too hard to add the non-covarion model.

Anaphory commented 5 years ago

Yes, it is indeed just the PseudoDolloCovarion which I implemented here. (And currently it fails testing, I'll see what I can do about it.)

I don't think we gain much from inheritance between pDC and pD, because the area where these models differ from the basic binary model (eg. userDataType) is also the area where they differ from each other. Taking pDC as starting point for pD implementation should be helpful, though, in particular once we have tests for one or the other, too.

lmaurits commented 5 years ago

Got it.

I'm about to get on a ferry to Stockholm in a few hours, and will be at Uppsala for the rest of the week working hard to get a paper finished and submitted. When I'm back in Turku next week I'll try to schedule in an hour or two a day to spend on BEASTling stuff to catch up on all the issues/PRs you've submitted lately (sorry for ignoring them until now, I was quite busy last week preparing for a lecture I gave today).

If there is anything specific you want/need me to look at earlier so you can keep making progress on something, let me know, I should be able to do some stuff in the evenings this week.

Anaphory commented 5 years ago

Currently, tests here fail due to handling of missing data. This has probably to do with my implementation of synonym handling, #214 (which is also broken right now). The first few changes from #214 are in #211. #211 is the one I'd really like to see in the repository soon, because it's supposed to be just fixing minor issues, and then new features could build on top of that.

ETA: I have now merged #211 and #214, after their tests succeeded.

The main reason I touched so much recently is that I wanted a reasonable base for working on network phylogenies, using the speciesnetwork Beast package. That requires some restructuring of the tree-prior implementation, and doing that while other changes tweak that also would have been silly.

codecov-io commented 5 years ago

Codecov Report

Merging #212 into develop will increase coverage by 0.2%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop     #212     +/-   ##
==========================================
+ Coverage    93.49%   93.69%   +0.2%     
==========================================
  Files           21       22      +1     
  Lines         2612     2695     +83     
==========================================
+ Hits          2442     2525     +83     
  Misses         170      170
Impacted Files Coverage Δ
beastling/models/pseudodollocovarion.py 100% <100%> (ø)
beastling/configuration.py 92.03% <100%> (+0.03%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update eba8388...a03d376. Read the comment docs.

Anaphory commented 5 years ago

The lack of coverage is not surprising, it's good that at least I did not break any existing tests.

Anaphory commented 5 years ago

Hm, I guess a big bit of this model is the special behaviour of the intermediate states, their mapping to observed states using data type and stuff like that, all of which is pDC-specific.