matsengrp / cft

Clonal family tree

alternatives to dnapars #149

Closed matsen closed 5 years ago

matsen commented 7 years ago

If we become frustrated with dnapars, here are two alternatives:

The cool thing about POY is that it can use indels as informative about the tree, and doesn't require a sequence alignment step. I couldn't get it to work in 30 mins.

PAUP* does ancestral sequence reconstruction according to the manual. It's an old standby.

metasoarous commented 7 years ago

If we become frustrated?

From POY docs:

Phylogenetic Application written in OCaml and C

Awesome.

It would definitely be nice to get the indel inference features of POY for #131. Though we'd still have to do a preliminary alignment and FastTree for the tree pruning step.

As much as I've come to fear/loathe dna{ml|pars}, I don't see a compelling reason to switch to PAUP at the moment. For now, we've more or less resolved the name length issues. But if we have further difficulties and still can't get POY to work, it's something to consider.

I'm going to Icebox this for now, but may pull it out if time comes up for looking more at #131.

metasoarous commented 7 years ago

Mmm... that didn't take long...

Getting bitten again by dnaml/phylip issues (and mosquitoes), so looking lustily towards poy at the moment. Spent 20 minutes or so trying to get it to compile but no dice; issues with malloc.h header. @matsen were you able to compile it at least, or is this where you were hung up as well? Should we bug the folks at SciComp or see if a wizard like @bcclaywell could compile it?

matsen commented 7 years ago

I know that you're all excited because it's written in OCaml, but did you notice that the last commit was in mid 2015? In fact, real development stopped much earlier than that, when Andres Varon left for Jane Street Capital (I know this crew reasonably well). They were starting from scratch in Haskell, last I heard from the PI a few summers ago.

So perhaps I shouldn't have suggested it in the first place.

I think that we can do just fine with PAUP. It's quite actively developed, if by an old-schooler, and we can handle indels via gap coding with the GapMode option. Generating NEXUS files is pretty wacky, but at least we only have to make them for the most part. We can consume them using DendroPy.

In any case we should keep in mind that parsimony is some sort of stopgap until we can figure out something better.
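For reference, a minimal sketch of the NEXUS generation step mentioned above, under stated assumptions: the function name `to_nexus` and the example sequence names are hypothetical, and the PAUP block (`set criterion=parsimony`, `pset gapmode=newstate`, `hsearch`) is one plausible way to ask PAUP* to treat gaps as an extra character state. It has not been checked against cft's pipeline. Taxon names containing spaces or punctuation would need NEXUS quoting, which this sketch does not do.

```python
def to_nexus(seqs, gap_mode="newstate"):
    """Render aligned DNA sequences as a NEXUS string with a PAUP block.

    seqs: dict mapping taxon name -> aligned sequence (all the same length).
    gap_mode: value passed to PAUP's `pset gapmode=` (assumed: "missing" or "newstate").
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in seqs)

    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(seqs)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    # One row per taxon, names padded so the sequences line up.
    for name, seq in seqs.items():
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += [
        "  ;",
        "end;",
        "begin paup;",
        "  set criterion=parsimony;",
        f"  pset gapmode={gap_mode};",  # treat '-' as a state rather than missing data
        "  hsearch;",
        "end;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_nexus({"naive": "ACGT-A", "seq1": "ACGTTA"}))
```

The resulting file should be readable back with DendroPy or any other NEXUS parser, so the same writer can double as a test fixture.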

metasoarous commented 7 years ago

Don't get me wrong. I do think it's cool that it's written in OCaml. And I think it's even cooler that they're rewriting it in Haskell (even if I think rewrites are an antipattern). But I was more excited by the indel approach. However, based on the commit history and the installation difficulties, I agree that it's not worth it, especially considering PAUP has GapMode as a stopgap. (Now if it had meant I'd be able to write OCaml/Haskell, I might be putting up a fight... But honestly I don't really care that much. For the record.)

Interesting overlap with the Jane Street crew! I've heard of them.

Anyway... NEXUS may not be my favorite format, but it sounds like heaven compared to the 💩🎂 we've endured from dnaml/dnapars. So yeah... I'll take a look. But, what do you think the timeline is on "something better"? I can still hack around what we have if the promised land isn't too far off, and it will be less work to do so than to make the switch. The real cost is in continuing to hack around the 💩🚿. Thoughts there?

matsen commented 7 years ago

The promised land involves intractable likelihoods and is a research project that we haven't even started yet. So yeah, absolute minimum is a year. Two, probably.

metasoarous commented 7 years ago

OK; That's rather what I figured. Let's do this!

matsen commented 7 years ago

Does PAUP do likelihood ancestral sequence reconstruction as well?

metasoarous commented 7 years ago

It looks like it...

[image]

matsen commented 7 years ago

Great!

metasoarous commented 7 years ago

So sad... While PAUP does seem to do ancestral reconstructions, there doesn't seem to be a really good way to export them. This is all I can get it to spit out:

[image]

There's also this phylobabble thread from 2 years ago posting the same problem (not encouraging).

Also note that all of the trees produced seem to have ?---------A as the root character inference. Which is... wacky.

I could maybe write some code to reconstruct an ancestral state alignment from these ascii trees, but it wouldn't be pretty, and is probably more trouble than it's worth.

Unless things start looking up here somehow, it seems we're stuck hacking around phylip for the moment :-/ Sad!

matsen commented 7 years ago

Whoa! Crazy.

I've emailed the author.

metasoarous commented 7 years ago

Any news @matsen? I'm going to put this on ice for now until we hear more or figure out an alternative.

matsen commented 7 years ago

No news. I sent the author a note asking him to reply on phylobabble, with no success.

metasoarous commented 7 years ago

As mentioned in #170, we're considering another tool for ancestral reconstruction (PRANK) that would take a fixed tree and use it to infer sequence ancestry. If that works well, we will have the freedom to look at using PAUP again, without having to worry about the ancestral reconstruction. This may simplify #130.

matsen commented 7 years ago

Just a reminder that if we go this route we'll want to validate it in the same way as Christian and Will are doing now.

metasoarous commented 7 years ago

As has been stated elsewhere, this is being put on ice until we can figure out what is going on with PRANK (see updates on #131). The work for RAxML is more or less done, and still in the code via a switch. But we shouldn't activate it until the PRANK bugs are resolved.

metasoarous commented 6 years ago

As mentioned in #170, we may want to look at http://www.iqtree.org/. See usage link in issues.

metasoarous commented 6 years ago

As @krdav updated us in #170, IQ-Tree seems to be doing a good job, and is a lot easier and faster to work with than dnaml, so when the time is right, I imagine we'll make the switch.
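If we do go this way, here is a rough sketch of how the ancestral sequences might be pulled out of IQ-TREE's output. It assumes the tab-separated `.state` file that IQ-TREE writes when run with `-asr`, with `#` comment lines followed by a header containing `Node`, `Site` and `State` columns; that layout should be checked against the version in use. The function name `ancestral_sequences` is hypothetical.

```python
import csv
from collections import defaultdict


def ancestral_sequences(state_text):
    """Collapse per-site state rows into one sequence string per node.

    state_text: contents of an IQ-TREE `.state` file (assumed layout:
    '#' comment lines, then a tab-separated table with Node/Site/State columns).
    """
    # Drop blank lines and comment lines; the first remaining line is the header.
    rows = (ln for ln in state_text.splitlines() if ln and not ln.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")

    per_node = defaultdict(dict)
    for row in reader:
        per_node[row["Node"]][int(row["Site"])] = row["State"]

    # Join the most likely state at each site, in site order.
    return {
        node: "".join(states[i] for i in sorted(states))
        for node, states in per_node.items()
    }
```

This takes only the most likely state per site and ignores the posterior probability columns, so it would not by itself tell us how trustworthy a given reconstruction is.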

metasoarous commented 6 years ago

Update... looks like IQ-TREE isn't quite as hot for ancestral sequence reconstruction as we initially thought, upon further investigation from @krdav. As a result, we're bailing and putting this issue back on ice.