phiweger opened this issue 4 years ago
Further testing gives me the impression that (1) this does not always occur given the same input and (2) it only occurred when I added the --confidence flag to treetime.
Is this run on a tree with four leaves, some of which have identical dates/branch lengths? If so, it is likely a numerical instability when trying to invert a singular matrix.
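As a toy illustration of why identical inputs can make a matrix singular (this is not TreeTime's code; `clock_regression` is a hypothetical helper), consider a plain least-squares root-to-tip regression. Its normal matrix has determinant n·Σt² − (Σt)², which is zero when all sampling dates coincide, so the clock rate is undefined. Centering the dates before summing avoids the catastrophic cancellation that makes nearly identical dates numerically fragile.

```python
def clock_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date), where root_date is the date at which the
    fitted line reaches zero distance. Toy sketch, not TreeTime's implementation.
    """
    n = len(dates)
    t_mean = sum(dates) / n
    y_mean = sum(distances) / n
    # Centered sum of squares: equals det(X^T X) / n, computed without
    # subtracting two large, nearly equal numbers.
    sxx = sum((t - t_mean) ** 2 for t in dates)
    if sxx == 0.0:
        raise ValueError("singular normal matrix: all sampling dates are identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(dates, distances))
    rate = sxy / sxx
    if rate == 0.0:
        raise ValueError("zero clock rate: root date is undefined")
    root_date = t_mean - y_mean / rate
    return rate, root_date


if __name__ == "__main__":
    print(clock_regression([2000.0, 2010.0, 2020.0], [0.0, 0.01, 0.02]))
    try:
        clock_regression([2020.5, 2020.5, 2020.5], [0.01, 0.02, 0.03])
    except ValueError as err:
        print("fit failed:", err)
```

With dates that differ only in the last few digits, `sxx` becomes tiny and the estimated rate can jump wildly between otherwise similar inputs, which is the flavor of instability described here.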
Yes, this is a larger tree (20+ leaves), but 3 of the leaves are identical in their SNV alignment, while their dates differ. Is there a way around this instability, besides manually clipping the corresponding branch lengths to 0? The dates should help resolve polytomies, right?
Could you send me these data? I can't quite explain why this might happen, and it would be good to fix.
Which data do you need: the alignment, the dates, and the undated tree? Anything else?
Yes, those are what I would need.
I think I might be having a similar error (if not, I can open a new issue). When estimating date confidences using the marginal likelihood, some nodes sporadically get very large intervals:
Rather than having intervals in the range of hundreds of years, these nodes have confidence intervals of more than 100,000 years. These large intervals are somewhat random, in that rerunning the analysis moves them to different nodes. Any thoughts on why this might be occurring and whether there's a solution?
Yes, this looks like a genuine problem. My hunch is that it is a numerical accuracy issue.
I was thinking numerical accuracy too. This is a large phylogeny with many small branches (1e-8). Would there be any value in rescaling the branch lengths beforehand (e.g. multiplying them all by 1e4)?
I suppose this is a large genome? Does this use a SNP-only alignment, or a VCF file? TreeTime carries around an internal scale, one_mutation = 1/L, where L is the length of the genome. One could try to trick it into assuming the genome is shorter, but I am not sure I understand your application well enough.
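To make the one_mutation = 1/L scale concrete, here is a small sketch that converts branch lengths in substitutions per site into expected mutation counts. The function names and the 5 Mb genome length are hypothetical choices for illustration only; the sketch does not call TreeTime.

```python
def one_mutation(genome_length):
    """Branch length (substitutions per site) corresponding to a single mutation."""
    return 1.0 / genome_length


def expected_mutations(branch_length, genome_length):
    """Express a branch length in substitutions/site as an expected mutation count."""
    return branch_length / one_mutation(genome_length)


if __name__ == "__main__":
    L = 5_000_000  # hypothetical genome length in nucleotides
    print("one mutation corresponds to a branch length of", one_mutation(L))
    print("a 1e-8 branch is", expected_mutations(1e-8, L), "mutations")
```

For a genome of that size, a 1e-8 branch amounts to about 0.05 of a mutation, i.e. far below the resolution of the alignment, which is why such branches behave like zero-length branches.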
I am having the same (or a similar) issue on a SARS-CoV-2 dataset with roughly 5000 sequences, using the flags below; however, it also occurs without the --covariation or --branch-length-mode flags:
treetime --tree ml_clean.nwk --dates clean_metadata.tsv --aln aln_clean.fasta --clock-filter 4 --reroot EPI_ISL_402125 --covariation --coalescent skyline --clock-rate 0.001 --clock-std-dev 0.0005 --branch-length-mode joint --confidence --keep-polytomies
I'm using a full alignment. The problem is random: rerunning on the same dataset can produce reasonable confidence intervals, but it happens often enough to be an issue. I'm using TreeTime v. 0.80 on Python v3.9. I've attached the treetime output as well as the ML tree and a list of accession numbers (I can't share the alignment because it is GISAID data).
Sorry, I just started to pick this up again. All the numbers in the dates.tsv file look sensible, and these should be the same as in the graph, with the exception of the branches labeled as problematic, which are masked in dates.tsv but not in the graph. My hunch is that these long bars are essentially undefined confidence intervals of branches that deviate from the clock too much for this estimate to be reliable. I'll add a line to exclude these from the graph.
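Until the plot itself skips masked branches, one workaround is to filter them out before plotting. The sketch below assumes (hypothetically; check your own dates.tsv) that the columns are node, date, numeric date, lower bound, upper bound, that the header line starts with `#`, and that masked rows carry a placeholder such as `--` instead of numbers. The optional `max_width` cutoff is an illustrative extra for hiding implausibly wide intervals, not a TreeTime option.

```python
import csv
import io


def plottable_intervals(tsv_text, placeholder="--", max_width=None):
    """Return {node: (lower, upper)} for rows with numeric confidence bounds.

    Skips the header, rows whose bounds hold the placeholder (masked branches),
    and, if max_width is given, intervals wider than max_width years.
    Column layout is an assumption, not a documented TreeTime format.
    """
    intervals = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 5 or row[0].startswith("#"):
            continue  # blank/short lines and the header
        node, lower, upper = row[0], row[3], row[4]
        if placeholder in (lower, upper):
            continue  # masked (problematic) branch: nothing to plot
        lo, hi = float(lower), float(upper)
        if max_width is not None and hi - lo > max_width:
            continue  # implausibly wide interval
        intervals[node] = (lo, hi)
    return intervals


if __name__ == "__main__":
    example = (
        "#node\tdate\tnumeric date\tlower bound\tupper bound\n"
        "A\t2020-03-01\t2020.16\t2020.10\t2020.20\n"
        "B\t--\t--\t--\t--\n"
    )
    print(plottable_intervals(example))
```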
In the TreeTime .nexus output I get a huge negative branch length followed by another large one for the corresponding leaves:
Is this a bug or some numerical instability? How could I avoid this?
Thanks a lot!
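One plausible reading of a huge negative branch followed by large positive ones: in a time-scaled tree each branch length is the date of the child minus the date of its parent, so if an internal node's date estimate lands before its parent's, that branch goes negative and the branches below it grow to compensate, leaving the root-to-leaf total unchanged. The sketch below uses made-up node names and dates (hypothetical) to show the effect and flag such branches; it does not parse TreeTime's nexus output.

```python
def time_branch_lengths(parent_of, node_date):
    """Return {node: date(node) - date(parent)} for every non-root node."""
    return {child: node_date[child] - node_date[parent]
            for child, parent in parent_of.items()}


def negative_branches(parent_of, node_date):
    """Names of nodes whose branch to their parent has negative length."""
    lengths = time_branch_lengths(parent_of, node_date)
    return sorted(node for node, length in lengths.items() if length < 0)


if __name__ == "__main__":
    # Hypothetical tree: root R -> internal N -> leaves A, B.
    # N is dated far too early, before its own parent R.
    parents = {"N": "R", "A": "N", "B": "N"}
    dates = {"R": 2019.0, "N": 1900.0, "A": 2020.0, "B": 2020.5}
    print(time_branch_lengths(parents, dates))
    print("negative:", negative_branches(parents, dates))
```

Here branch N is -119 years and branch A is +120 years, while their sum is still the 1 year between R and A, matching the pattern of a large negative branch followed by large positive ones.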