Status: Open. Ellmen opened this issue 2 years ago.
Can you post your tree file? The program doesn't handle multi-line FASTAs; it expects each sequence on a single line. It also can't handle internal node names, so it's best to remove them. Finally, every leaf in your tree should correspond to a sequence in the MSA.
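In case it helps anyone hitting the same single-line requirement, here is a minimal sketch of unwrapping a multi-line FASTA before running the program. The function name `unwrap_fasta` is hypothetical and is not part of this tool; it only assumes standard FASTA text where header lines start with `>`.

```python
def unwrap_fasta(text: str) -> str:
    """Return FASTA text with each record's sequence joined onto one line.

    Hypothetical helper for illustration; not part of the imputation program.
    """
    records = []   # list of (header, sequence) pairs in input order
    header = None  # header of the record currently being read
    chunks = []    # wrapped sequence lines collected for that record

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)

    # Flush the final record.
    if header is not None:
        records.append((header, "".join(chunks)))

    return "".join(f"{h}\n{s}\n" for h, s in records)
```

For example, `unwrap_fasta(open("msa.fasta").read())` would return the same records with one sequence line each, where `msa.fasta` is a placeholder file name.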
I created a pull request to support multi-line FASTAs, if you're interested. I didn't build a tree; I just used the common_allele method. Do you think it would make sense to add gaps as a fifth character (in addition to ACGT)?
The tree imputation is orders of magnitude more accurate than the common allele imputation. I only included the common allele imputation option as an extra feature because we used it in our manuscript. Gaps are part of the missing data that is being imputed: gaps and any other ambiguous (non-ACGT) characters are what gets imputed. If you post the error you're getting, or a small sample file to test, I can try to troubleshoot for you.
This is a nice method! I'm a master's student at the University of Waterloo working on wastewater surveillance in Canada.
I wasn't able to get the tree imputation working, but the common_allele method doesn't treat `-` as a known character, which causes the method to impute gaps. This is probably undesirable, since some Omicron and Delta sequences have several deletions, which are imputed to the most common base and then read as mismatches.
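To make the gap question concrete, here is a rough sketch of per-column most-common-allele imputation with a configurable set of known characters. This is not the program's actual code: `impute_common_allele` and its `known` parameter are hypothetical names, and the tie-breaking and all-unknown-column behavior are my own assumptions.

```python
from collections import Counter


def impute_common_allele(seqs, known="ACGT-"):
    """Replace unknown characters with the most common known character per column.

    Hypothetical sketch, not the tool's implementation. `seqs` is a list of
    aligned, equal-length sequences. Characters in `known` are never changed,
    so including '-' keeps deletions as they are.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    known_set = set(known.upper())

    # Most common known character in each column (None if the column has none).
    consensus = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() in known_set)
        consensus.append(counts.most_common(1)[0][0] if counts else None)

    # Assumption: a column with no known characters is left unchanged.
    return [
        "".join(
            c if c.upper() in known_set or consensus[i] is None else consensus[i]
            for i, c in enumerate(s)
        )
        for s in seqs
    ]
```

With `known="ACGT"`, a deletion in `AG-T` aligned against three `C`s in that column would be filled in as `AGCT`; with `known="ACGT-"` it stays `AG-T`, while an `N` elsewhere is still imputed.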