Closed matsen closed 8 years ago
TasParse.py makes a phylip file of the fasta from Luka. It duplicates the > 17 sequence 17 times, and omits the > GL germline sequence (since it was not observed). In history.bash, phylip's dnapars is called to generate parsimony trees, then these are passed to the branching process likelihood code, which filters out trees containing fractional mutations.

Is there a way to constrain parsimony trees to use a specified root sequence (the germline Vh), which in this case was not observed?
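For reference, here is a rough sketch of the duplication step described above. It is not the actual TasParse.py code: the function names are made up, and it assumes that a purely numeric FASTA header (such as > 17) is an abundance count and that the germline record is named GL.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def expand_by_abundance(records, skip=("GL",)):
    """Repeat each sequence according to its numeric header; drop skipped names.

    Assumption: a header consisting only of digits is an abundance count.
    Any other header is kept as a single copy.
    """
    expanded = []
    for name, seq in records:
        if name in skip:
            continue  # germline was not observed, so leave it out
        count = int(name) if name.isdigit() else 1
        for i in range(count):
            # Suffix each copy so that names stay unique.
            expanded.append((f"{name}_{i + 1}", seq))
    return expanded
```

Calling `expand_by_abundance(parse_fasta(open(path).read()))` on the input FASTA would give the list of records to write out for dnapars, where `path` is whatever file holds the sequences.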
The next step is to run some parsimony trees on the Victora data. This will require parsing the Victora data files and making a file suitable for a phylogenetics program. In my exploratory steps I used PHYLIP, and I think I first converted the Victora files to FASTA, then used seqmagick to convert to Phylip format. Note that there is a maximum sequence name length for .phy files. You'll need to deal with that, perhaps through recoding, or through finding some substring that's unique.

Phylip will return a lot of trees, which is good! Some of these will have fractional mutations.
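One possible way to handle the name length limit by recoding is sketched below. It assumes the strict PHYLIP layout, in which the name occupies the first 10 characters of each line, and it writes the sequences in sequential (non-interleaved) form. The function name and the seq1, seq2, ... naming scheme are just for illustration.

```python
def to_phylip(records, width=10):
    """Format (name, sequence) pairs as strict sequential PHYLIP text.

    Returns the PHYLIP text and a dict mapping each short name back to
    the original name, so tree tips can be relabeled afterwards.
    Assumption: names are padded to `width` columns (10 in strict PHYLIP).
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")

    # Header line: number of sequences, then number of sites.
    lines = [f" {len(records)} {lengths.pop()}"]
    key = {}
    for i, (name, seq) in enumerate(records, start=1):
        short = f"seq{i}"  # recoded name, unique by construction
        if len(short) > width:
            raise ValueError("too many sequences for the name width")
        key[short] = name
        lines.append(short.ljust(width) + seq)
    return "\n".join(lines) + "\n", key
```

Keeping the returned key around (for example, written to a small text file next to the .phy file) makes it possible to map the recoded names in the output trees back to the original Victora sequence names.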