This will be a subcommand that can be used to easily set up a simulation.
The input is a reference package and a taxid list. All of the sequences/leaves with these taxids and their descendants will get removed from the reference package, and a jplace file will be prepared that shows what pplacer should do if they get placed back into the tree.
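As a rough illustration of the selection step, here is a minimal Python sketch. It assumes the taxonomy is already available as a child-to-parent mapping and that each sequence name maps to a taxid. The names `parent_of`, `seq_taxid`, `with_descendants`, and `leaves_to_remove` are hypothetical and are not part of rppr or taxtastic.

```python
from collections import defaultdict, deque

def with_descendants(parent_of, taxids):
    """Return the given taxids plus every taxid that descends from one of them."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        # Skip the root, whether it is marked by None or by being its own parent.
        if parent is not None and parent != child:
            children[parent].append(child)
    selected, queue = set(), deque(taxids)
    while queue:
        tid = queue.popleft()
        if tid not in selected:
            selected.add(tid)
            queue.extend(children[tid])
    return selected

def leaves_to_remove(seq_taxid, parent_of, taxids):
    """Names of the sequences whose taxid is listed or lies below a listed taxid."""
    doomed = with_descendants(parent_of, taxids)
    return {name for name, tid in seq_taxid.items() if tid in doomed}

# Toy taxonomy (assumed data): 1 is the root, 2 and 5 are below 1, 3 and 4 are below 2.
parent_of = {"1": None, "2": "1", "3": "2", "4": "2", "5": "1"}
seq_taxid = {"s1": "3", "s2": "4", "s3": "5", "s4": "2"}
print(sorted(leaves_to_remove(seq_taxid, parent_of, ["2"])))
```

With the toy data, the sequences assigned to taxid 2 or anything below it are selected, and `s3` is kept.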
The following would seem to be convenient output. I think that adding a --prefix option, whose value is prepended to standardized output file names, should work great.
A refpkg without those leaves/sequences. I don't think it's convenient to make refpkgs in rppr itself, so this could be a collection of substitutes for the refpkg's files, and perhaps a taxit call that will update the corresponding refpkg. Note that this will require spitting out a tree that has the corresponding leaves removed, and the corresponding nicks healed.
A jplace file that represents the ideal placement according to the original tree. This is a bit tricky, and we will want to make sure this code is cleanly factored. For that we will need to first renumber the tree as if all of the removed sequences were not there, and then consider where they attach to this reduced tree to come up with the corresponding edge numbers for the removed sequences. The attachment branch length should be the total branch length from the reduced tree attachment point to the deleted leaf.
A fasta file with all of the removed sequences, which can then be used to simulate reads.
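The tree reduction and ideal-placement outputs described above could be prototyped along the following lines. This is only a sketch under stated assumptions, not the intended rppr implementation (which would be OCaml): the `Node` class and the functions `prune` and `reduce_and_place` are hypothetical, edges are numbered by a postorder traversal of the reduced tree (an assumption about the numbering convention), and a leaf whose attachment point coincides with a surviving internal node is arbitrarily assigned to the top of that node's first kept child edge.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch above this node
    children: list = field(default_factory=list)

def prune(node, removed, placements):
    """Return (reduced, offset, pending) for the subtree rooted at `node`.

    reduced: pruned copy of the subtree with nicks healed, or None if empty.
    offset:  distance from `reduced` up to the position of `node`.
    pending: [(leaf, distance to node)] for removed leaves not yet attached.
    """
    if not node.children:
        if node.name in removed:
            return None, 0.0, [(node.name, 0.0)]
        return Node(node.name, node.length), 0.0, []
    kept, pending = [], []
    for child in node.children:
        sub, off, pend = prune(child, removed, placements)
        if sub is None:
            pending += [(leaf, d + child.length) for leaf, d in pend]
        else:
            kept.append((sub, off + child.length))
    if not kept:
        return None, 0.0, pending
    # Removed leaves attach where `node` sits in the reduced tree. The distal
    # length is measured from the leaf-ward end of the anchor edge.
    anchor, distal = kept[0]
    for leaf, pendant in pending:
        placements[leaf] = (anchor, distal, pendant)
    if len(kept) == 1:
        # `node` became a nick: heal it by merging its branch into the child's.
        anchor.length += node.length
        return anchor, distal, []
    return Node(node.name, node.length, [sub for sub, _ in kept]), 0.0, []

def reduce_and_place(root, removed):
    """Return the reduced tree and {leaf: (edge number, distal, pendant)}."""
    placements = {}
    reduced, _, _ = prune(root, removed, placements)
    edge_num = {}
    def visit(n):
        # Assumed convention: postorder numbering of the reduced tree.
        for c in n.children:
            visit(c)
        edge_num[id(n)] = len(edge_num)
    if reduced is not None:
        visit(reduced)
    ideal = {leaf: (edge_num[id(anchor)], distal, pendant)
             for leaf, (anchor, distal, pendant) in placements.items()}
    return reduced, ideal

# Toy tree (assumed data): ((A:1,B:2)X:3,(C:4,D:5)Y:6)R
tree = Node("R", 0.0, [
    Node("X", 3.0, [Node("A", 1.0), Node("B", 2.0)]),
    Node("Y", 6.0, [Node("C", 4.0), Node("D", 5.0)]),
])
reduced, ideal = reduce_and_place(tree, {"B"})
print(ideal)
```

Removing B heals node X, so A's edge grows to length 4 and B is recorded on that edge at distal length 1 with pendant length 2. Keeping the renumbering in its own pass after pruning is one way to keep the two concerns separately factored.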
Although we will have this subcommand be available, it can be a bit more rough-cut (and under-documented) than normal. It's primarily for our own use.