psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Documentation for generate_trees #180

Closed by Irrationone 8 years ago

Irrationone commented 8 years ago

Hello,

Could documentation for generate_trees be added to the detailed documentation section of the manual? I'm not sure which of the many command-line options listed in partis's help output I need to specify. Thanks!

psathyrella commented 8 years ago

Yes! Thanks for the suggestion; I should get to this today.

psathyrella commented 8 years ago

Actually, now that I look at it, I remember that I was only using the generate-trees option for testing, and I'd been thinking about removing it, since I wouldn't expect people to use it in isolation: tree generation usually happens automatically as a step in initializing a simulation. If you want to use it, I of course won't remove it, but to write the docs it would help to understand better what you want to generate trees for. It really just runs the R TreeSim package, with branch lengths adjusted to match those seen in data.
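To make "simulate a tree, then adjust its branch lengths to data" concrete, here is a minimal Python sketch. It is not partis's or TreeSim's code: it assumes a pure-birth (Yule) process in place of TreeSim's birth-death models, and it assumes "adjusted to match data" means scaling every branch so the mean root-to-tip depth equals a target value, such as a mean mutation frequency observed in data. The function names and the 0.05 target are hypothetical.

```python
import random


def yule_tree(n_leaves, birth_rate=1.0, seed=None):
    """Simulate an ultrametric pure-birth (Yule) tree with n_leaves >= 2 tips.

    Returns (parent, length, leaves): parent[i] is node i's parent (None for
    the root), length[i] is the branch length above node i, and leaves lists
    the tip node IDs.
    """
    rng = random.Random(seed)
    # The root (node 0) splits immediately into nodes 1 and 2.
    parent = {0: None, 1: 0, 2: 0}
    length = {0: 0.0, 1: 0.0, 2: 0.0}
    active = [1, 2]  # lineages that are still growing
    next_id = 3
    while True:
        # All k active lineages grow for an exponential waiting time with rate k * birth_rate.
        dt = rng.expovariate(len(active) * birth_rate)
        for node in active:
            length[node] += dt
        if len(active) == n_leaves:
            return parent, length, active
        # Split one uniformly chosen lineage into two daughter lineages.
        split = active.pop(rng.randrange(len(active)))
        for _ in range(2):
            parent[next_id] = split
            length[next_id] = 0.0
            active.append(next_id)
            next_id += 1


def root_to_tip(node, parent, length):
    """Sum of branch lengths on the path from the root down to node."""
    depth = 0.0
    while parent[node] is not None:
        depth += length[node]
        node = parent[node]
    return depth


def rescale(parent, length, leaves, target_depth):
    """Scale every branch so the mean root-to-tip depth equals target_depth."""
    mean_depth = sum(root_to_tip(leaf, parent, length) for leaf in leaves) / len(leaves)
    factor = target_depth / mean_depth
    return {node: bl * factor for node, bl in length.items()}


parent, length, leaves = yule_tree(5, seed=1)
scaled = rescale(parent, length, leaves, target_depth=0.05)  # hypothetical target
print([round(root_to_tip(leaf, parent, scaled), 6) for leaf in leaves])
```

Because all lineages grow together in a Yule process, the tree is ultrametric, so after rescaling every tip sits at the target depth; a non-ultrametric model would only match the target on average.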

Irrationone commented 8 years ago

I was planning to use generate-trees after running cache-parameters. I was hoping that generate-trees could reproduce B-cell phylogenies (observed in the data, not simulated) based on the HMM parameters (and perhaps on the partition results).

psathyrella commented 8 years ago

Ah, excellent. That's a great idea, and we'd love to do it and plan to. But... we can't yet. We've gotten as far as partitioning sequences into clonal families, and Erick has been working a lot on BCR-specific phylogenetic reconstruction, but for now you'd have to do your own phylogenetic reconstruction starting from the partition output.
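If you go that route, the first step is usually just splitting sequences by clonal family so each family can be handed to a phylogenetics program. The sketch below assumes the partition has already been loaded into memory as a list of clusters (lists of sequence IDs) plus a dict of sequences; that shape and the function name are hypothetical, not partis's actual output format, so check the partis docs for how the partition file is laid out.

```python
def families_to_fasta(partition, seqs, min_size=3):
    """Build one FASTA-formatted string per clonal family.

    partition: list of clusters, each a list of sequence IDs (assumed shape).
    seqs: dict mapping sequence ID -> nucleotide sequence.
    Families with fewer than min_size members are skipped, since very small
    families carry little phylogenetic information.
    """
    fastas = {}
    # Number families from largest to smallest.
    for i, cluster in enumerate(sorted(partition, key=len, reverse=True)):
        if len(cluster) < min_size:
            continue
        fastas[f"family-{i}"] = "".join(f">{uid}\n{seqs[uid]}\n" for uid in cluster)
    return fastas


partition = [["a", "b", "c"], ["d"], ["e", "f", "g", "h"]]
seqs = {uid: "ACGTACGT" for uid in "abcdefgh"}
fastas = families_to_fasta(partition, seqs)
for name, text in fastas.items():
    print(name, text.count(">"), "sequences")
```

Each string can then be written to its own file and given to whichever tree-building tool you prefer; naming families by size rank is just one convenient choice.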

So at this point, the simulation doesn't use the empirical clone size distribution (it uses configurable analytic distributions, e.g. geometric), but this is fairly straightforward to change and may change soon. Even so, it'll be a while before simulated phylogenies actually correspond to data, since we first need to get our BCR-specific phylogenetics up and running.
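To illustrate the difference between the two approaches, here is a small Python sketch contrasting clone sizes drawn from an analytic geometric distribution with sizes resampled from an observed partition. It is not partis's simulation code; the function names and the mean_size parameter are hypothetical.

```python
import random


def sample_geometric_sizes(n_families, mean_size, rng):
    """Clone sizes from a geometric distribution on {1, 2, ...} with the given mean."""
    if mean_size < 1:
        raise ValueError("mean_size must be >= 1")
    p = 1.0 / mean_size  # the geometric distribution on {1, 2, ...} has mean 1/p
    sizes = []
    for _ in range(n_families):
        size = 1
        # Keep adding members until a "success" with probability p stops the family.
        while rng.random() > p:
            size += 1
        sizes.append(size)
    return sizes


def sample_empirical_sizes(n_families, observed_sizes, rng):
    """Clone sizes resampled with replacement from sizes observed in a data partition."""
    return rng.choices(observed_sizes, k=n_families)


rng = random.Random(0)
observed = [len(c) for c in [["a", "b", "c"], ["d"], ["e", "f"]]]  # toy partition
geo = sample_geometric_sizes(1000, mean_size=2.0, rng=rng)
emp = sample_empirical_sizes(1000, observed, rng=rng)
print(sum(geo) / len(geo), sum(emp) / len(emp))
```

Resampling reproduces the observed size histogram in expectation but can never produce a family larger than the largest one seen, whereas the geometric form has a single tunable mean and an unbounded tail.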

So for the time being, perhaps it makes sense for me to open an issue to add the tree information to the simulation output, with the understanding that it'll be a while before those trees can correspond to the empirical trees?

Irrationone commented 8 years ago

Thanks for the clarification; this makes sense to me now. Sounds good.