psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

get-tree-metrics action error #324

Closed bio-liucheng closed 8 months ago

bio-liucheng commented 8 months ago

Hi, developer.

I am trying to extract phylogenetic tree-related information from the test data, as described in the tree-info section of output-formats.md. However, when I use what I understood to be the relevant option, --get-tree-metrics, the software reports that it is not recognized. Here is the command I used and the message I received:

partis partition --infname <input.fasta> --outfname <output.yaml> --get-tree-metrics
partis: error: unrecognized arguments: --get-tree-metrics

root@47191408bf14:/partis# partis version
commit: f464761242678d631753306400f6bc3a60e79fb2
tag: 0.16.0 (well, 1498 commits ahead of)

Could you please provide guidance on how to access these tree metrics using Partis? If there is an alternative command or a set of steps that I need to follow, I would greatly appreciate your instructions.

Thanks.

psathyrella commented 8 months ago

Whoops, sorry about that, I must've missed updating that bit of the docs for the switch to --get-selection-metrics (--get-tree-metrics is the old name for the same argument). I'll fix the docs but in the meantime you want to look here for a full description of the option: https://github.com/psathyrella/partis/blob/main/docs/subcommands.md#get-selection-metrics

psathyrella commented 8 months ago

https://github.com/psathyrella/partis/commit/aa8db0c6fcd86f1e8e7ef1c19a1a95664f0be8e4

bio-liucheng commented 8 months ago

Thank you for your response!

I am currently using fasttree for phylogenetic tree construction. During this process, I have noticed that certain sequences are placed as internal nodes rather than leaves. These sequences have notably higher values of Local Branching Index (LBI) or Local Branching Ratio (LBR) than the other sequences in the dataset.

I would like clarification on the criteria or algorithmic principles that Partis uses to designate certain sequences as internal nodes. Understanding the rationale for this classification is crucial for interpreting my phylogenetic analysis results correctly.

Additionally, I would appreciate any insights or guidelines on how to interpret these sequences that are identified as nodes. Are there specific biological or methodological implications associated with these sequences having higher LBI or LBR values?

Thanks.

psathyrella commented 8 months ago

It's hard to be too definitive without seeing your particular data, but I can say that fasttree always puts all observed sequences as leaves. This I'm sure makes sense for its original use case, but as you're finding, observed BCR sequences are often internal nodes. Thus when partis is reading fasttree output, it collapses any leaves that are on zero-length branches (i.e. moves the observed sequence to the internal node at the top of the zero-length branch). This is all fine, but you should know that if you're digging into the details of lineages and inferred ancestral nodes you might want to use a more accurate method like iqtree to double check results.
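To make the collapsing step concrete, here is a minimal sketch of the idea on a toy tree. It is not partis's actual code: the `Node` class, the `collapse_zero_length_leaves` function, and the `eps` tolerance are all hypothetical names chosen for illustration, and the sketch assumes internal nodes produced by the tree program start out unlabeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str]            # None for an unlabeled (inferred) internal node
    length: float = 0.0            # length of the branch from the parent to this node
    children: List["Node"] = field(default_factory=list)


def collapse_zero_length_leaves(node: Node, eps: float = 1e-9) -> Node:
    """Move an observed sequence on a (near-)zero-length leaf branch up to its parent.

    Hypothetical illustration only. If several zero-length leaves hang off the
    same unlabeled parent, only the first is moved up; the rest stay as leaves.
    """
    # Handle the subtrees first, so a child that becomes a leaf can itself be collapsed.
    for child in node.children:
        collapse_zero_length_leaves(child, eps)

    kept = []
    for child in node.children:
        is_leaf = not child.children
        if is_leaf and child.length <= eps and node.name is None:
            node.name = child.name   # the parent now represents the observed sequence
        else:
            kept.append(child)
    node.children = kept
    return node


# Toy tree, in Newick terms: ((A:0.0,B:0.1):0.05,C:0.2);
root = Node(None, 0.0, [
    Node(None, 0.05, [Node("A", 0.0), Node("B", 0.1)]),
    Node("C", 0.2),
])
collapse_zero_length_leaves(root)
inner = root.children[0]
print(inner.name, [c.name for c in inner.children])
```

In this toy tree, sequence A sits on a zero-length branch, so after collapsing it labels the internal node that is the parent of B, which is the situation described above where an observed sequence ends up as an internal node.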

As to the LB metric values -- yeah they'll generally be much larger for internal nodes than leaf nodes, simply because leaf nodes have no descendants. There's more discussion in the paper (see screenshot for one bit), but the upshot is that while this is a heuristic, it's probably a reasonable one, and quite possibly kind of close to optimal.
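The effect of having no descendants can be seen in a small sketch of one common way to compute a local branching index: sum the tree length around each node, discounted exponentially with distance on a scale `tau`, using one upward and one downward pass. This is an illustrative assumption about the general technique, not partis's implementation; the `Node` class, the function name, the toy tree, and the value of `tau` are all hypothetical, and it covers LBI only (not LBR).

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    length: float = 0.0            # length of the branch from the parent to this node
    children: List["Node"] = field(default_factory=list)
    up: float = 0.0                # discounted tree length in this node's subtree, seen from its parent
    down: float = 0.0              # discounted tree length in the rest of the tree, seen at this node


def local_branching_index(root: Node, tau: float = 1.0) -> Dict[str, float]:
    """Return {node name: LBI} for every node (hypothetical sketch)."""

    def postorder(node: Node) -> None:
        # Message passed up to the parent: the node's own branch plus everything below it.
        for c in node.children:
            postorder(c)
        decay = math.exp(-node.length / tau)
        node.up = sum(c.up for c in node.children) * decay + tau * (1 - decay)

    def preorder(node: Node) -> None:
        # Message passed down to each child: the parent's side plus the child's siblings.
        for c in node.children:
            decay = math.exp(-c.length / tau)
            siblings = sum(s.up for s in node.children if s is not c)
            c.down = (node.down + siblings) * decay + tau * (1 - decay)
            preorder(c)

    def collect(node: Node, out: Dict[str, float]) -> None:
        # LBI = contribution from above + contributions from each child subtree.
        out[node.name] = node.down + sum(c.up for c in node.children)
        for c in node.children:
            collect(c, out)

    postorder(root)
    root.down = 0.0
    preorder(root)
    result: Dict[str, float] = {}
    collect(root, result)
    return result


# Toy tree: one internal node with three leaf children, plus a fourth leaf off the root.
root = Node("root", 0.0, [
    Node("internal", 0.1, [Node("leaf1", 0.1), Node("leaf2", 0.1), Node("leaf3", 0.1)]),
    Node("leaf4", 0.1),
])
lbi = local_branching_index(root, tau=0.5)
for name, value in sorted(lbi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A leaf's value consists only of the contribution from above, since the sum over children is empty, whereas an internal node also collects a term from each child subtree. In this toy tree that is enough for the internal node to score highest.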
