lczech / gappa

A toolkit for analyzing and visualizing phylogenetic (placement) data
GNU General Public License v3.0

Interpreting examine assign #28

Closed snacktavish closed 6 months ago

snacktavish commented 9 months ago

Hi Lucas! :wave: @JD-X1 and I are trying out gappa on some data we have! I'm excited to use it - but I am confused by the results from gappa examine assign.

I have a jplace file, and when I run gappa examine info on it, I see that it contains 207 queries (as expected):

Started 2023-11-01 12:40:05

Found 1 jplace file

Sample                              Branches      Leaves    Pqueries
RAxML_portableTree.notruncplace          114          58         207

But when I run gappa examine assign on that file with my taxonomy metadata, I get a file with 34 lines (pasted below). I don't know how to interpret it, or how to connect it back to my 207 samples!

What am I missing?

notruncprofile.tsv:

LWR     fract     aLWR    afract    taxopath
0       0         11.84   0.07574   Obazoa
3.024   0.01935   3.024   0.01935   Obazoa;Fungi
0.5464  0.003496  0.5464  0.003496  Obazoa;Metazoa
4.533   0.029     4.533   0.029     Obazoa;Rotosphaerida
3.734   0.02389   3.734   0.02389   Obazoa;Breviatea
7.069   0.04523   52.31   0.3347    Alveolata
42.29   0.2706    42.29   0.2706    Alveolata;Ciliate
0.5321  0.003405  0.5321  0.003405  Alveolata;Perkinsozoa
2.01    0.01286   2.01    0.01286   Alveolata;Dinoflagellata
0.4099  0.002622  0.4099  0.002622  Alveolata;Chrompodellids
0       0         21      0.1344    Discoba
18.54   0.1186    18.54   0.1186    Discoba;Jakobida
2.463   0.01576   2.463   0.01576   Discoba;Euglenozoa
0       0         10.86   0.0695    Rhodophyta
10.86   0.0695    10.86   0.0695    Rhodophyta;Eurhodophytina
0       0         32.69   0.2092    Haptista
32.69   0.2092    32.69   0.2092    Haptista;Haptophyta
0       0         0.6686  0.004278  Telonemia
0.6686  0.004278  0.6686  0.004278  Telonemia;Telonema
0       0         3.217   0.02058   Malawimonadida
3.217   0.02058   3.217   0.02058   Malawimonadida;Malawimonas
0       0         1.79    0.01145   Stramenopiles
1.79    0.01145   1.79    0.01145   Stramenopiles;Opalozoa
0       0         17.22   0.1101    Amoebozoa
13.57   0.08684   13.57   0.08684   Amoebozoa;Tubulinea
2.214   0.01417   2.214   0.01417   Amoebozoa;Evosea
1.428   0.009138  1.428   0.009138  Amoebozoa;Discosea
0       0         2.358   0.01509   Rhizaria
2.358   0.01509   2.358   0.01509   Rhizaria;Cercozoa
0       0         0.6985  0.004469  Metamonada
0.6985  0.004469  0.6985  0.004469  Metamonada;Preaxostyla
0       0         1.642   0.01051   Hemimastigophora
1.642   0.01051   1.642   0.01051   Hemimastigophora;Spironema

snacktavish commented 9 months ago

Ha! I just realized I needed to add "--per-query-results" to get what I wanted. But what do those other results show?

lczech commented 9 months ago

Hi Emily,

hope you are doing well!

That first table (notruncprofile.tsv in your post) shows the accumulated placements for each of your taxonomic groups, similar to what a Krona plot would give you. In fact, there is also an option to produce the Krona output format, so that you can visualize the table that way.
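As a rough way to check that reading of the table, the Python sketch below parses a few rows copied from notruncprofile.tsv and recomputes the accumulated value for Obazoa. It assumes that LWR is the mass sitting on exactly that taxopath and that aLWR is the mass on that taxopath plus everything nested below it. That interpretation is inferred from the posted numbers, and the helper names (read_profile, subtree_lwr) are made up for this illustration; they are not part of gappa.

```python
import csv
import io

# A few rows copied from the notruncprofile.tsv table in this thread,
# written here with explicit tab separators.
PROFILE = (
    "LWR\tfract\taLWR\tafract\ttaxopath\n"
    "0\t0\t11.84\t0.07574\tObazoa\n"
    "3.024\t0.01935\t3.024\t0.01935\tObazoa;Fungi\n"
    "0.5464\t0.003496\t0.5464\t0.003496\tObazoa;Metazoa\n"
    "4.533\t0.029\t4.533\t0.029\tObazoa;Rotosphaerida\n"
    "3.734\t0.02389\t3.734\t0.02389\tObazoa;Breviatea\n"
)


def read_profile(text):
    """Parse the tab-separated profile into a list of dicts (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "taxopath": rec["taxopath"],
            "LWR": float(rec["LWR"]),
            "aLWR": float(rec["aLWR"]),
        }
        for rec in reader
    ]


def subtree_lwr(rows, taxopath):
    """Sum the LWR of `taxopath` itself and of every taxopath nested below it."""
    prefix = taxopath + ";"
    return sum(
        r["LWR"]
        for r in rows
        if r["taxopath"] == taxopath or r["taxopath"].startswith(prefix)
    )


rows = read_profile(PROFILE)
obazoa = next(r for r in rows if r["taxopath"] == "Obazoa")
total = subtree_lwr(rows, "Obazoa")
print(f"reported aLWR: {obazoa['aLWR']}, recomputed from LWR column: {total:.4f}")
```

Under that assumption, the recomputed sum agrees with the reported aLWR up to the rounding used in the file.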

For instance, see Fig. 4A and 4B in our review. The table gives you the total placement mass (LWR, likelihood weight ratio) in each of your taxonomic groups. Note though that this is based on fractional placement masses: each of your query sequences has a distribution of potential placement locations across the branches of the tree, and its LWRs sum to 1.0 over all branches. So, a very certain placement of a sequence would have close to 1.0 LWR on one branch and close to 0.0 everywhere else, while uncertainty would be expressed by having some placement mass (LWR) spread across multiple branches. Then, all of that is added up per branch and per taxonomic clade (as defined by your taxonomy), and printed in that table.
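To make the accumulation concrete, here is a minimal Python sketch with toy data. It is not gappa's implementation: the query names, branch ids, the branch-to-taxopath labelling, and the function build_profile are all hypothetical, and the column meanings (fract and afract as LWR and aLWR divided by the total mass) are an assumption based on the numbers in the table. Only taxopaths that receive some mass directly show up here, which is a simplification.

```python
from collections import defaultdict

# Hypothetical toy data: each query maps branch id -> LWR.
# The LWRs of one query sum to 1.0 across all branches.
placements = {
    "query_1": {7: 0.98, 8: 0.02},           # confident placement
    "query_2": {7: 0.50, 3: 0.30, 4: 0.20},  # uncertain placement
}

# Hypothetical labelling of tree branches with taxonomic paths.
branch_taxopath = {
    3: "Alveolata",
    4: "Alveolata;Ciliate",
    7: "Obazoa;Fungi",
    8: "Obazoa",
}


def build_profile(placements, branch_taxopath):
    """Accumulate fractional placement mass per taxopath (illustrative only)."""
    # Mass that lands on exactly this taxopath.
    lwr = defaultdict(float)
    for per_branch in placements.values():
        for branch, weight in per_branch.items():
            lwr[branch_taxopath[branch]] += weight

    total = sum(lwr.values())
    profile = {}
    for path in lwr:
        prefix = path + ";"
        # Mass on this taxopath plus everything nested below it.
        accumulated = sum(
            w for p, w in lwr.items() if p == path or p.startswith(prefix)
        )
        profile[path] = {
            "LWR": lwr[path],
            "fract": lwr[path] / total,
            "aLWR": accumulated,
            "afract": accumulated / total,
        }
    return profile


profile = build_profile(placements, branch_taxopath)
for path in sorted(profile):
    row = profile[path]
    print(path, round(row["LWR"], 4), round(row["fract"], 4),
          round(row["aLWR"], 4), round(row["afract"], 4))
```

In this toy case the two queries contribute a total mass of 2.0, and the uncertain query_2 splits its mass between Obazoa and Alveolata, which is why the values in such a table are generally not whole numbers of sequences.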

Does that make sense? Your per query results would then give you the additional information of the taxonomic assignment per query, if that's what you need.

Cheers and so long
Lucas

lczech commented 6 months ago

Hey @snacktavish,

is this still an issue, or shall we close it for now?

Cheers
Lucas

snacktavish commented 6 months ago

That's great! Thanks.