indices and leaflet_var params

thomasstjerne commented 2 years ago

I have a couple of questions:

Is it correct that omitting the --indices param will make the pipeline default to all indices?
If so, and the pipeline is started with only a few indices, e.g., --indices calc_richness,calc_phylo_rpd1, is it correct that the user then also needs to adjust the --leaflet_var param, since otherwise some input data will be missing for that step?

If I have understood this correctly, is it possible to get a mapping of --leaflet_var values to the indices they depend on?

Best, Thomas

vmikk commented 2 years ago

Hello Thomas!

That is correct: --leaflet_var depends on --indices. The indices must be chosen at the step where we estimate diversity with Biodiverse. Biodiverse supports ~380 indices; by default, we estimate the most essential ones (e.g., species richness, PD, phylogenetic endemism). At the visualization step, we can show only the indices present in the resulting table.

In addition, some indices have standardized effect sizes (SES, a.k.a. Z-scores). Biodiverse reports them under the same name but in a separate table, so to visualize them, the SES_ prefix must be added to the index name (e.g., SES_PD). Currently, there is no compatibility check between the index names specified with these two parameters; indices missing from the results are simply not displayed.
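A compatibility check like the one described as missing could look roughly like this. It is a minimal sketch, not part of PhyloNext: MODULE_INDICES and missing_leaflet_vars are hypothetical names, only the calc_pd entry comes from this thread, and RICHNESS_ALL is used purely as a stand-in for a name that calc_pd does not produce. It also assumes every index has an SES_ counterpart, which is a simplification.

```python
# Hypothetical, partial mapping from Biodiverse modules (--indices values)
# to the index names they produce. Only calc_pd is taken from this thread.
MODULE_INDICES = {
    "calc_pd": ["PD", "PD_P", "PD_P_per_taxon", "PD_per_taxon"],
}

SES_PREFIX = "SES_"


def missing_leaflet_vars(indices, leaflet_vars):
    """Return the leaflet_vars entries that the chosen indices will not produce."""
    available = set()
    for module in indices:
        # Modules absent from the mapping contribute no names in this sketch.
        available.update(MODULE_INDICES.get(module, []))

    missing = []
    for var in leaflet_vars:
        # SES_ variables come from the effect-size table and share the index
        # name; assuming every index has an SES counterpart is a simplification.
        base = var[len(SES_PREFIX):] if var.startswith(SES_PREFIX) else var
        if base not in available:
            missing.append(var)
    return missing


# --indices is comma-separated in the thread; both params are plain lists here.
indices = "calc_pd".split(",")
print(missing_leaflet_vars(indices, ["PD", "SES_PD", "RICHNESS_ALL"]))
# prints ['RICHNESS_ALL']
```

Reporting the unmatched names, rather than silently skipping them, would let the pipeline warn the user before the visualization step.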

However, we can create a mapping file. Is there a preferred format for it?

With kind regards, Vladimir

P.S. In Biodiverse, indices are organized into modules (subroutines), so specifying --indices "calc_pd" will provide estimates for 4 indices (PD, PD_P, PD_P_per_taxon, PD_per_taxon) plus their effect sizes (SES_*), all of which can be visualized.
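The module-to-index relationship can be expanded into the kind of --leaflet_var mapping asked about in the question. A minimal sketch, assuming a hand-maintained dictionary (MODULE_INDICES, visualizable_names and leaflet_var_mapping are hypothetical names, and only the calc_pd entry comes from this thread): it lists the names a module makes available for visualization and writes a leaflet_var-to-module lookup as JSON, which is just one possible format for a mapping file.

```python
import json

# Hypothetical index names per Biodiverse module. Only calc_pd is taken
# from this thread; other modules would be added in the same shape.
MODULE_INDICES = {
    "calc_pd": ["PD", "PD_P", "PD_P_per_taxon", "PD_per_taxon"],
}


def visualizable_names(module):
    """List the --leaflet_var values a module makes available:
    the raw indices followed by their SES_ effect-size counterparts."""
    names = MODULE_INDICES[module]
    return names + ["SES_" + name for name in names]


def leaflet_var_mapping(modules):
    """Map each --leaflet_var value to the --indices module it depends on."""
    return {
        name: module
        for module in modules
        for name in visualizable_names(module)
    }


print(visualizable_names("calc_pd"))
print(json.dumps(leaflet_var_mapping(["calc_pd"]), indent=2))
```

Keying the file by --leaflet_var value makes the lookup direction match the question: given a variable to plot, it tells which module must be passed to --indices.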

vmikk commented 1 year ago

A schema file describing the parameters (and, for some of them, the possible options): https://github.com/vmikk/PhyloNext/blob/main/nextflow_schema.json (still a work in progress)

indices and leaflet_var params #4