hoelzer-lab / ribap

A comprehensive bacterial core gene-set annotation pipeline based on Roary and pairwise ILPs
GNU General Public License v3.0

Improve RIBAP output for core/shell/cloud with varying cutoffs #60

Open hoelzer opened 4 months ago

hoelzer commented 4 months ago

RIBAP, in its current implementation, is very strict about categorizing genes into the core genome: only genes present in all input genomes count as core. For input data of even higher diversity than in the present study, this conservative threshold could be lowered to, e.g., 95%, a threshold that is generally accepted in other studies as well (called the soft core) (37,38). Note, however, that RIBAP calculates and reports all possible RIBAP groups by refining the initial Roary clusters. Thus, the final output contains every RIBAP group, whether it comprises 100% of the input genomes or fewer.

The user can filter this table to, e.g., select all RIBAP groups that comprise at least 99% or 90% of the input samples if that better fits the actual application. What we would like to add in the future is automatic filtering, summary, and reporting at varying cutoffs, such as 100/99/95/90/80/70/60%, to directly provide the user with different sets of RIBAP groups that can then be used for downstream analysis. Currently, the user gets all this information but has to perform the filtering manually.
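Until this is built in, the manual filtering could look roughly like the sketch below. It does not read RIBAP's actual output files: the input is assumed to already be parsed into a dictionary that maps each RIBAP group ID to the set of genomes it contains. The function names (`groups_at_cutoff`, `summarize_cutoffs`) and the toy data are hypothetical and only illustrate the idea.

```python
def groups_at_cutoff(group_members, n_genomes, cutoff_percent):
    """Return sorted IDs of groups present in at least cutoff_percent of genomes.

    group_members: dict mapping a group ID to the set of genomes it occurs in
                   (assumed layout, not RIBAP's native table format).
    """
    # Integer comparison avoids float rounding:
    # count / n >= p / 100  is equivalent to  count * 100 >= p * n
    return sorted(
        gid
        for gid, genomes in group_members.items()
        if len(genomes) * 100 >= cutoff_percent * n_genomes
    )


def summarize_cutoffs(group_members, n_genomes,
                      cutoffs=(100, 99, 95, 90, 80, 70, 60)):
    """Map each cutoff (in percent) to the group IDs that satisfy it."""
    return {c: groups_at_cutoff(group_members, n_genomes, c) for c in cutoffs}


# Toy example with 10 genomes and 4 made-up groups.
genomes = [f"genome{i}" for i in range(1, 11)]
toy_groups = {
    "group1": set(genomes),      # present in 10/10 genomes
    "group2": set(genomes[:9]),  # present in 9/10 genomes
    "group3": set(genomes[:7]),  # present in 7/10 genomes
    "group4": set(genomes[:3]),  # present in 3/10 genomes
}

summary = summarize_cutoffs(toy_groups, len(genomes))
for cutoff, ids in summary.items():
    print(f"{cutoff}%: {len(ids)} group(s): {', '.join(ids)}")
```

In this toy data only `group1` passes the 100%, 99%, and 95% cutoffs, `group2` is added at 90%, and `group3` at 70%. The same per-cutoff sets could then be written to separate files for downstream analysis.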