labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

ValueError: The gene family has not beed associated to a partition. #262

Closed: frdel1 closed this issue 2 months ago

frdel1 commented 3 months ago

Hi, I am experiencing the following error with ppanggolin all: "ValueError: The gene family has not beed associated to a partition."

Steps to reproduce:

```shell
# get a bunch of genomes to create the pangenome
datasets download genome accession GCF_009935005.1 GCF_001015835.1 GCF_009933955.1 GCF_001932715.1 GCF_027863375.1 --include gbff

# create the organism.gbff.list file
# create the pangenome with ppanggolin all
conda activate ppanggolin-2.1.0
ppanggolin all --anno /path/to/organism.gbff.list --cpu 1 --identity 0.8 --output /path/to/output_ppanggolin_all
```

Best wishes
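The "create the organism.gbff.list file" step is not spelled out above. Here is a minimal sketch, assuming the `datasets` archive (`ncbi_dataset.zip` by default) has been unzipped so that each genome sits at `ncbi_dataset/data/<accession>/genomic.gbff`. That is the usual layout, but check yours. PPanGGOLiN's `--anno` list is a tab-separated file with an organism name in the first column and the path to its annotation file in the second. This sketch uses the accession as the name. The function name `make_gbff_list` is hypothetical and is not part of PPanGGOLiN or the NCBI tools.

```shell
# Hypothetical helper: write a PPanGGOLiN annotation list (name<TAB>path)
# from an unzipped NCBI `datasets` package.
# Assumption: files are laid out as <data_dir>/<accession>/genomic.gbff.
make_gbff_list() {
    local data_dir="$1" out="$2"
    local gbff acc
    : > "$out"  # start from an empty list file
    for gbff in "$data_dir"/*/genomic.gbff; do
        [ -e "$gbff" ] || continue  # skip if the glob matched nothing
        acc="$(basename "$(dirname "$gbff")")"  # accession = parent directory name
        # absolute paths avoid ambiguity about where relative paths are resolved
        printf '%s\t%s\n' "$acc" "$(realpath "$gbff")" >> "$out"
    done
}

# Usage, after `unzip ncbi_dataset.zip`:
#   make_gbff_list ncbi_dataset/data /path/to/organism.gbff.list
```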

jpjarnoux commented 3 months ago

Hi !

Sorry to hear about that. Could you run your command again with the option --verbose 2 and share the output?

Thanks

frdel1 commented 3 months ago

consol.out.txt

Sure, here it is. Are you able to reproduce the bug by downloading the genomes listed in the datasets download genome accession command and running ppanggolin all?

jpjarnoux commented 3 months ago

Hi!

Thanks for the output. As I suspected, your pangenome does not contain enough genomes. The partitioning method is based on the NEM algorithm, and with the default parameters we recommend using at least 15 genomes. You can find more information about the PPanGGOLiN method in the publication here.

Yet all is not lost. First, add the -K 2 option to the `all` command. This option forces PPanGGOLiN to compute only two partitions. If that does not work, you could follow the step-by-step pangenome construction in the documentation (skipping the workflow part), or, if you kept your pangenome, directly use the command explained here to customize the partitioning. @ggautreau will be of more help than me at this stage.
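One way to apply this advice automatically is to count the genomes in the annotation list and add `-K 2` only when there are fewer than the 15 recommended above. This is a hypothetical wrapper, not part of PPanGGOLiN. The 15-genome cutoff comes from the comment above, while falling back to exactly two partitions below it is an assumption that you may want to revisit for your data.

```shell
# Hypothetical helper: print extra partitioning options for `ppanggolin all`.
# Counts non-blank lines in the annotation list (one genome per line) and
# prints "-K 2" (force two partitions) when there are fewer than min_genomes.
partition_opts() {
    local list="$1" min_genomes="${2:-15}" n
    # grep -c prints 0 but exits non-zero when nothing matches; keep going anyway
    n="$(grep -c '[^[:space:]]' "$list" || true)"
    if [ "$n" -lt "$min_genomes" ]; then
        printf '%s\n' "-K 2"
    fi
}

# Usage (the $(...) is left unquoted on purpose so "-K 2" splits into two arguments):
#   ppanggolin all --anno /path/to/organism.gbff.list --cpu 1 --identity 0.8 \
#       --output /path/to/output_ppanggolin_all $(partition_opts /path/to/organism.gbff.list)
```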

frdel1 commented 3 months ago

Hi! Thanks for the explanation and the tips, I will follow your advice and use at least 15 genomes then.

jpjarnoux commented 3 months ago

Another tip, if I may: you can build a pangenome from all genomes of your species available in RefSeq or GenBank, for example, and then project that pangenome onto your five genomes of interest, as explained here.

frdel1 commented 3 months ago

Thanks! I have already tried ppanggolin projection, good stuff :)

JeanMainguy commented 2 months ago

Hi, in version 2.1.1 we changed the log to show a warning instead of a debug message when the partition step fails, which makes the problem easier to spot. Ideally, PPanGGOLiN should still work even if partitioning fails, as discussed in issue #270.