labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

ValueError: max() iterable argument is empty #264

Closed: frdel1 closed this issue 1 month ago

frdel1 commented 1 month ago

Hi, I am experiencing an error: ValueError: max() iterable argument is empty. Here is the complete log file.

consol.err.txt

Best,

jpjarnoux commented 1 month ago

Hi, the problem is that no spots were found, neither in your pangenome nor in the genomes you projected onto it. The log shows this:

2024-08-14 16:54:10 projection.py:l711 INFO 1 RGPs have been predicted in the input genomes.
2024-08-14 16:54:10 projection.py:l1299 INFO    Predicting spot of insertion in input genomes.
2024-08-14 16:54:11 spot.py:l122 INFO   1471 RGPs were not used as they are on a contig border (or have less than 3 persistent gene families until the contig border)
2024-08-14 16:54:11 spot.py:l124 INFO   0 RGPs are being used to predict spots of insertion
2024-08-14 16:54:11 spot.py:l126 INFO   0 number of different pairs of flanking gene families

This should not crash; the step should simply be skipped when no spots are found. We will fix this, but we need your data (genomes and pangenome) to reproduce the issue. I can send you a link to a secure repository where you can share everything.
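The traceback comes from Python's built-in max(), which raises ValueError when given an empty iterable. Below is a minimal sketch of the kind of guard described above, assuming spots are represented as groups of RGPs keyed by their pair of flanking gene families. The function name and data layout are hypothetical and are not PPanGGOLiN's actual code (the real fix is in #266).

```python
from collections import Counter


def largest_spot_size(flank_pairs):
    """Return the number of RGPs in the largest spot, or 0 if there are none.

    Hypothetical helper: flank_pairs is a list of (left_family, right_family)
    tuples, one per RGP that is far enough from a contig border to be used.
    RGPs sharing the same flanking pair are counted as one spot.
    """
    spots = Counter(flank_pairs)  # spot (flanking pair) -> number of RGPs
    if not spots:
        # No usable RGPs (e.g. all of them sit on contig borders):
        # skip the step instead of calling max() on an empty iterable.
        return 0
    return max(spots.values())


# Unguarded, an empty input reproduces the same kind of crash:
try:
    max(Counter().values())
except ValueError as err:
    print("ValueError:", err)
```

Since Python 3.4, `max(spots.values(), default=0)` achieves the same thing in one call. The explicit early return makes the "no spots, skip this step" case easier to log.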

Thanks

JeanMainguy commented 1 month ago

Hello, I think the dataset is listed at the top of the logs. You built the pangenome from the 75 genomes listed in the log file and projected the sequence NZ_CP048437 from assembly GCF_010509575, right?

JeanMainguy commented 1 month ago

Hi,

I was able to replicate the error and fixed it in #266. Thanks for pointing that out!

The issue popped up because the pangenome doesn't have any spots. It turns out that your pangenome doesn't have any persistent families, which is why no spots were detected. This happened because some of the GCA genomes used to build the pangenome (29 out of 75) don't have any CDS annotations. You can see this clearly in the tile plot, for example.
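Before building a pangenome from GenBank files, you can flag genomes with no CDS annotations using a rough pre-check like the one below. This sketch is not part of PPanGGOLiN and the function names are hypothetical. It scans the uncompressed GenBank flat-file text directly and relies on feature keys in the FEATURES table starting after exactly five spaces, while qualifier lines are indented further.

```python
import re
from pathlib import Path

# A CDS feature key line looks like "     CDS             complement(1..90)".
CDS_LINE = re.compile(r"^ {5}CDS\s+")


def count_cds(gbk_text: str) -> int:
    """Count CDS feature entries in the text of a GenBank flat file."""
    return sum(1 for line in gbk_text.splitlines() if CDS_LINE.match(line))


def genomes_without_cds(paths):
    """Return the (uncompressed) GenBank files that contain no CDS feature."""
    return [p for p in paths if count_cds(Path(p).read_text()) == 0]
```

For example, `genomes_without_cds(sorted(Path("genomes").glob("*.gbff")))` would list the files to drop or re-annotate, assuming your files live in a `genomes/` directory with a `.gbff` extension.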

Working with GenBank annotations in ppanggolin can be tricky because annotations can be quite inconsistent across genomes, which affects ppanggolin's predictions. One way to avoid this is to provide the FASTA sequences of the genomes and let ppanggolin annotate them; another is to use only RefSeq genomes when building the pangenome.

frdel1 commented 1 month ago

Hi, sorry I was out of touch for a few days. Yes, the genomes are listed at the top of the log file. I am glad you were able to replicate the issue. Thank you for the workaround and the fix :) Best wishes,

JeanMainguy commented 1 month ago

Hi, the bug fix is now included in version 2.1.1. Best