labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

rarefaction curve : Population must be a sequence #253

Closed (emorinrae closed 2 months ago)

emorinrae commented 2 months ago

Hi,

I'm trying your tool for the first time. Everything goes well when I run:

ppanggolin all --fasta genomes_list.txt --cpu 40

But when I try to draw the rarefaction curve, I get an error:

ppanggolin rarefaction -p pangenome.h5

2024-07-24 11:30:26 utils.py:l169 INFO ppanggolin rarefaction -p pangenome.h5
2024-07-24 11:30:26 utils.py:l170 INFO PPanGGOLiN version: 2.1.0
2024-07-24 11:30:26 readBinaries.py:l100 INFO Getting the current pangenome status
2024-07-24 11:30:26 readBinaries.py:l786 INFO Reading pangenome annotations...
100%|█████████████████████████████████████████████████| 42/42 [00:00<00:00, 105107.86genome/s]
100%|█████████████████████████████████████████████| 2043/2043 [00:00<00:00, 149417.83contig/s]
100%|███████████████████████████████████████████| 206200/206200 [00:01<00:00, 121987.15gene/s]
100%|███████████████████████████████████████████████| 2449/2449 [00:00<00:00, 226481.69gene/s]
2024-07-24 11:30:29 readBinaries.py:l801 INFO Reading pangenome gene families...
100%|████████████████████████████████████| 206200/206200 [00:00<00:00, 301257.74gene family/s]
100%|███████████████████████████████████████| 26302/26302 [00:00<00:00, 43525.65gene family/s]
2024-07-24 11:30:30 readBinaries.py:l810 INFO Reading the neighbors graph edges...
100%|███████████████████████████████| 203650/203650 [00:01<00:00, 199278.59contig adjacency/s]
2024-07-24 11:30:31 rarefaction.py:l382 INFO Reuse the number of partitions 3
2024-07-24 11:30:31 rarefaction.py:l390 INFO Extracting samples ...
Traceback (most recent call last):
  File "/cm/shared/apps/ppanggolin/2.1.0/bin/ppanggolin", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/cm/shared/apps/ppanggolin/2.1.0/lib/python3.12/site-packages/ppanggolin/main.py", line 188, in main
    ppanggolin.nem.rarefaction.launch(args)
  File "/cm/shared/apps/ppanggolin/2.1.0/lib/python3.12/site-packages/ppanggolin/nem/rarefaction.py", line 466, in launch
    make_rarefaction_curve(pangenome=pangenome, output=args.output, tmpdir=args.tmpdir, beta=args.beta,
  File "/cm/shared/apps/ppanggolin/2.1.0/lib/python3.12/site-packages/ppanggolin/nem/rarefaction.py", line 394, in make_rarefaction_curve
    all_samples.append(set(random.sample(set(pangenome.organisms), i + 1)))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cm/shared/apps/ppanggolin/2.1.0/lib/python3.12/random.py", line 413, in sample
    raise TypeError("Population must be a sequence. "
TypeError: Population must be a sequence. For dicts or sets, use sorted(d).

Any idea why I get this error? Thanks,

Emmanuelle

JeanMainguy commented 2 months ago

Hi,

Thanks for reporting this bug.

It looks like the issue is caused by Python 3.12, which PPanGGOLiN doesn't fully support or test against yet. Since Python 3.11, the sample function in the random module no longer converts sets to lists automatically (that conversion had been deprecated since Python 3.9): https://docs.python.org/3/library/random.html#random.sample
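
For anyone who wants to check this outside PPanGGOLiN, here is a minimal sketch of the behaviour. The genome names are placeholders and nothing here uses PPanGGOLiN itself; it only shows how random.sample reacts to a set on the running interpreter, and the conversion that the error message suggests.

```python
import random
import sys

# Placeholder genome names, only for illustration.
organisms = {"genome_A", "genome_B", "genome_C", "genome_D"}

# Passing the set directly: accepted up to Python 3.10 (with a
# DeprecationWarning on 3.9 and 3.10), rejected with TypeError from 3.11 on.
try:
    random.sample(organisms, 2)
    passed_set_directly = True
except TypeError:
    passed_set_directly = False

# Converting to a sequence first works on every Python 3 version.
# sorted() is what the error message itself recommends.
picked = set(random.sample(sorted(organisms), 2))

print(sys.version_info[:2], "accepts a set directly:", passed_set_directly)
print("sampled:", picked)
```

On Python 3.12 the first call should fail with the same TypeError as in the traceback above, while the sorted() version returns two genomes.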

To fix this issue, try using PPanGGOLiN with Python 3.10. You might need to reinstall the tool in a fresh environment with Python 3.10.

Here’s how you can do it with conda:

conda create -n ppanggolin_py3.10 python=3.10 ppanggolin=2.1.0
conda activate ppanggolin_py3.10

Currently, PPanGGOLiN is tested with Python versions 3.8 to 3.10. It would definitely be great to support the latest Python versions in the future.

Best

emorinrae commented 2 months ago

Hi,

It works, thanks.

Emmanuelle

JeanMainguy commented 2 months ago

Ok nice!

I've made a fix so that PPanGGOLiN supports Python versions up to 3.12, so it should work fine with Python 3.12 in the next release!
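
For reference, the kind of change involved can be sketched as follows. This is only an illustration of the pattern seen at the failing line in the traceback, not necessarily the exact code that will be released: extract_samples, draws_per_size and seed are hypothetical names, and plain strings stand in for the pangenome's organism objects.

```python
import random


def extract_samples(organisms, draws_per_size=2, seed=None):
    """Draw random subsets of every size from 1 to len(organisms).

    Hypothetical helper: it mirrors the sampling pattern from the traceback,
    but converts the population to a list before calling random.sample.
    """
    rng = random.Random(seed)
    # A list is a sequence, so random.sample accepts it on Python 3.11+.
    # list() is used rather than sorted() so the items need not be orderable.
    population = list(organisms)
    all_samples = []
    for i in range(len(population)):
        for _ in range(draws_per_size):
            all_samples.append(set(rng.sample(population, i + 1)))
    return all_samples


samples = extract_samples({"genome_A", "genome_B", "genome_C"})
print([len(s) for s in samples])
```

With three genomes and two draws per size, this gives six samples of sizes 1, 1, 2, 2, 3 and 3.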

Regards