labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Error writing core families sequences #179

Closed: clelauden closed this issue 9 months ago

clelauden commented 9 months ago

Hello! I tried to write the FASTA files of the representative sequences of the gene families for each partition with ppanggolin fasta.

It works fine for the shell, cloud and persistent partitions, but not for core, softcore and rgp.
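The failing partitions appear to differ in what they need: the traceback below shows that core selection compares fam.number_of_organisms with the pangenome's genome count, so it needs to know which genomes each family occurs in. The sketch here is a simplified, hypothetical model of that distinction, not PPanGGOLiN's code. It assumes that persistent, shell and cloud are read from a label stored on each family, that softcore means "present in at least soft_core x (number of genomes)" with 0.95 as an assumed default, and it does not model rgp at all. The Family class and select_families function are illustrative stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    """Hypothetical stand-in for a gene family: a stored partition label
    plus the set of genome names the family occurs in."""
    name: str
    partition: str                      # "persistent", "shell" or "cloud"
    organisms: set = field(default_factory=set)


def select_families(families, n_organisms, partition, soft_core=0.95):
    """Return the names of the families in the requested partition (sketch)."""
    if partition in ("persistent", "shell", "cloud"):
        # Only the stored label is read; no per-genome information is needed.
        return [f.name for f in families if f.partition == partition]
    if partition == "core":
        # Present in every genome: requires each family's genome membership.
        return [f.name for f in families if len(f.organisms) == n_organisms]
    if partition == "softcore":
        # Present in at least soft_core * n_organisms genomes (assumed rule).
        return [f.name for f in families
                if len(f.organisms) >= soft_core * n_organisms]
    raise ValueError(f"unsupported partition in this sketch: {partition}")


# Toy data: three genomes, three families.
fams = [
    Family("A", "persistent", {"g1", "g2", "g3"}),
    Family("B", "shell", {"g1", "g2"}),
    Family("C", "cloud", {"g3"}),
]
core = select_families(fams, 3, "core")                      # ["A"]
softcore = select_families(fams, 3, "softcore", soft_core=0.6)  # ["A", "B"]
shell = select_families(fams, 3, "shell")                    # ["B"]
```

In this model, if genome membership is unavailable, only the label-based partitions still work, which matches the pattern described above.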

I used:

ppanggolin fasta --prot_families core -p pangenome.h5 -o gene_fam_core

I get this error:

ppanggolin fasta --prot_families core -p pangenome.h5 -o gene_fam_core --verbose 2
2024-02-15 14:46:28 utils.py:l167 INFO  Command: /home/clauden/anaconda3/envs/ppanggolin/bin/ppanggolin fasta --prot_families core -p pangenome.h5 -o gene_fam_core --verbose 2
2024-02-15 14:46:28 utils.py:l168 INFO  PPanGGOLiN version: 2.0.2
2024-02-15 14:46:28 utils.py:l528 DEBUG The parameter "--output: gene_fam_core" has been specified in the command line with a non-default value. Its value overwrites the default value (None).
2024-02-15 14:46:28 utils.py:l528 DEBUG The parameter "--pangenome: pangenome.h5" has been specified in the command line with a non-default value. Its value overwrites the default value (None).
2024-02-15 14:46:28 utils.py:l528 DEBUG The parameter "--prot_families: core" has been specified in the command line with a non-default value. Its value overwrites the default value (None).
2024-02-15 14:46:28 utils.py:l528 DEBUG The parameter "--verbose: 2" has been specified in the command line with a non-default value. Its value overwrites the default value (1).
2024-02-15 14:46:28 utils.py:l667 DEBUG 3 fasta parameters have non-default value: output=gene_fam_core, prot_families=core, verbose=2
2024-02-15 14:46:28 utils.py:l721 INFO  3 parameters have a non-default value.
2024-02-15 14:46:28 utils.py:l263 DEBUG Create output directory /home/clauden/ppanggolin/ppanggolin_output_thermo_comp/gene_fam_core
2024-02-15 14:46:28 readBinaries.py:l94 INFO    Getting the current pangenome status
2024-02-15 14:46:28 readBinaries.py:l730 INFO   Reading pangenome gene families...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 247004/247004 [00:01<00:00, 167208.10gene family/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20540/20540 [00:00<00:00, 127854.06gene family/s]
2024-02-15 14:46:29 writeSequences.py:l114 INFO Writing the representative representative amino acid sequences of the gene families of the core gene families...
Traceback (most recent call last):
  File "/home/clauden/anaconda3/envs/ppanggolin/bin/ppanggolin", line 10, in <module>
    sys.exit(main())
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/main.py", line 195, in main
    ppanggolin.formats.writeSequences.launch(args)
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/formats/writeSequences.py", line 438, in launch
    write_sequence_files(pangenome, args.output, fasta=args.fasta, anno=args.anno, soft_core=args.soft_core,
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/formats/writeSequences.py", line 417, in write_sequence_files
    write_fasta_prot_fam(pangenome, output, prot_families, soft_core, compress, disable_bar)
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/formats/writeSequences.py", line 173, in write_fasta_prot_fam
    genefams = select_families(pangenome, prot_families, "representative amino acid sequences of the gene families",
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/formats/writeSequences.py", line 117, in select_families
    if fam.number_of_organisms == pangenome.number_of_organisms:
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/geneFamily.py", line 303, in number_of_organisms
    _ = self.get_org_dict()
  File "/home/clauden/anaconda3/envs/ppanggolin/lib/python3.10/site-packages/ppanggolin/geneFamily.py", line 394, in get_org_dict
    raise AttributeError(f"Gene: {gene.name} is not fill with organism")
AttributeError: Gene: None is not fill with organism. Did you mean: 'number_of_genes'?
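Reading the traceback bottom-up: number_of_organisms calls get_org_dict, which raises because a gene has no organism attached. The "Gene: None" part suggests, though the report does not confirm it, that the gene objects were not fully loaded from pangenome.h5 for this command; the trailing "Did you mean" hint appears to be Python 3.10's automatic attribute suggestion rather than part of the cause. The following is a hypothetical, simplified reconstruction of that failure pattern; the class and method names are borrowed from the traceback, but the bodies are assumptions, not PPanGGOLiN's implementation.

```python
class Gene:
    """Hypothetical gene record; `organism` stays None until it is linked."""

    def __init__(self, name, organism=None):
        self.name = name
        self.organism = organism


class GeneFamily:
    """Hypothetical family that derives its genome set lazily from its genes."""

    def __init__(self, genes):
        self.genes = genes
        self._org_dict = None

    def get_org_dict(self):
        if self._org_dict is None:
            org_dict = {}
            for gene in self.genes:
                if gene.organism is None:
                    # Same message as in the report: a gene without its genome.
                    raise AttributeError(f"Gene: {gene.name} is not fill with organism")
                org_dict.setdefault(gene.organism, set()).add(gene.name)
            self._org_dict = org_dict
        return self._org_dict

    @property
    def number_of_organisms(self):
        # Core selection in the traceback compares this value with the
        # pangenome's genome count, so it triggers get_org_dict().
        return len(self.get_org_dict())


fam_ok = GeneFamily([Gene("g1", "genomeA"), Gene("g2", "genomeB")])
fam_bad = GeneFamily([Gene(None)])   # gene loaded without name or organism

message = None
try:
    fam_bad.number_of_organisms
except AttributeError as err:
    message = str(err)
```

Here fam_ok.number_of_organisms evaluates to 2, while fam_bad reproduces the error message from the report.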

Thanks in advance! ;)

JeanMainguy commented 9 months ago

Hello, thank you for bringing this bug to our attention. I have addressed the issue in Pull Request #180, and the fix will be included in the next release very soon. Best regards