labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

Clustering issue #276

Closed camilae86 closed 1 week ago

camilae86 commented 2 months ago

Hi! I have this error during clustering (ppanggolin cluster -p pangenome.h5):

    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'createdb', '/tmp/tmp84p8ohw5/nucleotid_sequences', '/tmp/tmp84p8ohw5/nucleotid_sequences_db']' returned non-zero exit status 1.

axbazin commented 2 months ago

Hi!

With which version of ppanggolin did you get this error?

Last time I saw this error, it was caused by a full disk in the "/tmp" folder. This can happen if your pangenome is large (several thousand genomes) and/or if the disk holding "/tmp" is very small (a few GB).
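A quick way to check how much room the temporary directory has before clustering is a short standard-library script like the one below. The `/tmp/tmp84p8ohw5` path in the error looks like a directory created by Python's `tempfile` module, which honors the `TMPDIR` environment variable; that is an inference from the path, not something confirmed in this thread. The 5 GiB threshold is an arbitrary example, not a PPanGGOLiN requirement.

```python
import shutil
import tempfile

GIB = 1024 ** 3


def free_space_gib(path: str) -> float:
    """Return the free space (in GiB) on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / GIB


def check_tmp(min_free_gib: float = 5.0) -> bool:
    """Report whether the default temporary directory has at least `min_free_gib` free.

    The 5 GiB default is an illustrative threshold only.
    """
    tmp = tempfile.gettempdir()  # resolved from TMPDIR, TEMP, TMP, then /tmp
    free = free_space_gib(tmp)
    if free < min_free_gib:
        print(f"Only {free:.1f} GiB free in {tmp}; a large pangenome may fill it up.")
        return False
    print(f"{free:.1f} GiB free in {tmp}.")
    return True


if __name__ == "__main__":
    check_tmp()
```

If the disk is too small, pointing `TMPDIR` at a larger disk before running `ppanggolin cluster -p pangenome.h5` may help, assuming PPanGGOLiN relies on the `tempfile` defaults; `ppanggolin cluster --help` shows whether your version also has a dedicated option for the temporary directory.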

If neither applies, could you share the complete log and, if possible, the input you used?

I hope this helps!

Adelme

camilae86 commented 2 months ago

Thanks @axbazin,

You were right: the server is too small for this project. So we tried a bigger Amazon server, but we ran into another problem. We were able to install the program in a conda environment with Python 3.8 (called milagro); however, when we run ppanggolin annotate --anno ORGANISM_ANNOTATION_LIST --fasta ORGANISM_FASTA_LIST, we get the following error:

2024-09-02 23:57:15 utils.py:l168 INFO  Command: /home/ubuntu/miniconda3/envs/milagro/bin/ppanggolin annotate --anno anotacion_Burkholderia_2024_amazon_4.txt --fasta genomas_Burkholderia_2024_amazon_4.txt
2024-09-02 23:57:15 utils.py:l169 INFO  PPanGGOLiN version: 2.1.1
2024-09-02 23:57:15 annotate.py:l1047 INFO  Reading anotacion_Burkholderia_2024_amazon_4.txt the list of genome files ...
  0%|          | 0/4 [00:00<?, ?file/s]
2024-09-02 23:57:15 genome.py:l461 WARNING  Contig length is unknown
 25%|██▌       | 1/4 [00:00<00:00,  9.45file/s]
2024-09-02 23:57:16 genome.py:l461 WARNING  Contig length is unknown
2024-09-02 23:57:16 genome.py:l461 WARNING  Contig length is unknown
2024-09-02 23:57:16 genome.py:l461 WARNING  Contig length is unknown
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/annotate/annotate.py", line 969, in read_anno_file
    org, has_fasta = read_org_gff(organism_name, filename, circular_contigs, pseudo, translation_table)
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/annotate/annotate.py", line 793, in read_org_gff
    correct_putative_overlaps(org.contigs)
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/annotate/annotate.py", line 911, in correct_putative_overlaps
    if gene.stop > len(contig):
TypeError: 'NoneType' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/annotate/annotate.py", line 971, in read_anno_file
    raise Exception(f"Reading the gff3 file '{filename}' raised an error. {err}")
Exception: Reading the gff3 file '/home/ubuntu/mariac_pangenomas/R_18628.gff' raised an error. 'NoneType' object cannot be interpreted as an integer
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/milagro/bin/ppanggolin", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/main.py", line 177, in main
    ppanggolin.annotate.launch(args)
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/annotate/annotate.py", line 1235, in launch
    read_annotations(pangenome, args.anno, cpu=args.cpu, pseudo=args.use_pseudo,
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/site-packages/ppanggolin/annotate/annotate.py", line 1076, in read_annotations
    org, flag = future.result()
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/ubuntu/miniconda3/envs/milagro/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
Exception: Reading the gff3 file '/home/ubuntu/mariac_pangenomas/R_18628.gff' raised an error.

Thanks for your help...

axbazin commented 2 months ago

Hi,

So, this is likely related to unexpected formatting in one of your GFF3 files (probably R_18628.gff).

If you wish, you can attach it to the issue and I can take a look. Overall, we try to follow the specification described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
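As a first pass before sharing the file, a short script can flag the most common structural departures from that specification. This is a minimal sketch, not PPanGGOLiN's own parser: it only checks the `##gff-version 3` header, the nine tab-separated columns of feature lines, and integer start/end coordinates with start <= end. It uses `typing.List` so it runs on Python 3.8, the version in the `milagro` environment.

```python
from typing import Iterable, List


def check_gff3_structure(lines: Iterable[str], max_problems: int = 20) -> List[str]:
    """Report basic deviations from the GFF3 specification.

    Only a few structural rules are checked; a clean report does not
    guarantee that any given tool will accept the file.
    """
    problems: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if lineno == 1 and not line.startswith("##gff-version 3"):
            problems.append("line 1: missing '##gff-version 3' header")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                f"line {lineno}: expected 9 tab-separated columns, found {len(cols)}"
            )
        else:
            start, end = cols[3], cols[4]
            if not (start.isdigit() and end.isdigit()):
                problems.append(
                    f"line {lineno}: start/end are not integers ({start!r}, {end!r})"
                )
            elif int(start) > int(end):
                problems.append(f"line {lineno}: start {start} > end {end}")
        if len(problems) >= max_problems:
            break
    return problems


if __name__ == "__main__":
    with open("R_18628.gff") as fh:
        for problem in check_gff3_structure(fh):
            print(problem)
```

A common culprit this catches is a file whose columns are separated by spaces instead of tabs, which shows up as a feature line with a single column.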

I see warnings about the contig length being unknown, so maybe it's related to the contig features (or lack thereof) in your GFF3 file? Without an example, though, I can only guess.
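The traceback is consistent with that guess: in Python, `len()` raises exactly `TypeError: 'NoneType' object cannot be interpreted as an integer` when an object's `__len__` returns `None`, which would happen if a contig's length was never set. A sketch like the one below lists the contigs referenced by features for which the file gives no length. It assumes a length can come from a `##sequence-region seqid start end` pragma, a `region` feature, or an embedded `##FASTA` section (annotators such as Prokka and NCBI commonly write one or more of these); which of these sources PPanGGOLiN actually reads is an assumption here, not something confirmed in this thread.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def _span(start: str, end: str) -> Optional[int]:
    """Length of a 1-based, inclusive [start, end] interval, or None if malformed."""
    if start.isdigit() and end.isdigit():
        return int(end) - int(start) + 1
    return None


def contig_length_report(lines: Iterable[str]) -> Tuple[Set[str], Dict[str, int]]:
    """Return (contigs used by features without a known length, lengths found)."""
    lengths: Dict[str, int] = {}
    used: Set[str] = set()
    fasta_ids: Set[str] = set()
    in_fasta = False
    for raw in lines:
        line = raw.strip()
        if in_fasta:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    fasta_ids.add(header[0])  # sequence present, so length is derivable
            continue
        if line.startswith("##FASTA"):
            in_fasta = True
        elif line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) == 4:
                span = _span(parts[2], parts[3])
                if span is not None:
                    lengths[parts[1]] = span
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 9:
                used.add(cols[0])
                if cols[2] == "region":
                    span = _span(cols[3], cols[4])
                    if span is not None:
                        lengths[cols[0]] = span
    missing = used - set(lengths) - fasta_ids
    return missing, lengths


if __name__ == "__main__":
    with open("R_18628.gff") as fh:
        missing, _ = contig_length_report(fh)
    print("Contigs without a length:", sorted(missing))
```

An empty result does not rule out other formatting problems, but a non-empty one would match the "Contig length is unknown" warnings in the log.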

Adelme

axbazin commented 1 week ago

Hi,

I hope you managed to find a solution to your problem. Closing for now. If the issue persists, feel free to reopen it.

Adelme