Closed LemoAlex closed 1 year ago
Hello, Indeed it seems quite surprising. Which version are you using ? I will try to replicate the error with the version you used.
Adelme
Hello,
thanks for the prompt answer.
I am using ppanggolin 1.2.74, installed through conda in a virtual environment. I am running it on a MacBook with the M1 chip, in case it is relevant (that has been an issue with some other programs).
Thanks,
Alexandre
Hello,
It turned out I had a duplicate chromosome name in the fasta information input (and in one .fasta file). After renaming, the issue disappeared.
Sorry for the trouble !
Best,
Alexandre
Hi,
What do you mean by "duplicate chromosome name in .fasta" ? A genome was there twice with the same fasta file indicated ? Or there was a contig with the same identifier in 2 different fasta files ?
I feel like it's something that ppanggolin should be able to detect, since there is some amount of "input verification" at the beginning. Depending on which case you meant, I'll see if I can add some warnings in the code if that happens.
Adelme
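For anyone who wants to rule out the two cases above before launching a run, here is a minimal sketch of such a check on the genome list file. It is not PPanGGOLiN's own input verification: the function name `check_genome_list` is hypothetical, and it assumes each line is whitespace-separated as genome name, FASTA path, then optional contig names.

```python
from collections import Counter


def check_genome_list(lines):
    """Return warnings about duplicated entries in a genome list.

    Assumed layout of each line (whitespace-separated):
    genome name, FASTA path, then zero or more contig names.
    """
    warnings = []
    names, paths = Counter(), Counter()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        name, path, contigs = fields[0], fields[1], fields[2:]
        names[name] += 1
        paths[path] += 1
        # A contig name repeated on the same line
        for contig, count in Counter(contigs).items():
            if count > 1:
                warnings.append(
                    f"line {lineno}: contig '{contig}' listed {count} times for {name}"
                )
    # A genome name or FASTA path that appears on more than one line
    warnings += [f"genome name '{n}' used {c} times" for n, c in names.items() if c > 1]
    warnings += [f"FASTA path '{p}' used {c} times" for p, c in paths.items() if c > 1]
    return warnings
```

It can be run on an open file, for example `check_genome_list(open("fasta.input"))`, and returns an empty list when nothing is duplicated. It does not look inside the FASTA files themselves, so a contig identifier shared between two different files would not be caught by this sketch.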
Hi,
Sorry my explanation was indeed not very precise.
in my input --fasta file:
Genome1 path/to/file.fasta ChromA ChromB
Genome2 path/to/file.fasta ChromC ChromD
Genome3 path/to/file.fasta ChromE ChromE
So, here the Genome3 chromosomes are duplicated. And in the "real" fasta file of "Genome3", the input was actually:
ChromE
ACGT....
ChromE
ACGT....
So the chromosomes were duplicated in my original input.
I hope this is a bit clearer now !
Best,
Alexandre
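A quick way to spot this kind of duplication inside a single FASTA file is to count the header identifiers. The sketch below is only an illustration and not part of PPanGGOLiN; the function name `duplicate_fasta_ids` is hypothetical, and it assumes standard FASTA headers that start with `>` and that the identifier is the first whitespace-delimited word of the header.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return the sequence identifiers that occur more than once.

    Assumes FASTA headers start with '>' and that the identifier is
    the first whitespace-delimited word after it.
    """
    ids = Counter(
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(name for name, count in ids.items() if count > 1)


# Toy input mirroring the situation described above
example = [">ChromE", "ACGT", ">ChromE", "ACGT"]
print(duplicate_fasta_ids(example))  # prints ['ChromE']
```

On a real file, something like `duplicate_fasta_ids(open("path/to/file.fasta"))` should work, since the function only iterates over lines. An empty list means no identifier is repeated within that file.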
Hi,
Alright, thank you for the detailed explanation ! I see what might have happened. I'll look into adding some warnings for those cases when we prepare a new release.
Adelme
Closing as this is likely an edge case and should not happen in general.
If other people meet this problem, please don't hesitate to comment and we might consider working on adding some warnings :)
Hello ppanggolin users,
I am trying to run a panrgp analysis on 40 bacterial genomes. However, after the command :
ppanggolin panrgp --fasta fasta.input -o output_panrgp --cpu 12
everything goes well at first : [...]
cluster.py:l212 INFO Clustering all of the genes sequences...
2022-11-11 16:46:19 cluster.py:l48 INFO Creating sequence database...
2022-11-11 16:46:20 cluster.py:l58 INFO Clustering sequences...
2022-11-11 16:46:29 cluster.py:l60 INFO Extracting cluster representatives...
2022-11-11 16:46:30 cluster.py:l72 INFO Writing gene to family informations
2022-11-11 16:46:30 cluster.py:l220 INFO Associating fragments to their original gene family...
2022-11-11 16:46:30 cluster.py:l33 INFO Aligning cluster representatives...
2022-11-11 16:46:42 cluster.py:l38 INFO Extracting alignments...
2022-11-11 16:46:43 cluster.py:l104 INFO Starting with 12061 families
2022-11-11 16:46:43 cluster.py:l135 INFO Ending with 9651 gene families
2022-11-11 16:46:43 cluster.py:l163 INFO Adding protein sequences to the gene families
2022-11-11 16:46:43 cluster.py:l140 INFO Adding 132647 genes to the gene families
Traceback (most recent call last):
and then, I get an error :
Traceback (most recent call last):
File "/Users/opt/anaconda3/envs/Ppangolin-env/bin/ppanggolin", line 8, in <module>
sys.exit(main())
File "/Users/opt/anaconda3/envs/Ppangolin-env/lib/python3.8/site-packages/ppanggolin/main.py", line 247, in main
ppanggolin.workflow.panRGP.launch(args)
File "/Users/opt/anaconda3/envs/Ppangolin-env/lib/python3.8/site-packages/ppanggolin/workflow/panRGP.py", line 61, in launch
clustering(pangenome, args.tmpdir, args.cpu, defrag=not args.no_defrag, disable_bar=args.disable_prog_bar)
File "/Users/opt/anaconda3/envs/Ppangolin-env/lib/python3.8/site-packages/ppanggolin/cluster/cluster.py", line 226, in clustering
read_gene2fam(pangenome, genes2fam, disable_bar=disable_bar)
File "/Users/opt/anaconda3/envs/Ppangolin-env/lib/python3.8/site-packages/ppanggolin/cluster/cluster.py", line 145, in read_gene2fam
raise Exception("Something unexpected happened during clustering "
Exception: Something unexpected happened during clustering (have less genes clustered than genes in the pangenome). A probable reason is that two genes in two different organisms have the same IDs; If you are sure that all of your genes have non identical IDs, please post an issue at https://github.com/labgem/PPanGGOLiN/
I am not providing any annotation file , just the fasta sequences, so this error is a bit surprising to me. What could be the reason for this ?
Thanks a lot for your help & time,
Best, Alexandre