labgem / PPanGGOLiN

Build a partitioned pangenome graph from microbial genomes
https://ppanggolin.readthedocs.io

product_string #222

Closed oclaisse closed 5 months ago

oclaisse commented 5 months ago

I want to use ppanggolin on Bakta-annotated gbff files and I have this issue. Could you please help me solve it? Best regards, Olivier

  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/bin/ppanggolin", line 10, in <module>
    sys.exit(main())
  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/ppanggolin/main.py", line 219, in main
    ppanggolin.workflow.all.launch(args)
  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/ppanggolin/workflow/all.py", line 288, in launch
    launch_workflow(args, panrgp=True, panmodule=True)
  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/ppanggolin/workflow/all.py", line 61, in launch_workflow
    write_pangenome(pangenome, filename, args.force, disable_bar=args.disable_prog_bar)
  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/ppanggolin/formats/writeBinaries.py", line 711, in write_pangenome
    write_annotations(pangenome, h5f, disable_bar=disable_bar)
  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/ppanggolin/formats/writeAnnotations.py", line 342, in write_annotations
    write_genedata(pangenome, h5f, annotation, genedata2gene, disable_bar)
  File "/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/ppanggolin/formats/writeAnnotations.py", line 310, in write_genedata
    genedata_row["product"] = genedata.product
  File "tables/tableextension.pyx", line 1681, in tables.tableextension.Row.__setitem__
TypeError: invalid type (<class 'str'>) for column product
/usr/local/genome/Anaconda3/envs/ppanggolin-2.0.4/lib/python3.10/site-packages/tables/file.py:113: UnclosedFileWarning

axbazin commented 5 months ago

Hi Olivier,

Indeed, this is a known issue that we have had for some time now; issues #95 and #175 refer to it as well. This is due to COG. Sadly, my comment is still valid: I'm afraid that, until we find a proper workaround or the COG people fix this, your only simple way out of this one is to edit the gff or gbff files that you want to use and remove the problematic characters.

You can find the problematic characters using this command on your gff or gbff files:

    LC_ALL=C grep -n -P [$'\x80'-$'\xFF'] *.g*ff
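For anyone who would rather do the same search from Python, here is a minimal sketch of the idea behind that grep command: read each file as raw bytes and report every line that contains a byte in the 0x80 to 0xFF range. The function name `find_non_ascii_lines` is made up for this example and is not part of PPanGGOLiN.

```python
def find_non_ascii_lines(path):
    """Return (line_number, raw_line) pairs for lines holding any byte >= 0x80.

    Hypothetical helper for illustration only; it mirrors what the grep
    command reports, but it is not part of PPanGGOLiN.
    """
    hits = []
    # Read in binary mode so that no decoding happens and every byte is seen as-is.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            if any(byte >= 0x80 for byte in raw):
                hits.append((number, raw.rstrip(b"\r\n")))
    return hits
```

Calling `find_non_ascii_lines("genome.gbff")` (the file name is a placeholder) returns a list you can print or inspect; an empty list means the file is pure ASCII.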

And then use sed to edit them to something meaningful with ASCII symbols. In recent issues, the problems were with:
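As an alternative to hand-written sed expressions, the cleanup could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `to_ascii` and `clean_file` are hypothetical names, the input is assumed to be UTF-8, and the strategy (strip accents via Unicode decomposition, replace anything else with a `?` fallback) is a generic one rather than the specific substitutions used in the earlier issues. A blind fallback loses information, so for product names you may still prefer to pick meaningful replacements by hand.

```python
import unicodedata


def to_ascii(text, fallback="?"):
    """Return text with every non-ASCII character replaced by an ASCII stand-in.

    Hypothetical helper: accented letters lose their accents, and characters
    with no ASCII decomposition become `fallback`.
    """
    out = []
    for char in text:
        if char.isascii():
            out.append(char)
            continue
        # NFKD splits e.g. an accented letter into base letter + combining mark.
        decomposed = unicodedata.normalize("NFKD", char)
        ascii_part = decomposed.encode("ascii", "ignore").decode("ascii")
        out.append(ascii_part if ascii_part else fallback)
    return "".join(out)


def clean_file(src, dst):
    """Write an ASCII-only copy of src to dst (assumes src is UTF-8 text)."""
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="ascii") as fout:
        for line in fin:
            fout.write(to_ascii(line))
```

Writing to a separate output file keeps the original annotation untouched, so you can diff the two and check each replacement before rerunning ppanggolin.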

Have a nice day, Adelme

oclaisse commented 5 months ago

Hi Adelme, Thank you very much for your quick answer. That was the issue, and those were indeed the genes that caused it. I was able to fix it in my files by following your instructions and restart my analyses. Sincerely, Olivier