Closed amorris28 closed 3 years ago
Looks great to me :) @amorris28, would you also consider posting two files (PROKKA.gff and IMG.gff) and your commands to run on these files to test this script?
Here are two small GFFs from IMG and Prokka: gff_files.zip
Command for PROKKA.gff:
python gff_parser.py PROKKA.gff --gene-calls Prokka_gene_calls.txt --annotation Prokka_annotation.txt --process-all --source Prokka
Command for IMG.gff:
python gff_parser.py IMG.gff --gene-calls IMG_gene_calls.txt --annotation IMG_annotation.txt --process-all --source IMG
Thank you both, @amorris28 & @meren!
Now the test files are also there with the examples. In addition, since it's 2021, the default branch has been renamed to main.
You may need to run this locally:
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a
Dear @amorris28 and @karkman, I see in the tests directory GFF files for IMG and Prokka, but I guess we are missing FASTA files there associated with these files?
Would it be possible to include them there as well?
Yes, if it is OK also for @amorris28, then you could make a new pull request with the FASTA files in the test folder and I'll add them to the main branch.
I'll put it on my to do list to put together a set of example .gff and .fasta files for JGI/IMG importing. Unfortunately, they don't use the same contig IDs for their annotations and their contigs (:exploding_head:) so it'll take a little bit of work on my end.
FYI I have abandoned trying to use IMG's assemblies and annotations. Their data are too difficult to work with and I never successfully imported them to Anvi'o. It's much easier to simply download the raw reads from JGI, redo the assembly, import them to Anvi'o, and do the annotations there. Maybe if someone down the line wants to try to use data from JGI's pipeline and successfully gets it imported they could update this?
I am sorry to hear that @amorris28 :( Then we will wait until some progress is made on IMG side or our side. If you have not been able to successfully utilize their products, there is no need to waste more of your time on this.
Thank you very much for your help so far.
Thank you, @amorris28! I made a new branch for possible IMG parser development. Should the main branch stay as it is, or go back to the previous, "prokka-only", version?
The current main branch does work to convert JGI's functional_annotations.gff into the Anvi'o-formatted gene_calls.txt and gene_annot.txt, which is the main purview for this tool so I think it's okay to leave it as it is. It's just the next step of correctly meshing those .txt files with JGI's contigs.fa that users will have to figure out before passing them to anvi-gen-contigs-database.
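For anyone who picks this up later, here is a rough sketch of how one might check whether the contig names in a gene calls file actually appear in the contigs FASTA before going any further. It is not part of gff_parser.py. The function names are made up for illustration, and it assumes the gene calls file is tab-delimited with a header row containing a `contig` column, and that the contig name is the first whitespace-delimited token of each FASTA header.

```python
import csv


def fasta_contig_names(fasta_path):
    """Collect the first token of every FASTA header line (assumed to be the contig name)."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    names.add(header.split()[0])
    return names


def gene_call_contigs(gene_calls_path, column="contig"):
    """Collect contig names from a tab-delimited gene calls file.

    Assumes a header row with a column named `contig`.
    """
    with open(gene_calls_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[column] for row in reader}


def missing_contigs(gene_calls_path, fasta_path):
    """Return contig names used by gene calls but absent from the FASTA, sorted."""
    return sorted(gene_call_contigs(gene_calls_path) - fasta_contig_names(fasta_path))
```

An empty list from `missing_contigs("IMG_gene_calls.txt", "contigs.fa")` would suggest the two files use the same contig IDs. A non-empty list names the contigs that still need to be reconciled.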
Added a --source flag that takes either IMG or Prokka as input (default: Prokka). Anything else kills the program with a warning message. This changes the separator when parsing the source, version column of the .gff file.
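To illustrate the idea, here is a minimal sketch of a flag like this and a source-dependent split. It is not the actual code in gff_parser.py. The separator strings (`:` for Prokka, ` v` for IMG) and the function names are assumptions for the example only, since the thread does not state them.

```python
import argparse

# Hypothetical separators between tool name and version in the GFF
# source column; the real values used by gff_parser.py may differ.
SEPARATORS = {"Prokka": ":", "IMG": " v"}


def build_parser():
    """Build a parser whose --source flag accepts only the known sources."""
    parser = argparse.ArgumentParser(description="Sketch of a --source flag")
    parser.add_argument(
        "--source",
        default="Prokka",
        choices=sorted(SEPARATORS),
        help="origin of the GFF file (default: Prokka)",
    )
    return parser


def split_source_version(field, source):
    """Split a source column value into (name, version) using the source's separator."""
    name, sep, version = field.partition(SEPARATORS[source])
    if not sep:
        # Separator not found: keep the whole field as the name.
        return field, "unknown"
    return name, version
```

With `choices`, argparse itself exits with an error message when `--source` is given anything other than IMG or Prokka, which is one simple way to get the "anything else kills the program" behaviour.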