karkman / gff_parser

Parser to add external gene calls and functional annotation from Prokka to Anvi'o.
GNU General Public License v3.0

Add support for IMG-formatted gff files through the --source flag #9

Closed amorris28 closed 3 years ago

amorris28 commented 3 years ago

Added a --source flag that accepts either IMG or Prokka (default: Prokka). Any other value stops the program with a warning message. The flag changes the separator used when parsing the combined source and version column of the .gff file.
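As a rough illustration of the behaviour described above (not the actual code in gff_parser.py), the flag handling could be sketched as follows. The separator values are assumptions: Prokka is assumed to write the field as "Prodigal:2.6" and IMG as "Prodigal v2.6.3". The names build_parser and split_source_version are hypothetical.

```python
import argparse

# Assumed separators between tool name and version in the GFF source column.
# Prokka is assumed to write "Prodigal:2.6", IMG "Prodigal v2.6.3".
SEPARATORS = {"Prokka": ":", "IMG": " v"}


def build_parser():
    """Build a minimal command-line parser with a --source flag."""
    parser = argparse.ArgumentParser(description="Sketch of the --source flag")
    parser.add_argument("gff_file", help="Input GFF file")
    parser.add_argument("--source", default="Prokka",
                        help="Origin of the GFF file: Prokka or IMG (default: Prokka)")
    return parser


def split_source_version(field, source):
    """Split a combined source/version field using the separator for `source`."""
    if source not in SEPARATORS:
        # Any other value stops the program with a message.
        raise SystemExit(f"Unknown --source '{source}': expected Prokka or IMG")
    name, _, version = field.partition(SEPARATORS[source])
    return name, version


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(split_source_version("Prodigal:2.6", args.source))
```

Keeping the separators in one dictionary means that supporting another annotation source would only need one more entry, provided its source column follows a similar "name, separator, version" layout.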

meren commented 3 years ago

Looks great to me :) @amorris28, would you also consider posting two files (PROKKA.gff and IMG.gff) and your commands to run on these files to test this script?

amorris28 commented 3 years ago

Here are two small GFFs from IMG and Prokka: gff_files.zip

Command for PROKKA.gff:

python gff_parser.py PROKKA.gff --gene-calls Prokka_gene_calls.txt --annotation Prokka_annotation.txt --process-all --source Prokka

Command for IMG.gff:

python gff_parser.py IMG.gff --gene-calls IMG_gene_calls.txt --annotation IMG_annotation.txt --process-all --source IMG

karkman commented 3 years ago

Thank you both, @amorris28 & @meren!

Now the test files are also there with the examples. In addition, since it's 2021, the default branch has been renamed to main.

You may need to run this locally:

git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a

meren commented 3 years ago

Dear @amorris28 and @karkman, I see in the tests directory GFF files for IMG and Prokka, but I guess we are missing FASTA files there associated with these files?

Would it be possible to include them there as well?

karkman commented 3 years ago

Yes, if it is OK also for @amorris28, then you could make a new pull request with the FASTA files in the test folder and I'll add them to the main branch.

amorris28 commented 3 years ago

I'll put it on my to-do list to put together a set of example .gff and .fasta files for JGI/IMG importing. Unfortunately, they don't use the same contig IDs for their annotations and their contigs (:exploding_head:), so it'll take a little bit of work on my end.

amorris28 commented 3 years ago

FYI, I have abandoned trying to use IMG's assemblies and annotations. Their data are too difficult to work with and I never successfully imported them into Anvi'o. It's much easier to simply download the raw reads from JGI, redo the assembly, import it into Anvi'o, and do the annotations there. Maybe if someone down the line wants to try to use data from JGI's pipeline and successfully gets it imported, they could update this?

meren commented 3 years ago

I am sorry to hear that @amorris28 :( Then we will wait until some progress is made on the IMG side or on our side. If you have not been able to successfully utilize their products, there is no need to waste more of your time on this.

Thank you very much for your help so far.

karkman commented 3 years ago

Thank you, @amorris28! I made a new branch for possible IMG parser development. Should the main branch stay as it is, or go back to the previous, "prokka-only", version?

amorris28 commented 3 years ago

The current main branch does work to convert JGI's functional_annotations.gff into the Anvi'o-formatted gene_calls.txt and gene_annot.txt, which is the main purpose of this tool, so I think it's okay to leave it as it is. It's just the next step, correctly meshing those .txt files with JGI's contigs.fa, that users will have to figure out before passing them to anvi-gen-contigs-database.
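For anyone attempting that next step, a quick sanity check along these lines might help spot the contig ID mismatch before running anvi-gen-contigs-database. This is only a sketch: it assumes the gene calls file is tab-delimited with a header row that includes a "contig" column, and that the contig ID is the first whitespace-delimited token of each FASTA header. The function names are hypothetical and not part of gff_parser.py.

```python
import csv


def fasta_contig_ids(fasta_path):
    """Collect contig IDs (first token of each '>' header line) from a FASTA file."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gene_call_contig_ids(gene_calls_path):
    """Collect contig names from the 'contig' column of a tab-delimited gene calls file."""
    with open(gene_calls_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["contig"] for row in reader}


def missing_contigs(gene_calls_path, fasta_path):
    """Return contig names used in the gene calls that are absent from the FASTA file."""
    return sorted(gene_call_contig_ids(gene_calls_path) - fasta_contig_ids(fasta_path))
```

An empty list from missing_contigs("gene_calls.txt", "contigs.fa") would suggest the names line up; a non-empty list shows which contig IDs still need to be reconciled.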