jorvis / GALES

Genomic Annotation Logic and Execution System
MIT License

gff3 output file missing product attribute for RNA features #13

Closed nsuvarnaiari closed 5 years ago

nsuvarnaiari commented 5 years ago

Hi Josh,

The gff3 file resulting from GALES annotation is missing the "product" attribute and its qualifier for RNA features. You had mentioned that prediction titles from barrnap were not carried over to GFF3.

I would really appreciate it if this could be fixed soon. We have a paid user requesting GALES annotation for a genome.

Thanks, Suvvi

jorvis commented 5 years ago

I can work on this later today or over the weekend but, if timing is critical, the barrnap file in the output directory has the predictions and there shouldn't be so many that you couldn't manually enter the product values until the software overall supports it, right?
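
For the manual route described above, the product values can be pulled out of the barrnap file with a few lines of Python. This is only a rough sketch: it assumes the barrnap output is nine-column, tab-separated GFF3 whose last column carries a `product=` attribute (worth checking against the actual file first), and the function name `barrnap_products` is made up for illustration.

```python
def barrnap_products(lines):
    """Map (seqid, start, end, strand) -> product for each rRNA row.

    Assumes tab-separated GFF3 with a `product=` attribute in column 9.
    """
    products = {}
    for line in lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "rRNA":
            continue
        for pair in cols[8].split(";"):
            key, sep, value = pair.partition("=")
            if sep and key.strip() == "product":
                coords = (cols[0], int(cols[3]), int(cols[4]), cols[6])
                products[coords] = value.strip()
    return products
```

Keying on coordinates and strand makes it straightforward to match each prediction to the corresponding rRNA row in the annotation gff3 when entering the values by hand.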

nsuvarnaiari commented 5 years ago

Yes, I was in fact thinking of doing it later today.

Thanks, Suvvi

jorvis commented 5 years ago

Suvvi - this was fixed in biocode commits 5026e6f10330af76b43d48446c7582fce9cf7176 and 0dd4646997f4087ccdedf7e074c9796f3cd97377

These were released as part of biocode 0.9.0. I'm now creating a new GALES docker release to contain this fix.

jorvis commented 5 years ago

GALES 0.3.0 released and docker build exported. I've also updated your GCP instance already so you should be good for a test run.

nsuvarnaiari commented 5 years ago

Thanks Josh. Got back from vacation.

Jain or I will give it a try and let you know if we run into problems.

Thanks, Suvvi

nsuvarnaiari commented 5 years ago

Hi Josh,

@jaluvathingal ran a genome and found that the output gff3 still has no product names for rRNAs, but there is product information for tRNAs.

Thanks, Suvvi

jorvis commented 5 years ago

Suvvi - Could you please gzip and attach that genome so I can test with it?

jaluvathingal commented 5 years ago

Hi Josh,

Attaching the gzipped fasta file of the genome I ran.

CNH_1200.fasta.gz

Thanks, Jain

jorvis commented 5 years ago

Can you search your file (or attach it) and see if I'm somehow getting something different? I just ran that genome through on a new install and got lines like these:

scf7180000000002|trim|quiver|pilon      .       gene    337355  337464  .       -       .       ID=rRNA_48c8a11c-0f52-4eca-a025-fe9c40dbb1c4_gene
scf7180000000002|trim|quiver|pilon      .       rRNA    337355  337464  .       -       .       ID=rRNA_48c8a11c-0f52-4eca-a025-fe9c40dbb1c4_rRNA;Parent=rRNA_48c8a11c-0f52-4eca-a025-fe9c40dbb1c4_gene;product_name=5S ribosomal RNA
scf7180000000002|trim|quiver|pilon      .       gene    337613  340495  .       -       .       ID=rRNA_8f4fb22f-d2be-474e-8b04-31692b182abf_gene
scf7180000000002|trim|quiver|pilon      .       rRNA    337613  340495  .       -       .       ID=rRNA_8f4fb22f-d2be-474e-8b04-31692b182abf_rRNA;Parent=rRNA_8f4fb22f-d2be-474e-8b04-31692b182abf_gene;product_name=23S ribosomal RNA
scf7180000000002|trim|quiver|pilon      .       gene    340723  340798  .       -       .       ID=tRNA_8fc55862-e867-4b0f-836d-df7014ea3441_gene
scf7180000000002|trim|quiver|pilon      .       tRNA    340723  340798  .       -       .       ID=tRNA_8fc55862-e867-4b0f-836d-df7014ea3441_tRNA;Parent=tRNA_8fc55862-e867-4b0f-836d-df7014ea3441_gene;product_name=tRNA-Ala
scf7180000000002|trim|quiver|pilon      .       gene    340827  340903  .       -       .       ID=tRNA_42a806de-992f-465e-a412-a09d53932914_gene
scf7180000000002|trim|quiver|pilon      .       tRNA    340827  340903  .       -       .       ID=tRNA_42a806de-992f-465e-a412-a09d53932914_tRNA;Parent=tRNA_42a806de-992f-465e-a412-a09d53932914_gene;product_name=tRNA-Ile
scf7180000000002|trim|quiver|pilon      .       gene    340969  342500  .       -       .       ID=rRNA_5ead6a9b-4a1b-4d2e-86b6-8a80c3322c12_gene
scf7180000000002|trim|quiver|pilon      .       rRNA    340969  342500  .       -       .       ID=rRNA_5ead6a9b-4a1b-4d2e-86b6-8a80c3322c12_rRNA;Parent=rRNA_5ead6a9b-4a1b-4d2e-86b6-8a80c3322c12_gene;product_name=16S ribosomal RNA

This shows product names for both tRNA and rRNA features. (scroll to the right to see them at the end of the ncRNA rows)
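
The same check can be run on any other output file with a short script that lists rRNA/tRNA rows lacking a product attribute. This is a minimal sketch and not part of GALES or biocode: the function names are hypothetical, and it assumes standard nine-column, tab-separated GFF3 using the `product_name` key seen in the lines above.

```python
def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of key -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_rna_without_product(lines, types=("rRNA", "tRNA"), key="product_name"):
    """Return IDs of features of the given types that lack a non-empty `key`."""
    missing = []
    for line in lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Rows without nine columns (e.g. embedded FASTA) are ignored.
        if len(cols) != 9 or cols[2] not in types:
            continue
        attrs = parse_attributes(cols[8])
        if not attrs.get(key):
            missing.append(attrs.get("ID", "<no ID>"))
    return missing
```

Passing an open file handle for the annotation gff3 returns an empty list when every rRNA and tRNA row carries a product name; any IDs returned point at the rows to inspect.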

nsuvarnaiari commented 5 years ago

Hi Josh,

The rRNA and tRNA genes are predicted in the pipeline and we can see them with their annotations in the barnapp.gff and aragorn.out output files. But the rRNA features in the attributor.annotation.gff3 output file do not have the "product" info. You can see the files on the GCP instance, in /home/nsuvarnaiari/PFDA1/batch7/gales_output/CNH_1200

I am still confused about how you are able to get "product" for rRNAs. I did do a git pull before starting the pipeline.

nsuvarnaiari commented 5 years ago

Hi Josh,

Checking if you had a chance to look at the files that I mentioned in my previous post? Since you are going to be here next week, could we meet to go over the issues, or perhaps run the genome together in GCP? Let me know which day/time is best for you. We are in a time crunch for this project where we have to finish annotations soon. @mgiglio99 , @jaluvathingal

Thanks, Suvvi

jorvis commented 5 years ago

Suvvi - I built a completely new instance for you to check out, and tested that genome on it. If you look in GCP you'll see dacc-refgenomes-2, and the test genome is in this folder:

/home/jorvis/genomes/CNH_1200

Try running there, and if you confirm that you don't have any files you need to keep on dacc-refgenomes-1 I'll remove it.

jaluvathingal commented 5 years ago

Hi Josh,

I'm running the genomes through GALES on dacc-refgenomes-2. The ones that have finished running have product names for tRNAs and rRNAs in their gff3 files.

I have copied over the necessary files from dacc-refgenomes-1 to dacc-refgenomes-2, so you can go ahead and remove the dacc-refgenomes-1 instance.

Thanks, Jain

jorvis commented 5 years ago

Thanks. Closing this ticket then.