schneebergerlab / plotsr

Tool to plot synteny and structural rearrangements between genomes
MIT License
288 stars 28 forks source link

Adding gene annotation without mRNA in gff #29

Closed sandain closed 2 years ago

sandain commented 2 years ago

I'm trying to add gene annotation to my plot, and I get the following error about missing mRNA:

$ plotsr --sr "UW224.syri.out" --genomes "UW224.manifest" --tracks HG316453.2.tracks -o "UW224.pdf" -H 3 -W 5
track - WARNING - Reading GFF file HG316453.2.gff. Overlapping transcripts would be plotted as such without any filtering.
track - ERROR - In GFF: HG316453.2.gff at line ['HG316453.2', 'EMBL', 'CDS', '202', '1557', '.', '+', '0', 'ID=cds-CDJ54313.1;Parent=gene-SPYH293_00001;Dbxref=NCBI_GP:CDJ54313.1;Name=CDJ54313.1;gbkey=CDS;gene=dnaA;inference=similar', 'to', 'AA', 'sequence:UniProtKB:P05648;locus_tag=SPYH293_00001;product=Chromosomal', 'replication', 'initiator', 'protein', 'DnaA;protein_id=CDJ54313.1;transl_table=11'], CDS feature is before the mRNA feature. Expected mRNA before CDS. Exiting.

Here is the tracks file:

# file  name    tags
HG316453.2.gff  Genes   ft:gff;bw:10000;nc:black;ns:8;nf:DejaVu Sans;lc:blue;lw:4;bc:lightblue;ba:0.5

HG316453.2.gff was downloaded from Genbank and does not include mRNA: https://eutils.ncbi.nlm.nih.gov/sviewer/viewer.cgi?retmode=text&id=HG316453.2&db=nucleotide&report=gff3

Is it possible to have plotsr add gene annotation without mRNA, or do I need to find a different source for my annotation? Note that the plot works just fine if I don't try to include the annotation track.

Thank you.

sandain commented 2 years ago

I got it to work with the following changes:

--- a/func.py   2022-10-07 18:43:07.732793417 -0500
+++ b/func.py   2022-10-07 18:40:47.295190378 -0500
@@ -645,7 +645,7 @@
         from collections import defaultdict, deque
         import sys
         annos = defaultdict(dict)
-        acceptlist = {'mrna', 'cds'}
+        acceptlist = {'gene', 'cds'}
         featwarn = True
         skipchr = []
         chrs = set(chrlengths[0][1].keys())
@@ -654,6 +654,8 @@
             model = None
             for line in fin:
                 line = line.strip().split()
+                if not line:
+                    continue
                 if line[0] not in chrs:
                     if line[0] not in skipchr:
                         skipchr.append(line[0])
@@ -672,7 +674,7 @@
                         sys.exit()

                     if model is not None:
-                        if t == 'mrna':
+                        if t == 'gene':
                             k = list(model.keys())[0]
                             annos[k[0]][(k[1], k[2])] = model.values()
                             model = defaultdict()
@@ -682,7 +684,7 @@
                             model[cse].append((int(line[3]), int(line[4])))
                     elif model is None:
                         if t == 'cds':
-                            self.logger.error("In GFF: {} at line {}, CDS feature is before the mRNA feature. Expected mRNA before CDS. Exiting.".format(self.f, line))
+                            self.logger.error("In GFF: {} at line {}, CDS feature is before the gene feature. Expected gene before CDS. Exiting.".format(self.f, line))
                             sys.exit()
                         else:
                             model = defaultdict()

I don't know if swapping mRNA for gene is appropriate in all cases, but it works for me.

There is a second fix in that patch -- ignore blank lines.

mnshgl0110 commented 2 years ago

Hi Jason, Thanks for checking this out. Indeed, this could work for cases when the GFF file do not have the mRNA field. Alternatively, it is also possible to edit the gff file to substitute gene to 'mRNA`. That might be easier as well.