seb-mueller / chlamy_locus_map

Small RNA Locus Map for Chlamydomonas reinhardtii
GNU General Public License v3.0

Annotations #4

Closed nmatthews323 closed 5 years ago

nmatthews323 commented 6 years ago

Just to update on this:

- I had a look at Phytozome; it looks like most of the annotations haven't changed.
- I've written a script which calculates introns from the Phytozome annotation, and uploaded the resulting GFF3 file.
- Still need to look at whether the transposon and methylation annotations have changed. I also need to go through the specific miRNAs identified in Adrian's paper and in the literature.

Anything else?

seb-mueller commented 6 years ago

Nice.

nmatthews323 commented 6 years ago
  1. Link to annotation: https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=Creinhardtii
  2. Please do! Roughly, my process was:
     - remove any mRNAs with just one exon (no intron);
     - use the setdiff function from GenomicRanges to extract the differences between each mRNA and its exons;
     - iterate this over each of the mRNAs individually (takes a few minutes...) so we can maintain the parent mRNA information.
     Some mRNAs had nested exons, leaving no gap and therefore no introns; that's why there's a workaround that returns an empty GRanges object if no introns are detected, which is then dropped when you unlist. I think that should work, but please have a look!
  3. I previously used a list of miRNAs from the lab which Bruno gave me, I'll put it in the data folder I created.
  4. I used the repeat-masked assembly (file name: Creinhardtii_281_v5.5.repeatmasked_assembly_v5.0.gff3.gz), which is also accessible through the Phytozome portal linked above. From the metadata provided there, I used regular expressions to group the entries into transposon families etc. The mapping I used is outlined in appendix 3 of the report. The "New_Transposon_processing.R" file on here shows the very rough-and-ready code I used to do this; I'll go through it again to tidy and check it.
  5. Thanks!
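
For anyone reviewing the intron step in (2), the idea can be sketched in plain Python. This is not the GenomicRanges script from the repository, only a minimal illustration of the same interval subtraction; the function names are made up for this sketch, and it assumes 1-based inclusive coordinates with all exon features lying inside their mRNA. The example coordinates are the UTR/CDS rows of Cre01.g000017.t1.1.v5.5 from the v5.5 gene GFF.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GFF3 convention)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping, nested, or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps, is nested in, or abuts the previous block: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def introns_for_mrna(mrna: Interval, exons: List[Interval]) -> List[Interval]:
    """Return the parts of the mRNA span not covered by any exon.

    Assumes every exon lies within the mRNA span.
    """
    introns: List[Interval] = []
    cursor = mrna[0]
    for start, end in merge_intervals(exons):
        if start > cursor:
            introns.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= mrna[1]:
        introns.append((cursor, mrna[1]))
    return introns


def introns_by_parent(
    mrnas: Dict[str, Interval],
    exons_by_parent: Dict[str, List[Interval]],
) -> Dict[str, List[Interval]]:
    """Compute introns per mRNA so the parent ID stays attached."""
    result: Dict[str, List[Interval]] = {}
    for mrna_id, span in mrnas.items():
        exons = exons_by_parent.get(mrna_id, [])
        # Single-block transcripts (one exon, or only nested exons) have no intron.
        if len(merge_intervals(exons)) < 2:
            continue
        found = introns_for_mrna(span, exons)
        if found:
            result[mrna_id] = found
    return result


mrnas = {"Cre01.g000017.t1.1.v5.5": (18766, 20237)}
exons = {
    "Cre01.g000017.t1.1.v5.5": [
        (18766, 19162),  # five_prime_UTR
        (19163, 19178),  # CDS
        (19329, 19948),  # CDS
        (19949, 20237),  # three_prime_UTR
    ]
}
print(introns_by_parent(mrnas, exons))
# {'Cre01.g000017.t1.1.v5.5': [(19179, 19328)]}
```

Transcripts whose exons collapse into a single block are simply absent from the result, which plays the same role as the empty GRanges object being lost on unlist.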
nmatthews323 commented 6 years ago

More on (3): We used this rather than miRBase, as apparently the lab disagreed with others' definitions of miRNAs...

seb-mueller commented 6 years ago

Nice. As for point 4), I've found this file on the cluster:

/data/public_data/chlamydomonas/20140726_phytozomeV10_Creinhardtii_281_v5.5.annotation/Creinhardtii_281_v5.5.repeatmasked_assembly_v5.0.gff3

The first few lines are

##gff-version 3
##date 2012-01-18
##sequence-region c4058c6ad52899e4141d968721bc69e713269381531m1M33
chromosome_5    RepeatMasker    similarity      998186  998222  16.2    +       .       ID=330541.1;Name=(CCG)n;Target=(CCG)n 3 39
chromosome_5    RepeatMasker    similarity      1000515 1000651 20.4    -       .       ID=330541.2;Name=rnd-1_family-12;Target=rnd-1_family-12 59 207
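
As a rough illustration of the regex grouping mentioned in point 4, the Name attribute of these rows could be bucketed as below. The two rules and their labels are hypothetical, chosen only to match the two rows shown here; the actual mapping is the one in appendix 3 of the report and in New_Transposon_processing.R. The sample lines are space-separated as pasted; splitting with a maximum of 8 splits also works on the tab-separated original, as long as the first eight columns contain no spaces.

```python
import re
from collections import Counter
from typing import Dict

SAMPLE = """\
chromosome_5 RepeatMasker similarity 998186 998222 16.2 + . ID=330541.1;Name=(CCG)n;Target=(CCG)n 3 39
chromosome_5 RepeatMasker similarity 1000515 1000651 20.4 - . ID=330541.2;Name=rnd-1_family-12;Target=rnd-1_family-12 59 207
"""

# Hypothetical rules: first matching pattern wins.
RULES = [
    (re.compile(r"^\([ACGTN]+\)n$"), "simple_repeat"),
    (re.compile(r"^rnd-\d+_family-\d+$"), "rnd_family"),
]


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split the GFF3 attribute column into a key/value dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def classify(name: str) -> str:
    """Return the label of the first rule matching the repeat name."""
    for pattern, label in RULES:
        if pattern.match(name):
            return label
    return "other"


def count_classes(gff_text: str) -> Counter:
    """Count repeat classes over all non-comment GFF rows."""
    counts: Counter = Counter()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Columns 1-8 contain no spaces; column 9 (attributes) may.
        fields = line.split(None, 8)
        name = parse_attributes(fields[8]).get("Name", "")
        counts[classify(name)] += 1
    return counts


print(dict(count_classes(SAMPLE)))
# {'simple_repeat': 1, 'rnd_family': 1}
```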

These seem to be the transposons, but the same directory also contains the gene annotation:

==> Creinhardtii_281_v5.5.gene.gff <==
##gff-version 3
##annot-version v5.5
chromosome_1    phytozomev10    gene    18766   20237   .       +       .       ID=Cre01.g000017.v5.5;Name=Cre01.g000017
chromosome_1    phytozomev10    mRNA    18766   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5;Name=Cre01.g000017.t1.1;pacid=30789166;longest=1;Parent=Cre01.g000017.v5.5
chromosome_1    phytozomev10    five_prime_UTR  18766   19162   .       +       .       ID=Cre01.g000017.t1.1.v5.5.five_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    CDS     19163   19178   .       +       0       ID=Cre01.g000017.t1.1.v5.5.CDS.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    CDS     19329   19948   .       +       2       ID=Cre01.g000017.t1.1.v5.5.CDS.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    three_prime_UTR 19949   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5.three_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    gene    20356   23957   .       +       .       ID=Cre01.g000033.v5.5;Name=Cre01.g000033

I suppose those are the ones you based your introns etc. on?
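
A quick way to check what an intron calculation would have to work from is to group the rows of the gene GFF by their Parent attribute. The sketch below does that for the excerpt above; the function name is made up for illustration, and the lines are space-separated as pasted rather than tab-separated as in the file itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EXCERPT = """\
##gff-version 3
##annot-version v5.5
chromosome_1 phytozomev10 gene 18766 20237 . + . ID=Cre01.g000017.v5.5;Name=Cre01.g000017
chromosome_1 phytozomev10 mRNA 18766 20237 . + . ID=Cre01.g000017.t1.1.v5.5;Name=Cre01.g000017.t1.1;pacid=30789166;longest=1;Parent=Cre01.g000017.v5.5
chromosome_1 phytozomev10 five_prime_UTR 18766 19162 . + . ID=Cre01.g000017.t1.1.v5.5.five_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 CDS 19163 19178 . + 0 ID=Cre01.g000017.t1.1.v5.5.CDS.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 CDS 19329 19948 . + 2 ID=Cre01.g000017.t1.1.v5.5.CDS.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 three_prime_UTR 19949 20237 . + . ID=Cre01.g000017.t1.1.v5.5.three_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
"""

Feature = Tuple[str, int, int]  # (type, start, end)


def features_by_parent(gff_text: str) -> Dict[str, List[Feature]]:
    """Group GFF3 rows by the value of their Parent attribute."""
    grouped: Dict[str, List[Feature]] = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 8)  # 9 columns; the last may contain spaces
        ftype, start, end, attrs = fields[2], int(fields[3]), int(fields[4]), fields[8]
        for field in attrs.split(";"):
            if field.startswith("Parent="):
                grouped[field[len("Parent="):]].append((ftype, start, end))
    return dict(grouped)


children = features_by_parent(EXCERPT)["Cre01.g000017.t1.1.v5.5"]
print([ftype for ftype, _, _ in children])
# ['five_prime_UTR', 'CDS', 'CDS', 'three_prime_UTR']
```

In this excerpt the transcript has UTR and CDS rows but no explicit exon rows, so, at least for these lines, exon blocks would have to be derived by merging adjacent UTR and CDS features. The gap between 19178 and 19329 is the only candidate intron here.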

Key lines in the ChlamydomonasTranscriptNameConversionBetweenReleases.Mch12b.txt:

4/10/2014

The next line of the file contains column headings, starting with a comment character
('#'). Columns are space-padded to 25 characters.
These are the column headings, in order, together with an explanation of what version they correspond to
5.5       JGI v5.5 in Phytozome v10
3.1       JGI v3.1 (published in genome paper Merchant et al., 2007)
Genbank   Genbank submission of genome and annotations from Merchant et al. (2007)
4         JGI v4 annotations
4.3       JGI v4.3 (based on Augustus u10.2 annotations)
u5        Augustus u5 annotations
u9        Augustus u9 annotations
5.3.1     JGI v5.3.1 in Phytozome v9.1
...
JGI v5.5 (Phytozome 10)    Augustus update 11.6 (u11.6)-based annotations on v5 assembly, released as JGI v5.5 in Phytozome 10

So I guess this settles the annotation version: JGI v5.5 (Phytozome 10)-based annotations on the v5 assembly.
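
If the name conversion file is needed later (e.g. to map older transcript IDs onto v5.5), the space-padded layout described in its header could be read roughly as follows. This is a sketch under assumptions: it treats every column as exactly 25 characters wide with the '#' occupying the first character of the first heading, it uses only the first three column headings from the list above, and the data row is made up for illustration (the Genbank value is a placeholder, not a real ID).

```python
from typing import Dict, List


def split_fixed_width(line: str, width: int = 25) -> List[str]:
    """Cut a line into width-sized cells and strip the padding."""
    return [line[i:i + width].strip() for i in range(0, len(line), width)]


def parse_table(header_line: str, data_lines: List[str], width: int = 25) -> List[Dict[str, str]]:
    """Parse a '#'-prefixed heading line plus fixed-width data rows."""
    # Blank out the comment character so it does not end up in the first heading.
    columns = split_fixed_width(" " + header_line[1:], width)
    rows: List[Dict[str, str]] = []
    for raw in data_lines:
        values = split_fixed_width(raw.rstrip("\n"), width)
        # Pad rows whose trailing empty cells were trimmed.
        values += [""] * (len(columns) - len(values))
        rows.append(dict(zip(columns, values)))
    return rows


# Made-up example in the assumed layout (three of the columns only).
header = "#" + "5.5".ljust(24) + "3.1".ljust(25) + "Genbank".ljust(25)
row = "Cre01.g000017.t1.1".ljust(25) + "".ljust(25) + "PLACEHOLDER_ID".ljust(25)

print(parse_table(header, [row]))
# [{'5.5': 'Cre01.g000017.t1.1', '3.1': '', 'Genbank': 'PLACEHOLDER_ID'}]
```

Slicing by position rather than splitting on whitespace keeps empty cells in place, which matters if a transcript has no counterpart in one of the older releases.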