seb-mueller / chlamy_locus_map

Small RNA Locus Map for Chlamydomonas reinhardtii
GNU General Public License v3.0

Working out which assembly to use #2

Closed seb-mueller closed 6 years ago

seb-mueller commented 6 years ago

Also, which Chlamy assembly did you use? I think the current one is 5.5, if I'm not mistaken (did you use Phytozome?): https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Creinhardtii

Nick's first answer:

v5.5 does look like the current assembly on Phytozome. In my report I say V5 assembly, and I used the assembly stored on the cluster with this path when I needed it: /data/public_data/chlamydomonas/assembly5/Creinhardtii_236.fa

However, internal libraries were already aligned.

Are you sure it's changed? It looks like v5 came out in 2012 (https://www.ncbi.nlm.nih.gov/pubmed/24950814), and according to the paper I linked, v5.5 of the gene models/annotation was released in 2014. However, Phytozome, which I did use to get the annotations, states its release date as 6/12/2017. I think maybe the number before the '.' refers to the assembly and the number after it to the annotation. That said, I really don't know...

seb-mueller commented 6 years ago

Good point. I'll take another look and also ask Alison Smith's lab about this (they work on Chlamy).

seb-mueller commented 6 years ago

I've just had a chat with Andre. Looks like we are indeed on the newest assembly version! No remapping, yay! The annotation might have changed though; the newest seems to be in here:

/data/public_data/chlamydomonas/20140726_phytozomeV10_Creinhardtii_281_v5.5.annotation
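One quick way to confirm which annotation release a file belongs to is to read its header. The Phytozome GFF3 files quoted further down in this thread start with a `##annot-version` line; the sketch below simply looks for that line. It assumes the line sits in the leading comment block, and `annot_version` is a hypothetical helper name, not something from this repository.

```python
from typing import Iterable, Optional


def annot_version(gff3_lines: Iterable[str]) -> Optional[str]:
    """Return the value of the '##annot-version' header line, or None.

    Assumption: the line appears in the comment block at the top of the
    file, as in the Phytozome files shown in this thread.
    """
    for line in gff3_lines:
        if not line.strip():
            continue  # skip blank lines
        if not line.startswith("#"):
            break  # first feature line reached, header is over
        if line.startswith("##annot-version"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else None
    return None


# In-memory example using the header lines quoted in this thread.
header = [
    "##gff-version 3\n",
    "##annot-version v5.5\n",
    "chromosome_1\tphytozomev10\tgene\t18766\t20237\t.\t+\t.\tID=Cre01.g000017.v5.5\n",
]
print(annot_version(header))
```

For a real file, an open file handle can be passed in place of the list, e.g. `with open(path) as fh: annot_version(fh)`.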

nmatthews323 commented 6 years ago

OK that's great, saves us quite a bit of time. It might still be worth re-running the segmentation if we have new libraries (you mentioned AGO mutant libraries?), won't take much effort to re-run.

seb-mueller commented 6 years ago

Not much new has really been published: https://www.ebi.ac.uk/arrayexpress/search.html?query=%22Chlamydomonas+reinhardtii%22+ Same for GEO, so we should be fine as far as the public ones go.

I'll ask David/Tom about any unpublished ones.

nmatthews323 commented 6 years ago

OK, the current external ones might need checking for validity in terms of protocols. I think one of them used 454 sequencing; I'll have a look.

nmatthews323 commented 6 years ago

I've uploaded a quick evaluation of the external datasets - see commit notes.

I've just checked: there are three AGO mutants from Betty in my list (right at the bottom) which went through the segmentation. They must have been done just in time and it passed me by, but there may be more recent ones..?

seb-mueller commented 6 years ago

OK, to conclude, I've put the genome and annotation into this folder:

/projects/nick_matthews/resources $ head *
==> Creinhardtii_236.fa <==
>chromosome_1
GGGAACCAGCTACTAGATGGTTCGATTAGTCTTTCGCCCCTATACCCAAGTCTGAAAAGCGATTTGCACGTCAGCACATC
TACGAGCCTCCACCAGAGTTTCCTCTGGCTTCACCCTGCTCAGGCATAGTTCACCATCTTTCGGGTCCCAACAGGTATGC
TCGCACTCAAACCTTTCGTAGAAACAACATGGTCGGTCGATGGTGCAGGGTTTTACCCCATCCCACCAGTCAGGTTACTT
GCGCTTACGGGTTTTCCACCCGCCAACTCGCATACATGTTAGACTCCTTGGTCCGTGTTTCAAGACGGGTCGATTGACGC
TCTTCTGCCAGAATCTTTAGAGCACAGATCCCGAAGGACAAGGTACTCTTTACGCCTTGGTCGAGTCGGCGGCATCGGCC
GGGTTACCTGGGTGGACCCAGCTTTTGTCCCGCCAACTCAACCCATTCTGACCAGCACCCAGCACATTCAACGGGCCGTT
AGGACCGCTTAAGCCTGGGCGCACCTACGAGCGCCAATCGCTTCCCTCTCAACAATTTCAAGCACTTTTAACTCTCTTTT
CAAAGTTCTTTTCATCTTTCCCTCACGGTACTTGTTCGCAGAAGGGATTTACCTCCAAATTAGGGCTGCATTCCCAAACA
ACCCGACTCGTGGAAAGCACTTCGTGGAAGGACTAAGCAGGAACCGACGGGGTTATCACCCTCTCTGACGCGGCATTCGA

==> Creinhardtii_281_v5.5.gene_exons.gff3 <==
##gff-version 3
##annot-version v5.5
chromosome_1    phytozomev10    gene    18766   20237   .       +       .       ID=Cre01.g000017.v5.5;Name=Cre01.g000017
chromosome_1    phytozomev10    mRNA    18766   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5;Name=Cre01.g000017.t1.1;pacid=30789166;longest=1;Parent=Cre01.g000017.v5.5
chromosome_1    phytozomev10    exon    18766   19178   .       +       .       ID=Cre01.g000017.t1.1.v5.5.exon.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    five_prime_UTR  18766   19162   .       +       .       ID=Cre01.g000017.t1.1.v5.5.five_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    CDS     19163   19178   .       +       0       ID=Cre01.g000017.t1.1.v5.5.CDS.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    exon    19329   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5.exon.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    CDS     19329   19948   .       +       2       ID=Cre01.g000017.t1.1.v5.5.CDS.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    three_prime_UTR 19949   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5.three_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166

==> Creinhardtii_281_v5.5.gene.gff3 <==
##gff-version 3
##annot-version v5.5
chromosome_1    phytozomev10    gene    18766   20237   .       +       .       ID=Cre01.g000017.v5.5;Name=Cre01.g000017
chromosome_1    phytozomev10    mRNA    18766   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5;Name=Cre01.g000017.t1.1;pacid=30789166;longest=1;Parent=Cre01.g000017.v5.5
chromosome_1    phytozomev10    five_prime_UTR  18766   19162   .       +       .       ID=Cre01.g000017.t1.1.v5.5.five_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    CDS     19163   19178   .       +       0       ID=Cre01.g000017.t1.1.v5.5.CDS.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    CDS     19329   19948   .       +       2       ID=Cre01.g000017.t1.1.v5.5.CDS.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    three_prime_UTR 19949   20237   .       +       .       ID=Cre01.g000017.t1.1.v5.5.three_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1    phytozomev10    gene    20356   23957   .       +       .       ID=Cre01.g000033.v5.5;Name=Cre01.g000033
chromosome_1    phytozomev10    mRNA    20356   23957   .       +       .       ID=Cre01.g000033.t1.1.v5.5;Name=Cre01.g000033.t1.1;pacid=30788883;longest=1;Parent=Cre01.g000033.v5.5

==> Creinhardtii_281_v5.5.repeatmasked_assembly_v5.0.gff3 <==
##gff-version 3
##date 2012-01-18
##sequence-region c4058c6ad52899e4141d968721bc69e713269381531m1M33
chromosome_5    RepeatMasker    similarity      998186  998222  16.2    +       .       ID=330541.1;Name=(CCG)n;Target=(CCG)n 3 39
chromosome_5    RepeatMasker    similarity      1000515 1000651 20.4    -       .       ID=330541.2;Name=rnd-1_family-12;Target=rnd-1_family-12 59 207
chromosome_5    RepeatMasker    similarity      1000571 1000681 23.6    +       .       ID=330541.3;Name=rnd-4_family-476;Target=rnd-4_family-476 3783 3890
chromosome_5    RepeatMasker    similarity      1001246 1001364  8.4    +       .       ID=330541.4;Name=(CA)n;Target=(CA)n 2 126
chromosome_5    RepeatMasker    similarity      1001521 1001651 13.1    -       .       ID=330541.5;Name=rnd-1_family-3;Target=rnd-1_family-3 1 131
chromosome_5    RepeatMasker    similarity      1001673 1001799  5.6    -       .       ID=330541.6;Name=rnd-1_family-3;Target=rnd-1_family-3 1 125
chromosome_5    RepeatMasker    similarity      1002116 1002297 11.8    +       .       ID=330541.7;Name=(CA)n;Target=(CA)n 2 179
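
Since the question in this issue was whether the annotation and the assembly belong together, a simple sanity check is to confirm that every sequence name used in a GFF3 file also appears as a header in the FASTA file. The following is a minimal sketch of that check, not something taken from this repository; the function names are hypothetical, and it only compares names (it does not verify coordinates or sequence content).

```python
from typing import Iterable, List, Set


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from FASTA header lines ('>name ...')."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def gff3_seqids(gff3_lines: Iterable[str]) -> Set[str]:
    """Collect the first column (seqid) of every GFF3 feature line."""
    ids = set()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        ids.add(line.split(None, 1)[0])
    return ids


def missing_seqids(fasta_lines: Iterable[str], gff3_lines: Iterable[str]) -> List[str]:
    """Return seqids used in the GFF3 that have no FASTA header."""
    return sorted(gff3_seqids(gff3_lines) - fasta_ids(fasta_lines))


# Small in-memory example; 'scaffold_99' is made up to show a mismatch.
fasta = [">chromosome_1\n", "GGGAACCAGC\n", ">chromosome_5\n", "TACGAGCCTC\n"]
gff3 = [
    "##gff-version 3\n",
    "chromosome_1\tphytozomev10\tgene\t18766\t20237\t.\t+\t.\tID=Cre01.g000017.v5.5\n",
    "scaffold_99\tphytozomev10\tgene\t1\t100\t.\t+\t.\tID=example\n",
]
print(missing_seqids(fasta, gff3))
```

An empty list means every seqid in the annotation has a matching FASTA record. With the real files, open file handles for `Creinhardtii_236.fa` and one of the `.gff3` files can be passed in place of the lists.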