seb-mueller closed 6 years ago
Good point. I'll take another look and ask Alison Smith's lab about this as well (they work on Chlamy).
I've just had a chat with Andre. Looks like we are indeed on the newest version! No remapping, yay! The annotation might have changed though; the newest seems to be here:
/data/public_data/chlamydomonas/20140726_phytozomeV10_Creinhardtii_281_v5.5.annotation
OK that's great, saves us quite a bit of time. It might still be worth re-running the segmentation if we have new libraries (you mentioned AGO mutant libraries?); it won't take much effort.
Not much new has really been published: https://www.ebi.ac.uk/arrayexpress/search.html?query=%22Chlamydomonas+reinhardtii%22+ Same for GEO, so we should be fine as far as the public ones go.
I'll ask David/Tom about any unpublished ones.
Ok, the current external ones might need checking for validity in terms of protocols. I think one of them used 454 sequencing; I'll have a look.
I've uploaded a quick evaluation of the external datasets - see commit notes.
I've just checked: there are three AGO mutants from Betty in my list (right at the bottom) which went through the segmentation. They must have been done just in time and it passed me by, but there may be more recent ones..?
Ok, to conclude I've put the genome and annotation into this folder:
/projects/nick_matthews/resources $ head *
==> Creinhardtii_236.fa <==
>chromosome_1
GGGAACCAGCTACTAGATGGTTCGATTAGTCTTTCGCCCCTATACCCAAGTCTGAAAAGCGATTTGCACGTCAGCACATC
TACGAGCCTCCACCAGAGTTTCCTCTGGCTTCACCCTGCTCAGGCATAGTTCACCATCTTTCGGGTCCCAACAGGTATGC
TCGCACTCAAACCTTTCGTAGAAACAACATGGTCGGTCGATGGTGCAGGGTTTTACCCCATCCCACCAGTCAGGTTACTT
GCGCTTACGGGTTTTCCACCCGCCAACTCGCATACATGTTAGACTCCTTGGTCCGTGTTTCAAGACGGGTCGATTGACGC
TCTTCTGCCAGAATCTTTAGAGCACAGATCCCGAAGGACAAGGTACTCTTTACGCCTTGGTCGAGTCGGCGGCATCGGCC
GGGTTACCTGGGTGGACCCAGCTTTTGTCCCGCCAACTCAACCCATTCTGACCAGCACCCAGCACATTCAACGGGCCGTT
AGGACCGCTTAAGCCTGGGCGCACCTACGAGCGCCAATCGCTTCCCTCTCAACAATTTCAAGCACTTTTAACTCTCTTTT
CAAAGTTCTTTTCATCTTTCCCTCACGGTACTTGTTCGCAGAAGGGATTTACCTCCAAATTAGGGCTGCATTCCCAAACA
ACCCGACTCGTGGAAAGCACTTCGTGGAAGGACTAAGCAGGAACCGACGGGGTTATCACCCTCTCTGACGCGGCATTCGA
==> Creinhardtii_281_v5.5.gene_exons.gff3 <==
##gff-version 3
##annot-version v5.5
chromosome_1 phytozomev10 gene 18766 20237 . + . ID=Cre01.g000017.v5.5;Name=Cre01.g000017
chromosome_1 phytozomev10 mRNA 18766 20237 . + . ID=Cre01.g000017.t1.1.v5.5;Name=Cre01.g000017.t1.1;pacid=30789166;longest=1;Parent=Cre01.g000017.v5.5
chromosome_1 phytozomev10 exon 18766 19178 . + . ID=Cre01.g000017.t1.1.v5.5.exon.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 five_prime_UTR 18766 19162 . + . ID=Cre01.g000017.t1.1.v5.5.five_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 CDS 19163 19178 . + 0 ID=Cre01.g000017.t1.1.v5.5.CDS.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 exon 19329 20237 . + . ID=Cre01.g000017.t1.1.v5.5.exon.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 CDS 19329 19948 . + 2 ID=Cre01.g000017.t1.1.v5.5.CDS.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 three_prime_UTR 19949 20237 . + . ID=Cre01.g000017.t1.1.v5.5.three_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
==> Creinhardtii_281_v5.5.gene.gff3 <==
##gff-version 3
##annot-version v5.5
chromosome_1 phytozomev10 gene 18766 20237 . + . ID=Cre01.g000017.v5.5;Name=Cre01.g000017
chromosome_1 phytozomev10 mRNA 18766 20237 . + . ID=Cre01.g000017.t1.1.v5.5;Name=Cre01.g000017.t1.1;pacid=30789166;longest=1;Parent=Cre01.g000017.v5.5
chromosome_1 phytozomev10 five_prime_UTR 18766 19162 . + . ID=Cre01.g000017.t1.1.v5.5.five_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 CDS 19163 19178 . + 0 ID=Cre01.g000017.t1.1.v5.5.CDS.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 CDS 19329 19948 . + 2 ID=Cre01.g000017.t1.1.v5.5.CDS.2;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 three_prime_UTR 19949 20237 . + . ID=Cre01.g000017.t1.1.v5.5.three_prime_UTR.1;Parent=Cre01.g000017.t1.1.v5.5;pacid=30789166
chromosome_1 phytozomev10 gene 20356 23957 . + . ID=Cre01.g000033.v5.5;Name=Cre01.g000033
chromosome_1 phytozomev10 mRNA 20356 23957 . + . ID=Cre01.g000033.t1.1.v5.5;Name=Cre01.g000033.t1.1;pacid=30788883;longest=1;Parent=Cre01.g000033.v5.5
==> Creinhardtii_281_v5.5.repeatmasked_assembly_v5.0.gff3 <==
##gff-version 3
##date 2012-01-18
##sequence-region c4058c6ad52899e4141d968721bc69e713269381531m1M33
chromosome_5 RepeatMasker similarity 998186 998222 16.2 + . ID=330541.1;Name=(CCG)n;Target=(CCG)n 3 39
chromosome_5 RepeatMasker similarity 1000515 1000651 20.4 - . ID=330541.2;Name=rnd-1_family-12;Target=rnd-1_family-12 59 207
chromosome_5 RepeatMasker similarity 1000571 1000681 23.6 + . ID=330541.3;Name=rnd-4_family-476;Target=rnd-4_family-476 3783 3890
chromosome_5 RepeatMasker similarity 1001246 1001364 8.4 + . ID=330541.4;Name=(CA)n;Target=(CA)n 2 126
chromosome_5 RepeatMasker similarity 1001521 1001651 13.1 - . ID=330541.5;Name=rnd-1_family-3;Target=rnd-1_family-3 1 131
chromosome_5 RepeatMasker similarity 1001673 1001799 5.6 - . ID=330541.6;Name=rnd-1_family-3;Target=rnd-1_family-3 1 125
chromosome_5 RepeatMasker similarity 1002116 1002297 11.8 + . ID=330541.7;Name=(CA)n;Target=(CA)n 2 179
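For what it's worth, the `##annot-version` line in the GFF3 headers above can be read programmatically instead of eyeballing `head` output. A minimal Python sketch, assuming the directives sit at the top of the file as they do here; the function name `gff3_directives` and the in-memory example are made up for illustration, and in practice you would pass an open handle to one of the files in the resources folder:

```python
import io

def gff3_directives(handle):
    """Collect the leading '##key value' directive lines of a GFF3 file."""
    directives = {}
    for line in handle:
        if not line.startswith("##"):
            break  # stop at the first non-directive line
        parts = line[2:].split(None, 1)  # key, then the rest of the line
        if not parts:
            continue  # a bare '##' line carries no information
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        directives[key] = value
    return directives

# Illustrative stand-in for Creinhardtii_281_v5.5.gene.gff3
example = io.StringIO(
    "##gff-version 3\n"
    "##annot-version v5.5\n"
    "chromosome_1\tphytozomev10\tgene\t18766\t20237\t.\t+\t.\tID=Cre01.g000017.v5.5\n"
)
header = gff3_directives(example)
print(header.get("annot-version"))
```

Note that the repeat-masked file has no `##annot-version` line in the output above, so `header.get(...)` would return `None` for it.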
Also, which Chlamy assembly did you use? I think the current one is 5.5 if I'm not mistaken (did you use phytozome?): https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Creinhardtii
Nick's first answer:
v5.5 does look like the current assembly on Phytozome. In my report I say V5 assembly, and I used the assembly stored on the cluster with this path when I needed it: /data/public_data/chlamydomonas/assembly5/Creinhardtii_236.fa
However, internal libraries were already aligned.
Are you sure it's changed? It looks like v5 came out in 2012 (https://www.ncbi.nlm.nih.gov/pubmed/24950814), and according to the paper I linked, v5.5 of the gene models/annotation was released in 2014. However, Phytozome, which I did use to get the annotations, states its release date as 6/12/2017. I think maybe the number before the '.' refers to the assembly and the number after to the annotation. Having said that, I really don't know...
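Whatever the version numbers turn out to mean, a quick sanity check is whether every sequence name used in the annotation also exists in the FASTA we aligned against. A rough Python sketch of that check, assuming standard FASTA headers and GFF3 with the sequence name in the first column; the helper names and the tiny in-memory inputs are illustrative (`scaffold_18` is just a made-up mismatch). Matching names are only a necessary condition: they would not prove the coordinates are identical.

```python
import io

def fasta_names(handle):
    """Return the set of sequence names from FASTA '>' header lines."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])  # name is the first token after '>'
    return names

def gff3_seqids(handle):
    """Return the set of column-1 sequence IDs from GFF3 feature lines."""
    seqids = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        seqids.add(line.split()[0])
    return seqids

# Illustrative stand-ins for Creinhardtii_236.fa and a v5.5 GFF3 file
fasta = io.StringIO(">chromosome_1\nGGGAACCAGC\n>chromosome_5\nTACGAGCCTC\n")
gff = io.StringIO(
    "##gff-version 3\n"
    "chromosome_1\tphytozomev10\tgene\t18766\t20237\t.\t+\t.\tID=a\n"
    "chromosome_5\tRepeatMasker\tsimilarity\t998186\t998222\t16.2\t+\t.\tID=b\n"
    "scaffold_18\tphytozomev10\tgene\t1\t100\t.\t+\t.\tID=c\n"
)
missing = gff3_seqids(gff) - fasta_names(fasta)
print(sorted(missing))
```

An empty result on the real files would at least be consistent with "no remapping needed"; anything listed would be worth a closer look.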