Open ValWood opened 3 years ago
This looks like a gene fusion. http://japonicusdb.kmr.nz/gene/SJAG_01119
13-08-2021 I'm not so sure about this, I think the Ctr domain could be a false positive. Leaving for now.
I haven't done this one yet. I'm not keen to start altering the sequence right now if there will be a new assembly. It might be easier to just map everything over from the original sequence. I could model the frameshifts here, but I think there must be additional errors to make this coding because the discrepancies here would not resolve the frameshifts (there would still be stops). Unless the feature start is also incorrect which would affect the counting. @snezhkaoliferenko Can we look at this together sometime next week?
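A minimal sketch of the check described above: read a candidate coding sequence in a given frame and report any internal stop codons. The function name and the toy sequences are illustrative only, not taken from the japonicus annotation, and the sequence is assumed to be already spliced and on the coding strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_stops(cds, frame=0):
    """Return 0-based offsets of stop codons read in `frame`,
    ignoring the final complete codon (the expected terminator)."""
    cds = cds.upper()
    # Step through complete codons only, starting at the chosen frame offset.
    codons = [(i, cds[i:i + 3]) for i in range(frame, len(cds) - 2, 3)]
    return [i for i, codon in codons[:-1] if codon in STOP_CODONS]

# Toy example: an internal TAG in frame 0.
print(in_frame_stops("ATGAAATAGCCCTAA"))           # [6]
# Reading the same sequence from a different start changes the counting.
print(in_frame_stops("ATGAAATAGCCCTAA", frame=2))  # []
```

If the feature start is wrong, the `frame` argument is the thing to vary before concluding that extra sequence errors are needed.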
SJAG_05309 and SJAG_05310 are both spo15 and need merging
chromosome I
[x] SJAG_06592 mpr1 histidine-containing response regulator phosphotransferase Mpr1 old complement(2800569..>2800973) new complement(2800569..2800949) (trimmed to first met) ASK JUAN TO CHECK
[x] SJAG_05265 jmj1 histone demethylase Jmj1 (predicted) I old complement(748098..749234) new complement(748098..749282) updated, ASK JUAN TO CHECK
[x] SJAG_06586 rpm1 mitochondrial 3'-5' exonuclease for RNA 3' ss-tail I old <2003532..2004896 new join(2002672..2003532,2003532..2004896) FRAMESHIFTED
chromosome II. KE651167.contig
[x] SJAG_03335 rpl18b <883798..884361 -> join(883723..883725,883801..884361)
[x] SJAG_03345 cdc48 <901614..901681,901739..904112 ->join(901629..901681,901739..904112)
[x] SJAG_04007 the4 acyl-coenzyme A thioesterase The4. <3304702..>3305040 -> 3304435..3305040
[x] SJAG_02947 <101689..103170 -> 101620..103170
[x] SJAG_06608 UNCLEAR, gap upstream, flagged as partial
[x] SJAG_04843 fum1 complement(2362193..>2363659) ->complement(2362193..2363644) COMMITTED [main 0c57601]
[x] SJAG_06603 ppt1 old <1824919..1825884 -> new 1824811..1825884 COMMITTED [main 9a2d9e1]
[x] SJAG_04339 slx9 ribosome biogenesis protein Slx9 (predicted) II old complement(join(2588209..2588573,2588621..>2588831)) new complement(join(2588209..2588573,2588621..>2588831,2588881..2588883)) (single codon N-term exon, exact match alignments)
[x] SJAG_04252 ipi1 Rix1 complex Armadillo-type fold Ipi1 II old complement(join(2772002..2772975,2773034..2773156,2773204..>2773276)) new complement(join(2772002..2772975,2773034..2773156,2773204..2773276,2773381..2773425))
beautiful conservation of this new exon:
chromosome III. KE65112.contig
[ ] SJAG_01163 TatD homolog (predicted) III.
old join(<1086388..1086592,1086648..1086849,1086929..1087206,1087299..1087534)
~new join(1086338..1086348,1086393..1086592,1086648..1086849,1086929..1087206,1087299..1087534)~
reverted, possibly frameshifted at N-term. ASK JUAN TO CHECK
[x] SJAG_01738 mod5 Tea1 anchoring protein Mod5 III old <2257465..2259150 new join(2257376..2257390,<2257465..2259150)
[x] SJAG_01716 clg1 cyclin-like protein involved in autophagy Clg1 (predicted) III old <2211824..2212285 new 2211062..2212285 (twice as long)
[x] SJAG_06636 fma2 methionine aminopeptidase Fma2 (predicted) III old complement(2964342..>2965589) new complement(2964342..2965601)
[x] SJAG_01503 atp12 mitochondrial F1-FO ATP synthase chaperone Atp12 (predicted) III old complement(1769619..1770455) new complement(1769619..1770422)
[x] SJAG_05229 cch1 plasma membrane calcium ion import channel Cch1 III old complement(join(1533562..1535627,1535733..1537070,1537319..1538216)) new complement(join(1533562..1535627,1535733..1537285,1537288..1538999)) FRAMESHIFTED. CHECK START adjacent "SJAG_01395" might be dubious/ second intron might also be a frameshift
[x] SJAG_01764 nan1 U3 snoRNP-associated WD repeat protein Nan1 (predicted) III old complement(join(2314133..2314871,2314960..2315098,2315237..>2316410)) new complement(join(2314133..2314871,2314960..2315098,2315237..2316419)) (not sure this is correct but at least it has a methionine, ASK JUAN TO CHECK)
[x] SJAG_06618 III already removed, see https://github.com/japonicusdb/japonicus-curation/issues/29
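A quick sanity check that could be run over the old/new coordinates in the lists above: parse the location string, sum the exon lengths, and see whether the total is a multiple of three. This is only a sketch; `parse_location` and `spliced_length` are hypothetical helper names, the regex handles just the simple `complement(...)`, `join(...)` and `a..b` forms used in this ticket, and the `<`/`>` partial markers are ignored.

```python
import re

def parse_location(loc):
    """Return (strand, [(start, end), ...]) for a simple feature location string."""
    strand = -1 if loc.strip().startswith("complement(") else 1
    # < and > mark partial ends; they do not change the coordinates themselves.
    spans = [(int(a), int(b))
             for a, b in re.findall(r"[<>]?(\d+)\.\.[<>]?(\d+)", loc)]
    return strand, spans

def spliced_length(loc):
    """Total length in nucleotides of all exons in the location."""
    _, spans = parse_location(loc)
    return sum(end - start + 1 for start, end in spans)

# atp12 (SJAG_01503), old and new coordinates from the list above.
for loc in ("complement(1769619..1770455)", "complement(1769619..1770422)"):
    n = spliced_length(loc)
    print(loc, n, n % 3)   # 837 0, then 804 0

# rpm1 (SJAG_06586): the frameshift is modelled by re-using base 2003532.
print(spliced_length("join(2002672..2003532,2003532..2004896)"))  # 2226
```

A length that is a multiple of three does not show the structure is right, only that the frame is at least self-consistent end to end.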
today's edits 12-08-2021 [main bfa6108] gene stucture edits 3 files changed, 24 insertions(+), 253 deletions(-)
old join(1823068..1823505,1823646..1823669) new ~join(1823068..1824015,1824018..1824464) much longer, no splice at previous point but later frameshift~ scrub that, it's a splice site (conserved in pombe) join(1823068..1824014,1824059..1824464)
old complement(join(1820740..1820814,1820867..1821146,1821210..1821237,1821305..1821506,1821573..1821746,1821924..1822529))
new complement(join(1820740..1820814,1820867..1821146,1821210..1821237,1821305..1821506,1821573..1821782))
[main 1f94d6f] gene structure updates committed
~mpo1 SJAG_00519 old join(3191090..3191107,3191152..3191218,3191275..3191898,3191959..3192104,3192153..3192416) new join(3191090..3191107,3191152..3191218,3191275..3191915) C-term exons repurposed to gna1 glucosamine-phosphate N-acetyltransferase~
SJAG_00519, backtracking. Also appears to be fused in octosporus. So here, I will override the product with
ER membrane fatty acid alpha-oxygenase Mpo1/glucosamine-phosphate N-acetyltransferase Gna1, tandem fusion
trim to met
old complement(3174139..3174510) ~new complement(join(3174139..3174469,3174652..3174683)) updated, also running frameshifted version~ frameshifted version eventually led me to: complement(join(3174139..3174240,3174316..3174439,3174485..3174575,3174668..3174683)) See below for details.
[ ] SJAG_00037 DNA-binding transcription factor, zf-fungal binuclear cluster type (predicted) I old join(<4172798..4172830,4172874..4172956,4173087..4173345,4173707..4175260) new join(4172747..4172830,4172874..4172956,4173087..4173345,4173707..4175260)
[ ] SJAG_03808
old complement(3746361..>3747854)
new complement(join(3746361..3747854,3748068..3748130))
[ ] SJAG_04943 byr4 two-component GAP Byr4 II old <2164224..2165471 new join(2163428..2164182,2164181..2165471) FRAMESHIFTED
[ ] SJAG_03130 iqw1 WD repeat protein, Iqw1 II old complement(join(483592..483682,483725..485199,485250..485916,485965..>486059)) new complement(join(483592..483682,483725..485199,485250..485916,485965..486058,486096..486165))
[ ] SJAG_01212 membrane associated ubiquitin-protein ligase E3, MARCH family (predicted) III old <1180380..1181399 new 1180359..1181399
[ ] SJAG_02459 zinc finger, CCCH-type I old <777366..778520 new 777180..778520 ASK JUAN TO CHECK
[ ] SJAG_02043 byr3 translational activator, zf-CCHC type zinc finger protein (predicted) III old 2888856..2889308 new join(2888279..2888353,2888856..2889308)
I almost abandoned this one. Big intron:
==
In smaller contigs. I'm not going to address these as they are often at the end of the contigs. I checked snf5 as it would be useful to get the Met in that but it's still missing.
SJAG_01111 snf5 SWI/SNF complex subunit Snf5 supercont5.31 SJAG_06639 zinc finger, CCHC-type supercont5.5 SJAG_06643 supercont5.26 SJAG_06638 supercont5.5 SJAG_05161 chromo/chromo shadow domain family supercont5.4 SJAG_06641 chromo/chromo shadow domain family supercont5.18
Still to do
There is only one upstream methionine. I can make an intron but it doesn't look like a good intron. It is possible this has some sequence error. I would like to see if the methionine can be confirmed before I add it. ASK JUAN TO CHECK.
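For the simpler case where no intron is needed, the search for an upstream methionine can be sketched as below: walk upstream from the current start one codon at a time and collect ATGs until an in-frame stop is hit. The function name and toy sequence are hypothetical, and the sketch assumes forward-strand sequence with a 0-based start offset; it does not attempt the intron case discussed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_in_frame_atgs(seq, cds_start):
    """Return 0-based offsets of in-frame ATGs upstream of cds_start,
    nearest first, stopping at the first in-frame stop codon."""
    seq = seq.upper()
    hits = []
    pos = cds_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break              # the ORF cannot extend past a stop
        if codon == "ATG":
            hits.append(pos)
        pos -= 3
    return hits

# Toy sequence: TAA ATG GCC | AAA GGG TTT, with the current start at offset 9.
print(upstream_in_frame_atgs("TAAATGGCCAAAGGGTTT", 9))  # [3]
```

A single hit, as in this entry, is still only a candidate; it says nothing about whether the methionine is actually used.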
Val!! R-e-s-p-e-c-t
On 12 August 2021 17:57, Val Wood wrote (Re: [japonicusdb/japonicus-curation] S. japonicus gene coordinate updates (#16)):
pfd2. This one was a bastard- took about 1.5 hours!
[main d977fb1] final structure updates round1 3 files changed, 14 insertions(+), 186 deletions(-)
The ones I can do are now all committed. @juanmatacambridge I have flagged ~6~ 7 in this ticket where it would be useful to see if you can see riboseq signal. @snezhkaoliferenko I am struggling with FAS beta. Can we take a look at that next week?
@ValWood sure thank you
Hi, I’m going away to Spain and won’t have time to do them all before going. Just one… 03764 is interesting: single translated ORF from 1,823,068 to 1,824,156. I can see the annotated intron spliced out in only two reads, but dozens of reads map to the intron – so it should be annotated as single CDS without the intron. Then there is translation from 1,824,156 to 1,824,464. The two genes just overlap – there must be a mutation that changes the frame between the two ORFs. Quite weird structure – how does it look in other Schizos? Best Juan
because sometimes the minor isoform is the functional isoform (apn1 in pombe).... or it could be a retained intron, anyway the product makes it clear that this needs investigating further, and I will add "warnings" once we have these available.
[main 9ed759c] alg10 1 file changed, 1 insertion(+)
tam13 is strange, see https://github.com/pombase/curation/issues/3069
[1] Mug147 and tam13 are clearly distinct in S. pombe.
[2] SJAG_00388 is very weird. There is no evidence of translation on the annotated strand, but there is very strong translation in the complement.
The coordinates of the translated gene are complement 3,445,140 to 3,445,242. There must be a sequencing error or an unannotated intron, as the frame changes halfway through the protein.
It’s a microprotein, but the signal is very strong and clear. I can’t find it in pombe by blast!
Juan
PS. If you want to have the ribosome profiling of japonicus let me know, it can be very useful (or puzzling)…
[2] above: OK thanks, I have annotated SJAG_00388 as "dubious, no evidence for translation on annotated strand". I won't annotate the frameshifted microprotein as it is only 33 AA and likely a uORF for SJAG_00389. I left this as a misc_feature for future reference.
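The arithmetic behind the frame change can be checked directly from the coordinates Juan gave. This is just a worked calculation; the variable names are illustrative, and the "33 codons plus a stop" comparison is an assumption about how the 33 AA figure was counted.

```python
# Coordinates of the translated region on the complement strand (from the comment above).
start, end = 3_445_140, 3_445_242

length = end - start + 1
print(length, length % 3)        # 103 1  -> not a whole number of codons

# Assumed comparison: 33 residues plus a stop codon.
expected = 33 * 3 + 3
print(expected, length - expected)  # 102 1
```

One base over a whole number of codons is consistent with a single-base sequencing error or a short unannotated intron, as suggested, but it does not distinguish between them.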
@juanmatacambridge don't worry about the others. We have all that we are doing for the paper, and have made good improvements. The rest is a 'work in progress' and can be handled by future work from the japonicus community.
old complement(join(1427768..1429375,1429526..1429801,1429905..1429964)) new