Open ValWood opened 3 years ago
@snezhkaoliferenko if you know of any missing genes that need annotating, add them to this ticket. I think there will be quite a lot.
As per: https://github.com/japonicusdb/japonicus-curation/issues/29
committed in [main 071a1eb]
SJAG_07001 does not come up in JaponicusDB?
I only made it today, it will show up the next time Kim reloads...
I'll do a new load tonight (UK time).
thanks! @kimrutherford btw Canto has been offline for a couple days - you probably know about it?
[main 158a074] fix SJAG_01701 replaced by ~SJAG_07003~ SJAG_07002
> btw Canto has been offline for a couple days - you probably know about it?
Sorry about that! Cambridge had a power cut a few days ago and japonicus Canto didn't restart. I hadn't set it up properly. I've fixed that now and Canto is back.
> SJAG_07001 does not come up in JaponicusDB?
The site has been updated so it's there now: http://japonicusdb.kmr.nz/gene/SJAG_07001
https://github.com/japonicusdb/japonicus-curation/issues/32
I think the existing SJAG_02438 is really just a translation of a bit of the 5' UTR, so I will flag this as dubious.
[main f43a747] gene stucture updates 3 files changed, 10 insertions(+), 63 deletions(-)
created from N-term of /systematic_id="SJAG_03763"
@snezhkaoliferenko this might be of interest to you. This is an ER integral membrane protein implicated in sterol metabolism: the SPAC56F8.07 gene merge in japonicus.
The trickiest one so far
[main 1f94d6f] gene structure updates committed
Once I figured out where this was, the intron was already sitting there waiting for me:
60S ribosomal protein L41 (diddy)
* [x] SJAG_07013 = SPAC3G6.13c
* [x] SJAG_07014 = SPAC3F10.18c
* [x] SJAG_07016 = SPBC106.07c nat2 N alpha-acetylation related protein Nat2
* [x] SJAG_07017 = SPBC26H8.13c Siva family protein
[main ea27e8f] new genes ->7016
Very pleased with this one - I feel as though I am reviving some ancient craft!
SJAG_06617 looked odd. It turns out it was:
* [x] SJAG_07005, new gene in magenta.
I won't even keep SJAG_06617 as dubious, it would be confusing, largely overlapping in the wrong frame (no shared amino acids) so it will be deleted.
and voilà:
this is amazeballs also because it gives us insight into japonicus mitochondrial metabolism, ruling out some possibilities. japonicus does not respire and it did lose a number of 'mitochondrial-related' genes
* [x] New gene SJAG_07006 (qcr10) replaces deleted SJAG_03830
This may not be quite right, but it is the correct length, the final exon is definitely correct, and it now hits qcr10 where the previous structure did not.
[main 9142c84] gene stucture updates and removals
ditto this, i thought it was absent
* SJAG_07010
@ValWood thanks! interaction partners in pombe kick ass.
> interaction partners in pombe kick ass.
wow, yes they are!
My rule of thumb: if something is present in human, pombe and cerevisiae it will be present in japonicus, almost certainly. There are gene losses in S. c (mainly splicing and heterochromatin related), and gene losses in pombe (mainly metabolic, peroxisomal, fatty acid metabolism). But if genes are present in pombe, cerevisiae and human they will usually be present in every other eukaryote.
So if you think you know of any small things that appear to be missing let me know. At present I'm looking at genes usually conserved 1:1 and present in human, pombe, cerevisiae.
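The rule of thumb above amounts to a set comparison, roughly as in this minimal sketch. The function name and the `og1`..`og5` ortholog-group labels are hypothetical placeholders, not real presence/absence data, and a real check would still need manual inspection of each candidate.

```python
def expected_but_missing(human, pombe, cerevisiae, japonicus):
    """Return ortholog groups present in all three reference species
    but not (yet) annotated in japonicus."""
    conserved = set(human) & set(pombe) & set(cerevisiae)
    return sorted(conserved - set(japonicus))


# Hypothetical ortholog-group labels, for illustration only.
human = {"og1", "og2", "og3", "og4"}
pombe = {"og1", "og2", "og3", "og5"}
cerevisiae = {"og1", "og2", "og4", "og5"}
japonicus = {"og1"}

# Groups worth searching for in the japonicus genome.
candidates = expected_but_missing(human, pombe, cerevisiae, japonicus)
print(candidates)
```

Anything this returns is only a candidate: it may be a genuine loss, or a small or highly spliced gene that the original annotation missed.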
Mitochondrial proteins I have looked for but haven't yet been able to find: cox8, hot13, mrx11, cmc4, img2, mitochondrial ribosomal protein subunit L9, rrg9. I still think they are lurking somewhere. It's really tricky to find small, highly spliced, or disordered proteins. I think some of these are all 3... need more strategies!
I guess we can say that JaponicusDB is already useful! This is one of the points I am trying to make in the manuscript. You can't do an effective comparative analysis of processes and pathways with missing genes! These real world examples will be good for the conclusion ;)
Checked up to date: everything in this ticket is now showing up correctly!
@ValWood
OK, these are what I think are present in pombe but missing in japonicus:
SPBPB2B2.02 SPBC1105.04c SPBPB2B2.09c SPAC3H8.03 SPAC22G7.07c SPAC4G8.11c SPAC20G8.04c SPAC105.01c SPAC105.03c SPAC1002.01 SPAC1002.14 SPAC18G6.01c SPAC6C3.02c SPAC6B12.06c SPAC4F8.10c SPAC1805.02c SPAC513.05 SPAC2E1P3.01 SPAC31G5.06 SPAC24C9.16c SPAC6G9.03c SPAC3A11.06 SPAC4H3.08 SPAC25B8.11 SPAC25B8.18 SPAC27D7.04 SPAC27D7.06 SPAC20G4.05c SPAC29B12.12 SPBPB21E7.07 SPBC1198.14c SPBC337.04 SPBC409.16c SPBC691.01 SPBC3H7.06c SPBC29A10.11c SPBC19C7.11 SPBC776.06c SPBC3B8.06 SPBC1347.13c SPBC1652.01 SPCC4B3.06c SPCC132.04c SPCC1281.07c SPCC1223.03c SPAC513.07 SPAC11G7.03 SPBC902.05c
From the point of view of metabolism, Fbp1, Gut2 and the two subunits of isocitrate dehydrogenase (Idh1/2) are fascinating.
Right, I think you are right about Fbp1 and Gut2. These would be difficult to miss, due to the conservation beyond eukaryotes and the large size. And the fact that both subunits are missing. I would not look for these. Etf1/2 and Idh1/2 are interesting too; again I would not look for these, for the reasons above.
Some of the other differences are unsurprising, because they are multigene families which have various species-specific duplications and losses. Most of these probably have some partially redundant paralog.
I will continue to look for cox8, hot13, mix17, mrx11, atp10, saw1 and a few of the other diddy ones. It is possible that some of these are required only for the expression or assembly of the missing metabolic pathway, though, and so they really are lost... Cox8 I am sure IS there, but at 66 aa and 4 exons it's challenging! I will continue to try to locate it... after lunch...
mzt1 mitotic spindle organizing protein Mzt1 chr2 980849..981070 /systematic_id="SJAG_07018"
I thought I had done this! I must have been confusing it with something else...
The ID ranges used so far are SJAG_00004 - SJAG_06643 and SJAG_16452 - SJAG_16460, so for new genes I will use SJAG_07000 onwards.
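For anyone scripting this, the ID scheme could be sketched as below. This is only an illustration of the ranges stated above; `next_new_id` and the constants are hypothetical names, not part of any existing JaponicusDB tooling.

```python
import re

# Ranges already in use, as stated in the comment above.
USED_RANGES = [(4, 6643), (16452, 16460)]
# New genes are numbered from SJAG_07000 onwards.
NEW_GENE_START = 7000


def next_new_id(existing_ids):
    """Return the lowest free SJAG ID at or above SJAG_07000."""
    pattern = re.compile(r"^SJAG_(\d{5})$")
    taken = {int(m.group(1)) for s in existing_ids if (m := pattern.match(s))}
    n = NEW_GENE_START
    # Skip IDs already assigned or falling inside a previously used range.
    while n in taken or any(lo <= n <= hi for lo, hi in USED_RANGES):
        n += 1
    return f"SJAG_{n:05d}"


print(next_new_id(["SJAG_07000", "SJAG_07001"]))
```

IDs that were assigned and later struck out would need to be passed in as existing too if they should not be reused.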