pombase / curation

evaluation of human NDs #2192

Closed ValWood closed 5 years ago

ValWood commented 5 years ago

All of these are ND for process. We could probably squeeze a couple out, but for this purpose these can be ND. Nothing is screaming out. I checked UniProt, close orthologs, PubMed, protein families and domains. There is nothing "well characterised" in here.

ND OK

  • Q8WXG8 S100Z PMID:28074300 most recent; all we know is that it binds calcium
  • Q5VU92 DCAF12L1 DDB1- and CUL4-associated factor 12-like protein 1
  • Q9BZE7 C22orf23 UPF0193 protein EVG1
  • P56378 MP68 6.8 kDa mitochondrial proteolipid ATP synthase membrane subunit 6.8PL PMID:17570365 "Therefore, their roles are likely to be peripheral to the synthesis of ATP." NOT ENOUGH INFO
  • Q9BUK0 CHCHD7 Coiled-coil-helix-coiled-coil-helix domain-containing protein 7 PMID: 22842048 structure but no function/process
  • Q96AQ2 TMEM125 NO DATA
  • O75264 SMIM24 NO DATA
  • Q8WXE0 CASKIN2 UNCERTAIN CASKIN2 is a homolog of CASKIN1, a scaffolding protein that participates in a signaling network with CASK (calcium/calmodulin-dependent serine kinase). Despite a high level of homology between CASKIN2 and CASKIN1, CASKIN2 cannot bind CASK due to the absence of a CASK Interaction Domain and consequently, may have evolved undiscovered structural and functional distinctions.
  • Q6P3X3 TTC27 C. elegans ortholog TRD-1, which is essential for cell fate determination in both the germline and the developing epidermis. POMBE ORTHOLOG, CONSERVED UNKNOWN, CLEARLY A MORE GENERAL ROLE https://www.pombase.org/gene/SPAC19B12.01 ONE OF THE MOST INTERESTING UNKNOWNS, STRONG ASSOCIATION WITH SIP/FAR COMPLEX http://string-db.org/cgi/network.pl?taskId=f2cWZqcvV0MQ
  • Q9BVX2 TMEM106C /EMOC TMEM106 has 86 papers but no good handle on function/process: hypomyelinating leukodystrophy, is a lung cancer driver, but all descriptions are pathological
(10)
  • Q96A25 TMEM106A see above
  • P58658 EVA1C Protein eva-1 homolog C mouse PMID: 24040182 expressed in neurones, consistent with an axon guidance role, not enough info (2013)
  • Q9BZ81 MAGEB5 Melanoma-associated antigen B5
  • Q9HCI5 MAGEE1 Melanoma-associated antigen E1
  • P43365 MAGEA12 Melanoma-associated antigen 12
  • P43358 MAGEA4 Melanoma-associated antigen 4
  • Q8N7X4 MAGEB6 Melanoma-associated antigen B6
  • Q9Y6I8 MAGEA9 Melanoma-associated antigen 9
  • P43361 MAGEA8 Melanoma-associated antigen 8
  • P43360 MAGEA6 Melanoma-associated antigen 6
(20)
  • Q9BZD7 PRRG3 Transmembrane gamma-carboxyglutamic acid protein 3
  • Q9BZD6 PRRG4 Transmembrane gamma-carboxyglutamic acid protein 4
  • Q5JX71 FAM209A Protein FAM209A
  • Q9H0X4 FAM234A Protein FAM234A
  • Q9NRY5 FAM114A2 Protein FAM114A2
  • Q8NB25 FAM184A Protein FAM184A
  • Q15884 FAM189A2 Protein FAM189A2 (plasma membrane family)
  • Q92545 TMEM131 no info human or mouse
  • Q9BTD3 TMEM121 Transmembrane protein 121 (expression connects to angiogenesis)
  • Q8N2U0 Q9BTD3 TMEM256 Transmembrane protein 256 https://www.pombase.org/spombe/result/SPAC1782.12c
(30)
  • Q8WW59 SPRYD4 SPRY domain-containing protein 4
  • Q9H478 KCNQ1DN KCNQ1 downstream neighbor protein (68 AA) no orthologs
  • O95170 CDRT1 CMT1A duplicated region transcript 1 protein
  • Q9Y3S2 ZNF330 Zinc finger protein 330
  • Q86UN6 AKAP14 A-kinase anchor protein 14 PMID:1247594 expression only
  • Q8IXM6 NRM Nurim PMID:23092226 resulted in an abnormal shape change of the nuclear envelope
  • Q6UWP8 SBSN Suprabasin tumour cell marker, unknown function
  • P59020 DSCR9 Down syndrome critical region protein 9 PMID: 12168953 primate-specific genes in DSCR
  • P56555 DSCR4 Down syndrome critical region protein 4 primate-specific genes in DSCR (118 AA)
  • Q96T75 DSCR8 Down syndrome critical region protein 8 (97 AA) (absent from mouse)
(40)
  • Q9BW04 SARG Specifically androgen-regulated gene protein PMID: 15525603
  • Q6UWF7 NXPE4 NXPE family member 4
  • Q13066 GAGE2B G antigen 2B/2C Antigen, recognized on melanoma by autologous cytolytic T-lymphocytes
  • O14524 NEMP1 Nuclear envelope integral membrane protein 1
  • O15037 KHNYN Protein KHNYN
  • Q8NB37 GATD1 Glutamine amidotransferase-like class 1 domain-containing protein 1
  • Q96J86 CYYR1 Cysteine and tyrosine-rich protein 1 only paper, ND PMID: 24981926
  • A5PL33 KRBA1 Protein KRBA1
  • Q9NWV4 C1orf123 UPF0587 protein C1orf123 https://www.pombase.org/gene/SPBC2D10.03c
  • Q96CP2 FLYWCH2 FLYWCH family member 2
(50)
  • O15482 TEX28 Testis-specific protein TEX28
  • Q53EV4 LRRC23 Leucine-rich repeat-containing protein 23
  • Q05BV3 EML5 Echinoderm microtubule-associated protein-like 5
  • Q9H5F2 C11orf1 UPF0686 protein C11orf1
  • P58511 SMIM11A Small integral membrane protein 11A
  • Q9BSJ5 C17orf80 Uncharacterized protein C17orf80
  • P23610 F8A1 Factor VIII intron 22 protein
  • Q92617 NPIPB3 Nuclear pore complex-interacting protein family member B3
  • Q9Y334 VWA7 von Willebrand factor A domain-containing protein 7
  • Q969K7 TMEM54 Transmembrane protein 54
(60)
  • Q14DG7 TMEM132B Transmembrane protein 132B (implicated in intracranial aneurysm)
  • Q5VZI3 TMEM268 Transmembrane protein 268 http://www.ebi.ac.uk/interpro/entry/IPR028054/
  • Q5TEA3 C20orf194 Uncharacterized protein C20orf194
  • Q5TZF3 ANKRD45 Ankyrin repeat domain-containing protein 45
  • Q53RE8 ANKRD39 Ankyrin repeat domain-containing protein 39
  • Q6AI12 ANKRD40 Ankyrin repeat domain-containing protein 40
  • Q9UHP6 RSPH14 Radial spoke head 14 homolog (calcium channel, voltage-dependent, R type, alpha 1E subunit is paralog?)
  • Q5TEZ5 C6orf163 Uncharacterized protein C6orf163
  • Q8TE82 SH3TC1 SH3 domain and tetratricopeptide repeat-containing protein 1
  • Q8NEC7 GSTCD Glutathione S-transferase C-terminal domain-containing protein
(70)
  • Q9Y4K1 CRYBG1 Beta/gamma crystallin domain-containing protein 1
  • Q5T7W7 TSTD2 Thiosulfate sulfurtransferase/rhodanese-like domain-containing protein 2
  • Q9BVG4 PBDC1 Protein PBDC1 https://www.pombase.org/spombe/result/SPBC3E7.07c
  • Q9UJJ7 RPUSD1 RNA pseudouridylate synthase domain-containing protein 1
  • P81408 FAM189B Protein FAM189B
  • Q9BUW7 UPF0184 protein C9orf16
  • Q86WB7 UNC93A Protein unc-93 homolog A
  • Q6ZV65 FAM47E Protein FAM47E
  • P57060 RWDD2B RWD domain-containing protein 2B
  • Q8NEA5 C19orf18 Uncharacterized protein C19orf18 (mammalian only)
  • Q9NYP8 C21orf62 Uncharacterized protein C21orf62
(80)
  • O60829 PAGE4 P antigen family member 4 "G antigen, family C, 1", GAGEC1
  • Q9UKJ3 GPATCH8 G patch domain-containing protein 8
  • P69849 NOMO3 Nodal modulator 3
  • Q15155 NOMO1 Nodal modulator 1 precursor
  • Q96BZ8 LENG1 Leukocyte receptor cluster member 1 https://www.pombase.org/spombe/result/SPCC5E4.10c
  • Q9NY87 SPANXC Sperm protein associated with the nucleus on the X chromosome C (human mouse, 97 AA)
  • Q9Y6Z2 Uncharacterized protein encoded by LINC01558 (57 AA)
  • C9JUS6 Putative adrenomedullin-5-like protein ADM5 (153 AA, no protein family, no mouse)
  • P08118 MSMB Beta-microseminoprotein PMID: 29250809 prostate cancer marker, secreted protein, unknown
  • Q9BTY7 HGH1 Protein HGH1 homolog https://www.pombase.org/gene/SPAC26F1.12c
(90)
  • O43301 HSPA12A Heat shock 70 kDa protein 12A
  • Q96ND0 FAM210A Protein FAM210A (this could be N-terminal peptidyl-methionine acetylation, which is ‘protein maturation’, but I’m leaving it out since it’s a modification)
  • Q13296 SCGB2A2 Mammaglobin-A
  • Q15527 SURF2 Surfeit locus protein 2
  • Q96MV1 TMEM56 Transmembrane protein 56 https://www.pombase.org/gene/SPAC17A2.02c
  • Q9BXX2 ANKRD30B Ankyrin repeat domain-containing protein 30B
  • Q9NW97 TMEM51 Transmembrane protein 51
  • Q8NE00 TMEM104 Transmembrane protein 104
  • Q9NQF3 SERHL Serine hydrolase-like protein
  • Q6UWT4 C5orf46 Uncharacterized protein C5orf46
(100)
  • Q9GZL8 BPESC1 Putative BPES syndrome breakpoint region protein (116 AA) no mouse
  • Q13166 CATR1 CATR tumorigenic conversion 1 protein (79 AA, no mouse)
  • Q9H1U4 MEGF9 Multiple epidermal growth factor-like domains protein 9
  • Q14656 TMEM187 Transmembrane protein 187 (261 AA, no mouse)
  • Q9Y4X0 AMMECR1 AMME syndrome candidate gene 1 protein https://www.pombase.org/gene/SPAC688.03c
  • Q96AN5 TMEM143 Transmembrane protein 143
  • Q5BJH2 TMEM128 Transmembrane protein 128
  • Q9BZW5 TM6SF1 Transmembrane 6 superfamily member 1 https://www.pombase.org/spombe/result/SPAC56F8.07
  • Q9BUV8 RAB5IF Uncharacterized protein RAB5IF
  • Q9H330 TMEM245 Transmembrane protein 245
(110)
  • Q8NCL8 TMEM116 Transmembrane protein 116 (human only, 245 AA)
  • Q86VS3 IQCH IQ domain-containing protein H (cytoskeleton-associated protein, but will leave this one)
  • Q2TBC4 PRICKLE4 Prickle-like protein 4
  • Q9NRQ5 SMCO4 Single-pass membrane and coiled-coil domain-containing protein 4 (59 AA) DUF4519
  • Q1L6U9 MSMP Prostate-associated microseminoprotein (139 AA)
  • O15442 MPPED1 Metallophosphoesterase domain-containing protein 1
  • Q96JP2 MYO15B Unconventional myosin-XVB
  • Q96G28 CFAP36 Cilia- and flagella-associated protein 36
  • Q68D91 MBLAC2 Metallo-beta-lactamase domain-containing protein 2
  • Q9GZP8 IMUP Immortalization up-regulated protein
(120)
  • Q9Y675 SNURF SNRPN upstream reading frame protein (71 uORF?)
  • P10163 PRB4 Basic salivary proline-rich protein 4
  • Q9BXM9 FSD1L FSD1-like protein
  • Q9GZU0 C6orf62 Uncharacterized protein C6orf62
  • Q9H4I8 SERHL2 Serine hydrolase-like protein 2
  • Q8TBR7 FAM57A Protein FAM57A
  • Q8WW52 FAM151A Protein FAM151A
  • Q68DQ2 CRYBG3 Very large A-kinase anchor protein
  • A1L0T0 ILVBL Acetolactate synthase-like protein
  • Q8N1D0 SLC22A18AS Beckwith-Wiedemann syndrome chromosomal region 1 candidate gene B protein
(130)
  • Q9UIG4 PSORS1C2 Psoriasis susceptibility 1 candidate gene 2 protein
  • Q8NEP7 KLHDC9 Kelch domain-containing protein 9
  • O75095 MEGF6 Multiple epidermal growth factor-like domains protein 6
  • Q03252 LMNB2 Lamin-B2
  • Q8N6N2 SERHL2 Serine hydrolase-like protein 2
  • Q9H2S5 RNF39 RING finger protein 39
  • Q6UXA7 C6orf15 Uncharacterized protein C6orf15
  • Q4KMG9 TMEM52B Transmembrane protein 52B
  • Q9GZN8 C20orf27 UPF0687 protein C20orf27
  • Q8N3T6 TMEM132C Transmembrane protein 132C
(140)
  • O76087 GAGE7 G antigen 7
  • P30408 TM4SF1 Transmembrane 4 L6 family member 1 (a tetraspanin family. I think these are often membrane-membrane connection molecules, like claudin in pombe, but not enough info)
  • O00193 SMAP Small acidic protein https://www.pombase.org/spombe/result/SPBC14C8.19
  • O15255 RTL8C CAAX box protein 1 (retrotransposon Gag like 8C?)
  • Q8TF65 GIPC2 PDZ domain-containing protein GIPC2 (PMID:28472630 eye development in Xenopus, PDZ are usually signalling but not clear)
  • Q562R1 ACTBL2 Beta-actin-like protein 2 (B actins reported in this entry to be mediators of internal cell motility, but I can’t find confirmation, or a way to annotate this)
  • Q9NXU5 ARL15 ADP-ribosylation factor-like protein 15
  • Q09MP3 RAD51AP2 RAD51-associated protein 2 might be repair but the conserved domain is weak, http://www.ebi.ac.uk/interpro/entry/IPR031419
  • P09565 GIG44 Putative insulin-like growth factor 2-associated protein (queried insulin association with UniProt)

P02812 PRB2 Basic salivary proline-rich protein 2 PMID: 26375204 The PRP functional role is still poorly understood. Digestion?

P04280 PRB1 Basic salivary proline-rich protein 1

Q9HCN3 TMEM8A Post-GPI attachment to proteins factor 6 https://en.wikipedia.org/wiki/TMEM8A

P55808 NG (Glycoprotein NG), blood group antigen (NEEDS MORE DIGGING)

Q8NFV4 ABHD11 Protein ABHD11 This is a predominantly single-copy hydrolase conserved to bacteria. It is mitochondrial in eukaryotes, and in bacteria appears to have a role in fatty acid metabolism (https://www.uniprot.org/uniprot/P75736)

PCNX2 A6NKB5 Pecanex-like protein 2 In Drosophila, the pecanex (pcx) gene, which encodes an evolutionarily conserved multi-pass transmembrane protein, appears to be required to activate Notch signaling in some contexts, especially during neuroblast segregation in the neuroectoderm. Although Pcx has been suggested to contribute to endoplasmic reticulum homeostasis, its functions remain unknown. Classing as unknown.

Q8IYX8 CEP57L1 Centrosomal protein CEP57L1 (CEP57 has chromosome segregation from PAINT. It looks as though this will also be centrosome associated.) Left as unknown because of uncertainty about the role of the paralog: https://github.com/geneontology/go-annotation/issues/2086

Q5T013 HYI Putative hydroxypyruvate isomerase (I can’t find the evidence for this)

Q9NR77 PXMP2 Peroxisomal membrane protein 2 ISO-mouse:PMID:19352492 still not clear so leaving this http://pfam.xfam.org/family/PF04117

Q53FP2 TMEM35A peroxisome PMID:27170659 memory in mouse? (TMEM35 does not mention A or B) not clear
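
The entries above follow a loose pattern (UniProt accession, gene symbol, free-text note, optional PMID). A minimal sketch of pulling the accessions and PMIDs out of such lines, assuming the accession pattern published by UniProt; `parse_note` is a hypothetical helper name, not part of any PomBase tooling:

```python
import re

# UniProt accession pattern as published by UniProt; the word
# boundaries are added here so gene symbols are not matched in part.
ACCESSION_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)
# PMIDs are written both as "PMID:123" and "PMID: 123" in this thread.
PMID_RE = re.compile(r"PMID:\s*(\d+)")


def parse_note(line):
    """Return the accessions and PMIDs found in one free-text curation line."""
    return {
        "accessions": ACCESSION_RE.findall(line),
        "pmids": PMID_RE.findall(line),
    }


note = "Q9BUK0 CHCHD7 PMID: 22842048 structure but no function/process"
print(parse_note(note))
```

Lines carrying two accessions (such as the TMEM256 entry) come back with both, which makes that kind of paste slip easy to spot.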

ValWood commented 5 years ago

These are non-coding RNAs according to recent publications

Q9Y5M1 FAM215A Uncharacterized protein FAM215A (114AA) no conservation APPEARS TO BE A LINC PMID: 27667152

O15453 NBR2 Next to BRCA1 gene 2 protein NO CONSERVATION, DOES NOT LOOK REAL, ENSEMBL DESCRIBES AS LINC

P59022 DSCR10 Down syndrome critical region protein 10 (87 AA) (HGNC describes as non-protein coding)

O95177 GAS8-AS1 Uncharacterized protein GAS8-AS1 PMID: 30228180 The long noncoding RNA GAS8-AS1 suppresses hepatocarcinogenesis by epigenetically activating the tumor suppressor GAS8.

not sure what to do with these for this purpose....

ValWood commented 5 years ago

These 42 are "annotatable" @Antonialock need putting in the final GAF

ISO-mouse

PAINT

IC

@ These need adding to the master spreadsheet:

The appropriate term for these 2 might be ’sperm motility’?

If not enough for GO move up to unknown

Antonialock commented 5 years ago

could you also give the evidence codes?

On Tue, Sep 25, 2018 at 3:08 PM, Val Wood notifications@github.com wrote:

These 42 are "annotatable" @Antonialock https://github.com/Antonialock need putting in the final GAF

ISO-mouse

  • Q8NEB7 ACRBP Acrosin-binding protein ISO-MGI:1859515 spermatid development GO:0007286
  • A4D263 SPATA48 Spermatogenesis-associated protein 48 ISO-MGI:1921112 PMID:29700843 (2018) spermatogenesis GO:0007283 (MOUSE DOES NOT YET HAVE THIS ANNOTATION)
  • Q8IYQ7 THNSL1 serine family amino acid catabolic process ISO-mouse:PMID:17034760 (MOUSE DOES NOT YET HAVE THIS ANNOTATION)
  • Q8N3D4 EHBP1L1 EH domain-binding protein 1-like protein 1 mouse:PMID:26833786 (2016) maintenance of cell polarity (epithelial) (MOUSE DOES NOT YET HAVE THIS ANNOTATION)

PAINT

-

P50851 LRBA Lipopolysaccharide-responsive and beige-like anchor protein NEEDS work, BEACH domains have a general role in membrane fusion https://www.ncbi.nlm.nih.gov/pubmed/23521701 maybe a mapping or PAINT? mouse ortholog gets protein localization ISA Y18278 but human doesn’t?

Q96G27 WBP1 WW domain-binding protein 1 (WW domains appear to be signalling related to txn from polII ISS? https://www.uniprot.org/uniprot/Q969T9)

A2RUH7 MYBPH Myosin-binding protein H-like (HAS PAINT IN UNIPROT BUT NOT IN QUICKGO)

O75638 CTAG2 Cancer/testis antigen 2 1:1 ISO-SGD:S000028512 EKC/KEOPS complex/tRNA threonylcarbamoyladenosine metabolic process should be able to PAINT entire family

Q69YN2 CWF19L1 CWF19-like protein 1 1:1 ISO-PomBase:SPAC30D11.09 splicing should be able to PAINT entire family

Q5TGZ0 MINOS1 MICOS complex subunit MIC10 IDA-PMID:22114354 MICOS complex ISO-SGD:S000007547- cristae formation

Q5SRH9 TTC39A Tetratricopeptide repeat protein 39A 1:1 ISO-SGD:S000003618 required for clearing of inclusion bodies (has cilium associated processes but likely to be more core role as universally conserved)

Q86U38 NOP9 Nucleolar protein 9 1:1-ISO-SGD:YJL010C

Q9UKJ5 CHIC2 Cysteine-rich hydrophobic domain-containing protein 2 ISO PomBase:SPAC3F10.07c protein-cysteine S-palmitoyltransferase activity GO:0061951 establishment of protein localization to plasma membrane

Q9H9Y4 GPN2 GPN-loop GTPase 2 based on description, could have PMID:23267056 ISS from Q08726 protein import into nucleus

O43934 MFSD11 UNC93-like protein MFSD11 MFS superfamily. TM Transporter/transport ??

P35544 Ubiquitin-like protein FUBI This protein is synthesized with ribosomal S30 as its C-terminal extension. This is the ubiquitin-ribosome fusion protein, equivalent of https://www.pombase.org/gene/SPAC11G7.04

IC

  • Q6UWQ7 IGFL2 Insulin growth factor-like family member 2 has signaling receptor binding (IC to signalling?)
  • Q6UXB1 IGFL3 Insulin growth factor-like family member 3 has signaling receptor binding (IC to signalling?)
  • Q9NPA0 EMC7 ER membrane protein complex subunit 7 https://www.uniprot.org/citations/22119785 (2011) (mitochondrion-endoplasmic reticulum membrane tethering IC from complex, or other)

@ These need adding to the master spreadsheet:

  • Q9BQJ4 TMEM47 Regulates cell junction organization in epithelial cells. PMID:26990309 (2016)
  • Q9P2V4 LRIT1 Regulates Selective Synapse Formation in Cone Photoreceptor Cells PMID:29590623 (2018)
  • Q9NRL3 STRN4 Striatin-4 PMID: 28442576 dendritic spine morphology?
  • Q5T0N1 CFAP70 Cilia- and flagella-associated protein 70 Localizes at the Base of the Outer Dynein Arm and Regulates Ciliary Motility. PMID:30158508 (2018)
  • O00160 MYO1F Unconventional myosin-If PMID:29487067 neutrophil migration PMID:29672841 mitochondrion distribution (2018)
  • Q7Z5L0 VMO1 Vitelline membrane outer layer protein 1 homolog PMID:25257056 (2014) stabilization of tear film?
  • Q9NPC6 MYOZ2 Myozenin-2 HAS myofibril assembly NEW/disappeared
  • Q86VF2 IGFN1 Immunoglobulin-like and fibronectin type III domain-containing protein 1 (PMID:29323771 required for myoblast fusion and differentiation.)
  • Q5T0D9 TPRG1L Tumor protein p63-regulated gene 1-like protein (MOVER) (PMID:26212709 negatively regulates synaptic release probability (2015))
  • Q96A28 SLAMF9 SLAM family member 9 (PMID:30232321 immune response)
  • Q9P0T7 TMEM9 Transmembrane protein 9 (PMID:30119033 endocytosis?)
  • Q1KMD3 HNRNPUL2 Heterogeneous nuclear ribonucleoprotein U-like protein 2 (PMID:22365830 DNA repair)
  • Q13823 GNL2 Nucleolar GTP-binding protein 2 (has ribosome biogenesis, & ND https://www.ebi.ac.uk/QuickGO/annotations?geneProductId=Q13823 but the ribosome biogenesis is not in AmiGO?)
  • Q7Z4H8 KDELC2 KDEL motif-containing protein 2 two novel protein O-glucosyltransferases, POGLUT2/POGLUT3 (formerly KDELC1 and KDELC2), which transfer O-glucose (O-Glc) from UDP-Glc to serine 435. PMID:30127001 (2018 Sep)
  • Q00765 REEP5 Receptor expression-enhancing protein 5 (PMID:29431104 sarcoplasmic reticulum organization)
  • O94903 PLPBP Pyridoxal phosphate homeostasis protein vitamin B6 metabolic process (from Uniprot description)
  • Q4L180 FILIP1L Filamin A-interacting protein 1-like Filamin A interacting protein 1-like (FILIP1L) is an inhibitor of the canonical WNT pathway. PMID: 27776341
  • Q587J8 KHDC3L KHDC3-like protein embryonic development PMID:27917907 (2016)subcortical maternal complex
  • A6NGQ2 OOEP Oocyte-expressed protein homolog subcortical maternal complex (SCMC), plays an essential role for zygotes to progress beyond the first embryonic cell divisions. PMID: 29955025 By similarity homologous recombination-mediated DNA double-strand break repair in mouse oocytes; but maybe the role is still unclear? PMID:27525657

The appropriate term for these 2 might be ’sperm motility’?

  • Q8TBY8 PMFBP1 Polyamine-modulated factor 1-binding protein 1 (acephalic spermatozoa syndrome) The disruption of Pmfbp1 in male mice led to infertility due to the production of acephalic spermatozoa and the disruption of PMFBP1's cooperation with SUN5 and SPATA6, which plays a role in connecting sperm head to the tail.
  • Q9H0A9 SPATC1L Speriolin-like protein sperm head-tail integrity PMID:30026308 (2018)

If not enough for GO move up to unknown

  • Q6ZW05 PTCHD4 Patched domain-containing protein 4 PMID:25296753 -ve reg of HH signalling (not sure if this is enough for GO)
  • Q8N2Y8 RUSC2 Iporin PMID: 27633991 -ve reg of Hedgehog signalling (not sure if this is enough for GO)

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/pombase/curation/issues/2192#issuecomment-424357835, or mute the thread https://github.com/notifications/unsubscribe-auth/AMI00r87-Nuwhqz7g2axe5vd9j6IStIuks5uejjkgaJpZM4W4hLn .

-- Antonia Lock, PhD PomBase Biocurator, http://www.pombase.org Department of Genetics, Evolution and Environment, The Darwin Building, University College London London WC1E 6BT, UK
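
For putting entries like these into the final GAF, here is a minimal sketch of building one row, using the ACRBP entry from the ISO-mouse list. It assumes the 17 tab-separated columns of GAF 2.x. The reference, qualifier, date, `with_from` prefix and `assigned_by` values are placeholders chosen for illustration, not values agreed in this thread, and `gaf_line` is a hypothetical helper.

```python
# Column order of a GAF 2.x row (17 tab-separated fields).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def gaf_line(**fields):
    """Build one tab-separated GAF row; columns not supplied stay empty."""
    unexpected = set(fields) - set(GAF_COLUMNS)
    if unexpected:
        raise ValueError(f"unexpected GAF columns: {sorted(unexpected)}")
    return "\t".join(fields.get(col, "") for col in GAF_COLUMNS)


row = gaf_line(
    db="UniProtKB",
    db_object_id="Q8NEB7",
    db_object_symbol="ACRBP",
    qualifier="involved_in",       # assumption: GAF 2.2-style relation
    go_id="GO:0007286",            # spermatid development (from the list)
    reference="GO_REF:0000024",    # assumption: placeholder reference
    evidence_code="ISO",
    with_from="MGI:MGI:1859515",   # assumption: MGI prefix convention
    aspect="P",
    db_object_name="Acrosin-binding protein",
    db_object_type="protein",
    taxon="taxon:9606",
    date="20180925",               # placeholder date
    assigned_by="PomBase",         # assumption
)
print(row)
```

Keeping the column names in one list means a mistyped field name fails immediately instead of silently shifting the columns.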

ValWood commented 5 years ago

For the 25 experimental ones I did not go to the papers, I did them from abstracts. I won't have time so I'll pass that back to you. We'll just have to trust the numbers for the paper, it does not matter if they don't get done immediately. It's probably not a big deal if we don't do these since they are recent papers they will likely be in the pipeline anyway.....

I won't get time this week or next week. I need to move back onto other stuff (PHI base, overhaul lecture, thesis)

Antonialock commented 5 years ago

so what do we do for the numbers? Do we remove the number we have annotated from the unknown number?

On Tue, Sep 25, 2018 at 6:33 PM, Val Wood notifications@github.com wrote:

For the 25 experimental ones I did not go to the papers, I did them from abstracts. I won't have time so I'll pass that back to you. We'll just have to trust the numbers for the paper, it does not matter if they don't get done immediately. It's probably not a big deal if we don't do these since they are recent papers they will likely be in the pipeline anyway.....

I won't get time this week or next week. I need to move back onto other stuff (PHI base, overhaul lecture, thesis) and I need a break.....I haven't had any time off this year, only half a week to work on thesis....

ValWood commented 5 years ago

Yep. In the description of the file (SUPP) there will be a list basically, we will call these "annotatable".

Ah I see what you mean they will need to be in the table because we are quoting the numbers in the text.

If you do get to these we can include them, but if not, we will just keep them all as "ND" and ignore the fact that a small number could be annotated. The numbers are tiny in the whole scheme of things. I wasn't even planning to look at the ND originally, and I wasted a day looking at them already. I think if UniProt calls them ND, we can't really be pulled up for classifying them as "unknowns", so I don't think we should worry about these too much. Unannotated were a bit different. If they aren't annotated it's more difficult to say unequivocally that they are unknown. Does that make sense?

ValWood commented 5 years ago

I so hate the end of projects. We need to finish, but we also need to 80:20, not 100:0. Ask Tony!

ValWood commented 5 years ago

but I think we are 97:3 on this one ;)

ValWood commented 5 years ago

so what do we do for the numbers? Do we remove the number we have annotated from the unknown number?

...to answer your original question, yes.

Whatever ends up in the "GAF" (your annotation) we will call "annotatable" and that will be a separate segment in the histogram. I think we should make it clear that these are/were not currently annotated in GO, but they can and are being annotated now.

In a way it's better not to have the NDs in here, as that would be more confusing.
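
The bookkeeping described here can be sketched as follows: anything that ends up in the GAF moves out of "unknown" into a separate "annotatable" segment, so the three histogram segments still sum to the original total. The function name and the example sets are made up for illustration; these are not the numbers from the paper.

```python
def histogram_segments(known, unknown, annotated_in_gaf):
    """Split gene products into known / annotatable / unknown counts.

    known, unknown: sets of accessions before manual annotation.
    annotated_in_gaf: accessions that received an annotation in the GAF.
    """
    # Only former unknowns can become "annotatable".
    annotatable = unknown & annotated_in_gaf
    still_unknown = unknown - annotatable
    return {
        "known": len(known),
        "annotatable": len(annotatable),
        "unknown": len(still_unknown),
    }


# Hypothetical example: placeholder IDs for the known set, and a few
# accessions from this thread for the unknown set.
known = {"K1", "K2", "K3"}
unknown = {"Q8WXG8", "Q96AQ2", "O94903", "Q8NEB7"}
annotated_in_gaf = {"O94903", "Q8NEB7"}

segments = histogram_segments(known, unknown, annotated_in_gaf)
print(segments)  # {'known': 3, 'annotatable': 2, 'unknown': 2}
```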

Antonialock commented 5 years ago

I suppose I should look at the 500-odd that don't slim, though

ValWood commented 5 years ago

Nooooo, we have done enough already..... we need to stop now. I can see your point but we really don't have time before submission.

We could look at the products and pick out any which clearly should slim (which is what I originally intended for the unannotated, but what you have done is MUCH better, and really needed). However, for now I think we should sit on this. If reviewers pull us up on it we can do this set more thoroughly (unlikely).

If we really, really need to do it later we will. All we need to do is say exactly what we did, which is "manually annotate the unannotated"

Antonialock commented 5 years ago

How about using the slim analysis numbers for the histogram, and then do a pie chart for the 'unknowns' separating them into ND (assigned by uniprot curators), annotatable, ND (assigned by us) and 'annotations to non-root'

On Tue, Sep 25, 2018 at 9:16 PM, Val Wood notifications@github.com wrote:

We don't know.....that's what I keep saying. It might be this week.... it depends on the response to the enquiry. If we get a yes, it needs to be submitted immediately.

But regardless, we haven't got the resources to keep annotating human genes. We need to close this off this week even if we don't need to submit this week.

ValWood commented 5 years ago

We don't need another figure (and it would be largely redundant with the histogram). We can cover this in the histogram with: known, annotatable, unknown. This will be fine. We've paid more attention to detail than anyone else ever has, and it isn't our job to annotate these. We could have just "said what we see" if it wasn't for the fact that there were so many clearly annotatable gene products in the unknowns.

It won't be a big issue if we do end up with some known in the unknown... it's a moving goalpost after all, so it is inevitable. The point is that we are now ballpark correct with the numbers. The 500 -> known to annotatable will fix this. Other people can do the rest.....

ValWood commented 5 years ago

@Antonialock if you need to take any "annotatable" out because they are from cancer cell lines, you can make the numbers the same by adding some of the 48 I identified above. These are the ones I REALLY wanted you to annotate.... I wasn't so bothered about delving into the literature. I just didn't want reviewers to be able to look at the list of gene products and have proteins with a well-known biological role screaming out of the list. As it stands, these will still be included...

Antonialock commented 5 years ago

Ok, I didn't look at the NDs because, well, they were marked as ND...

ValWood commented 5 years ago

Good point. Any with a "vague" product line we can ignore. I'm just worried about those with obvious descriptions, like "O94903 PLPBP Pyridoxal phosphate homeostasis protein vitamin B6 metabolic process (from UniProt description)"

but it's a very small number in total that will be included. We could add a sentence to explain this in review. There are going to be a few differences due to time lags between annotation and release....

ValWood commented 5 years ago

I think this one can close?