Closed ValWood closed 6 years ago
These 2 terms which were not slimmed, I have a feeling are transposon derived? I thought these were filtered? @Antonialock Could you look into this ?
DNA integration: Q9QC07, Q9NXP7, Q9P2P1
DNA biosynthetic process: P63128, P04053, Q6UWI2, Q9QC07
nucleic acid-templated transcription: C9JCN9, Q3ZLR7, Q9UHA2
If so, is there a way to filter them (i.e. do UniProt need to update their gene set)?
No I'm not happy to argue against transposon annotations. They have been shown to play a role in human biology. I included them in my slim at first and then you asked me to remove the terms.
Also I had added defense response, which you told me to remove :-) Which one is it?
Why do you think "inositol phosphate metabolic process" is a "biologically informative term"? It could be part of both signaling and macromolecule metabolism, which are very different things?
No I'm not happy to argue against transposon annotations.
This isn't about "transposon annotation" but about including transposon-encoding "genes" in the human protein set. In the pombe protein set we are excluding transposons. In the S. cerevisiae dataset we are excluding transposons. We should therefore exclude transposons from the human set (this makes sense, because different organisms have different numbers of transposons and this can grossly affect the comparisons).
They are relevant to an organism's biology (they are clearly important for evolution), but in this instance we are trying to compare like-for-like and that excludes "transposons". If we don't do this "normalization", we have over-inflated numbers for processes which are shared with transposons.
These UniProt entries appeared to be transposon derived (I could not quite work out what they were). Perhaps they aren't, it would be good to check.
Also I had added defense response, which you told me to remove :-) Which one is it?
My fault. I can't remember that, but I probably thought immune response would cover it, I didn't know it was a completely separate thing!
Why do you think "inositol phosphate metabolic process" is a "biologically informative term"?
This one is a bit subtle. The genes which are involved in IP metabolism could be involved in either/or signaling and macromolecule metabolism, but they are well characterised genes...
I think people would find it odd to see inositol-1,5-bisdiphosphate-2,3,4,6-tetrakisphosphate 1-diphosphatase activity for example (this might not be a good example), classified as an unknown?
...they probably always do both, depending on context? so although this annotation is not very specific, I think we have to say that we know enough here to know that anything annotated to this particular term is "known"?
This is not the same as for, say, "cell growth" or "cell proliferation" or even any of the "modification" terms we omitted, which are always "context dependent". I think the distinction is that, say for a protein kinase, any individual instance could be involved in any specific process (so it is not specific), but for these "IP metabolism" activities they are involved in both metabolism and signalling.
i.e. for the IP molecules the gene products are multifunctional... I think that's the difference... does that make sense?
This isn't about "transposon annotation" it is about including transposon encoding "genes", in the human protein set.
To clarify: I'm not saying that UniProt should not include transposons, but that it should be possible to access a protein set that excludes them. I thought that was what we were using? The above genes might have slipped through. Otherwise they seem to be annotated as if they are transposons, so this needs to be queried ( or, that was my initial interpretation from the descriptions, and the GO annotation).
A reason for excluding transposons: https://github.com/geneontology/go-annotation/issues/1869 (look at the second table). I think the numbers for DNA recombination should be equivalent based on endogenous proteins. If you include transposons you would not see this, because all transposons are annotated to "DNA recombination" because of their integrase. That is probably why human "recombination" is higher (I didn't look into it yet, it's a guess).
There are likely to be other annotations to transposons that we would want to exclude to normalize numbers of annotations to processes.
This has nothing to do with the fact that they affect human biology, it just isn't what we are trying to show which is basically annotation coverage for non-transposon proteins.
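A minimal sketch of the normalization argument, using only the accessions flagged at the top of this thread. The `excluded` set is an assumption made for illustration: it holds the flagged accessions that also appear in the "Deleted" table further down the thread. The function and variable names are hypothetical.

```python
# Accessions flagged at the top of this thread, grouped by GO term.
flagged = {
    "DNA integration": ["Q9QC07", "Q9NXP7", "Q9P2P1"],
    "DNA biosynthetic process": ["P63128", "P04053", "Q6UWI2", "Q9QC07"],
    "nucleic acid-templated transcription": ["C9JCN9", "Q3ZLR7", "Q9UHA2"],
}

# Assumption: the flagged accessions that also appear in the "Deleted" table.
excluded = {"Q9QC07", "Q9NXP7", "P63128"}


def count_per_term(term_to_accessions, excluded_accessions=frozenset()):
    """Count accessions per term, skipping any excluded accession."""
    return {
        term: sum(1 for acc in accessions if acc not in excluded_accessions)
        for term, accessions in term_to_accessions.items()
    }


before = count_per_term(flagged)
after = count_per_term(flagged, excluded)

for term in flagged:
    print(f"{term}: {before[term]} -> {after[term]}")
```

On this tiny subset only the two DNA-related terms shrink, which is the kind of inflation the like-for-like comparison is meant to avoid.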
My fault. I can't remember that, but I probably thought immune response would cover it, I didn't know it was a completely separate thing!
In fact I started writing a really long SF ticket about this until I realised "oh right they aren't necessarily related".....
These UniProt entries appeared to be transposon derived (I could not quite work out what they were). Perhaps they aren't, it would be good to check.
My other suspicion was that these were not "transposon derived" but were incorrectly annotated because they were something else. I hope you will see the same when you look at them....
I will leave defense response out, and recommend this as not for direct annotation! I see you included children, it is non-specific!
Currently (cerevisiae)
5915 genes
217 not annotated in the slim. I have been through this list and I am satisfied that most really are no process (all are modification, response to or oxidation-reduction process). In fact most of these (over 100) have ND for process at SGD.
794 with no non-root annotation
Using our criteria these are ALL unknown (1011/5915)
pombe: Your input list contains 5070 genes
These 9 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim: SPAC6G9.13c SPBC2G2.09c SPBP4G3.02 SPBC20F10.10 SPBC1347.11 SPAC2G11.15c SPCC1795.09 SPAC1002.07c SPBP4H10.17c (looking into this but all are known)
These 734 identifiers had no non-root annotations: so for the sake of argument I’ll include the 9 unmapped in unknown since I did this for S.c so for pombe the equivalent is 743/5070 (this will go down with the next GO update, and when I include C-terminal protein lipidation)
I will do the final update next week for all 3....
So @Antonialock your final task for this part:
nucleic acid-templated transcription (C9JCN9, Q3ZLR7, Q9UHA2): FP inference from "transcription activator/cofactor/repressor activity". No viruses are annotated to this MF... but I guess there could be viral proteins that act as transcriptional cofactors?
P04053: DNA biosynthesis looks ok, it does nontemplated addition of nucleotides to exons (immune system development)
Q9P2P1- no idea about this one "The gene encoding this protein may have arisen from the fusion of a cellular gene with retroviral sequences prior to the marsupial-eutherian split. Sequence and structural analyses suggest that the integrase catalytic domain is inactive."
rest look like viral stuff, can look more in detail tomorrow and try and figure out how many are in the complete list.
So @Antonialock your final task for this part:
document here which human set was used, and check up on the transposon issue (are they in the dataset or out, they should be excluded for our purposes?)
The human list I used is still the uniprot curated 1:1 list of human genes that are "believed to exist" retrieved by using the search: NOT existence:uncertain AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640 in UniProt
so there are 88 genes that contain "retrovir" in the protein name, and 28 that contain "retrotransp" in the protein name = 0.6%
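A rough sketch of how such a name-based count and percentage could be computed. The sample names are copied from the "Deleted" table in this thread; the list size of 19700 is an assumption (it is the figure quoted later in the thread, and it is disputed there). The helper name is hypothetical.

```python
def count_name_matches(protein_names, fragment):
    """Count protein names that contain the fragment, ignoring case."""
    fragment = fragment.lower()
    return sum(1 for name in protein_names if fragment in name.lower())


# A few protein names copied from the "Deleted" table in this thread.
sample_names = [
    "LINE-1 retrotransposable element ORF1 protein",
    "Gypsy retrotransposon integrase-like protein 1",
    "Endogenous retrovirus group K member 10 Gag polyprotein",
    "Endogenous Bornavirus-like nucleoprotein 1",
]

retrovir_hits = count_name_matches(sample_names, "retrovir")
retrotransp_hits = count_name_matches(sample_names, "retrotransp")

# Counts reported above for the full human list.
reported_retrovir = 88
reported_retrotransp = 28

# Assumption: approximate size of the human list (quoted later in the thread).
assumed_list_size = 19700

percent = 100 * (reported_retrovir + reported_retrotransp) / assumed_list_size
print(f"{percent:.1f}% of the list matches either fragment")
```

Note that the Bornavirus-like entry matches neither fragment, so a name search alone can miss entries that were deleted by hand.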
so if there are fewer than 100 transposons in there I guess we can ignore them.... However, if we want to do "comparative slims" we should exclude them, as they will "over-inflate" annotations to transposon related processes (like recombination).
Don't UniProt provide a list without transposons? I thought when you asked on that GO thread that ex transposons was one of the options?
I agree the ones you checked look OK....why did I flag those? not sure...these are clearly endogenous genes
ok I deleted 71 genes
I kept the ones with description "Retrotransposon Gag-like protein" because they are described as being derived from retrotransposons but now have actual functions, e.g. see Q5HYW3, and some others that seemed to be gene fusions etc. with retrotransposons but now are "everyday" functional things
Deleted:
Entry | Entry name | Protein names | Gene names |
---|---|---|---|
Q9UN81 | LORF1_HUMAN | LINE-1 retrotransposable element ORF1 protein (L1ORF1p) (LINE retrotransposable element 1) (LINE1 retrotransposable element 1) | L1RE1 LRE1 |
O00370 | LORF2_HUMAN | LINE-1 retrotransposable element ORF2 protein (ORF2p) [Includes: Reverse transcriptase (EC 2.7.7.49); Endonuclease (EC 3.1.21.-)] | |
Q5T7N2 | LITD1_HUMAN | LINE-1 type transposase domain-containing protein 1 (ES cell-associated protein 11) | L1TD1 ECAT11 |
Q9NXP7 | GIN1_HUMAN | Gypsy retrotransposon integrase-like protein 1 (GIN-1) (Ty3/Gypsy integrase 1) (Zinc finger H2C2 domain-containing protein) | GIN1 TGIN1 ZH2C2 |
P0CF75 | EBLN1_HUMAN | Endogenous Bornavirus-like nucleoprotein 1 (Endogenous Borna-like N element-1) (EBLN-1) | EBLN1 |
Q6P2I7 | EBLN2_HUMAN | Endogenous Bornavirus-like nucleoprotein 2 (Endogenous Borna-like N element-2) (EBLN-2) | EBLN2 GK006 |
Q14264 | ENR1_HUMAN | Endogenous retrovirus group 3 member 1 Env polyprotein (ERV-3 envelope protein) (ERV3 envelope protein) (ERV3-1 envelope protein) (Envelope polyprotein) (HERV-R envelope protein) (ERV-R envelope protein) (HERV-R_7q21.2 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERV3-1 ERV3 |
P60507 | EFC1_HUMAN | Endogenous retrovirus group FC1 Env polyprotein (Envelope polyprotein) (Fc1env) (HERV-F(c)1_Xq21.33 provirus ancestral Env polyprotein) (HERV-Fc1env) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVFC1 |
P60608 | EFC2_HUMAN | Endogenous retrovirus group FC1 member 1 Env polyprotein (Envelope polyprotein) (Fc2deltaenv) (HERV-F(c)2_7q36.2 provirus ancestral Env polyprotein) [Includes: Surface protein (SU); Truncated transmembrane protein (TM)] | ERVFC1-1 |
P87889 | GAK10_HUMAN | Endogenous retrovirus group K member 10 Gag polyprotein (HERV-K10 Gag protein) (HERV-K107 Gag protein) (HERV-K_5q33.3 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-10 |
P61580 | NP10_HUMAN | Endogenous retrovirus group K member 10 Np9 protein (HERV-K10 Np9 protein) (HERV-K107 Np9 protein) (HERV-K_5q33.3 provirus Np9 protein) | ERVK-10 |
P10266 | POK10_HUMAN | Endogenous retrovirus group K member 10 Pol protein (HERV-K10 Pol protein) (HERV-K107 Pol protein) (HERV-K_5q33.3 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-10 |
P10265 | VPK10_HUMAN | Endogenous retrovirus group K member 10 Pro protein (HERV-K10 Pro protein) (HERV-K107 Pro protein) (HERV-K_5q33.3 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-10 |
P63124 | VPK04_HUMAN | Endogenous retrovirus group K member 104 Pro protein (HERV-K104 Pro protein) (HERV-K_5q13.3 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | HERV-K104 |
P61576 | REC04_HUMAN | Endogenous retrovirus group K member 104 Rec protein (HERV-K104 Rec protein) (HERV-K_5q13.3 provirus Rec protein) | HERV-K104 |
Q9UQG0 | POK11_HUMAN | Endogenous retrovirus group K member 11 Pol protein (HERV-K_3q27.2 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-11 |
Q902F9 | EN113_HUMAN | Endogenous retrovirus group K member 113 Env polyprotein (EnvK5 protein) (Envelope polyprotein) (HERV-K113 envelope protein) (HERV-K_19p13.11 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | HERVK_113 |
P62684 | GA113_HUMAN | Endogenous retrovirus group K member 113 Gag polyprotein (HERV-K113 Gag protein) (HERV-K_19p13.11 provirus ancestral Gag polyprotein) (Gag polyprotein) | HERVK_113 |
P63132 | PO113_HUMAN | Endogenous retrovirus group K member 113 Pol protein (HERV-K113 Pol protein) (HERV-K_19p13.11 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | HERVK_113 |
P63121 | VP113_HUMAN | Endogenous retrovirus group K member 113 Pro protein (HERV-K113 envelope protein) (HERV-K_19p13.11 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | HERVK_113 |
P61574 | RE113_HUMAN | Endogenous retrovirus group K member 113 Rec protein (HERV-K113 Rec protein) (HERV-K_19p13.11 provirus Rec protein) | HERVK_113 |
Q9NX77 | ENK13_HUMAN | Endogenous retrovirus group K member 13-1 Env polyprotein (Envelope polyprotein) (HERV-K_16p13.3 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK13-1 |
P61578 | REC16_HUMAN | Endogenous retrovirus group K member 16 Rec protein (HERV-K_10p14 provirus Rec protein) | ERVK-16 |
O42043 | ENK18_HUMAN | Endogenous retrovirus group K member 18 Env polyprotein (Envelope polyprotein) (HERV-K(C1a) envelope protein) (HERV-K110 envelope protein) (HERV-K18 envelope protein) (HERV-K18 superantigen) (HERV-K_1q23.3 provirus ancestral Env polyprotein) (IDDMK1,2 22 envelope protein) (IDDMK1,2 22 superantigen) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-18 |
Q9QC07 | POK18_HUMAN | Endogenous retrovirus group K member 18 Pol protein (HERV-K(C1a) Pol protein) (HERV-K110 Pol protein) (HERV-K18 Pol protein) (HERV-K_1q23.3 provirus ancestral Pol protein) [Includes: Reverse transcriptase (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4)] | ERVK-18 |
P63123 | VPK18_HUMAN | Endogenous retrovirus group K member 18 Pro protein (HERV-K(C1a) Pro protein) (HERV-K110 Pro protein) (HERV-K18 Pro protein) (HERV-K_1q23.3 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-18 |
O71037 | ENK19_HUMAN | Endogenous retrovirus group K member 19 Env polyprotein (EnvK3 protein) (Envelope polyprotein) (HERV-K(C19) envelope protein) (HERV-K_19q11 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-19 |
Q9YNA8 | GAK19_HUMAN | Endogenous retrovirus group K member 19 Gag polyprotein (HERV-K(C19) Gag protein) (HERV-K_19q11 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-19 |
Q9WJR5 | POK19_HUMAN | Endogenous retrovirus group K member 19 Pol protein (HERV-K(C19) Pol protein) (HERV-K_19q11 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-19 |
P63120 | VPK19_HUMAN | Endogenous retrovirus group K member 19 Pro protein (HERV-K(C19) Pro protein) (HERV-K_19q12 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-19 |
P61572 | REC19_HUMAN | Endogenous retrovirus group K member 19 Rec protein (HERV-K(C19) Rec protein) (HERV-K_19q11 provirus Rec protein) | ERVK-19 |
P61565 | ENK21_HUMAN | Endogenous retrovirus group K member 21 Env polyprotein (EnvK1 protein) (Envelope polyprotein) (HERV-K_12q14.1 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-21 |
P62683 | GAK21_HUMAN | Endogenous retrovirus group K member 21 Gag polyprotein (HERV-K_12q14.1 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-21 |
P63119 | VPK21_HUMAN | Endogenous retrovirus group K member 21 Pro protein (HERV-K_12q14.1 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-21 |
P61571 | REC21_HUMAN | Endogenous retrovirus group K member 21 Rec protein (HERV-K_12q14.1 provirus Rec protein) | ERVK-21 |
P61566 | ENK24_HUMAN | Endogenous retrovirus group K member 24 Env polyprotein (Envelope polyprotein) (HERV-K101 envelope protein) (HERV-K_22q11.21 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-24 |
P63145 | GAK24_HUMAN | Endogenous retrovirus group K member 24 Gag polyprotein (HERV-K101 Gag protein) (HERV-K_22q11.21 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-24 |
P61581 | NP24_HUMAN | Endogenous retrovirus group K member 24 Np9 protein (HERV-K101 Np9 protein) (HERV-K_22q11.21 provirus Np9 protein) | ERVK-24 |
P63129 | VPK24_HUMAN | Endogenous retrovirus group K member 24 Pro protein (HERV-K101 envelope protein) (HERV-K_22q11.21 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-24 |
P61570 | ENK25_HUMAN | Endogenous retrovirus group K member 25 Env polyprotein (Envelope polyprotein) (HERV-K_11q22.1 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-25 |
P63136 | POK25_HUMAN | Endogenous retrovirus group K member 25 Pol protein (HERV-K_11q22.1 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-25 |
P63125 | VPK25_HUMAN | Endogenous retrovirus group K member 25 Pro protein (HERV-K_11q22.1 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-25 |
P61579 | ERK25_HUMAN | Endogenous retrovirus group K member 25 Rec protein (Endogenous retrovirus group K member 25) (HERV-K_11q22.1 provirus Rec protein) | ERVK-25 |
Q9HDB8 | ENK5_HUMAN | Endogenous retrovirus group K member 5 Env polyprotein (Envelope polyprotein) (HERV-K(II) envelope protein) (HERV-K_3q12.3 provirus ancestral Env polyprotein) [Includes: Truncated surface protein (SU)] | ERVK-5 ERVK5 |
Q9HDB9 | GAK5_HUMAN | Endogenous retrovirus group K member 5 Gag polyprotein (HERV-K(II) Gag protein) (HERV-K_3q12.3 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-5 ERVK5 |
P61583 | NP5_HUMAN | Endogenous retrovirus group K member 5 Np9 protein (Endogenous retrovirus K protein 5) (HERV-K(II) Np9 protein) (HERV-K_3q12.3 provirus Np9 protein) | ERVK-5 ERVK5 |
Q69384 | ENK6_HUMAN | Endogenous retrovirus group K member 6 Env polyprotein (EnvK2 protein) (Envelope polyprotein) (HERV-K(C7) envelope protein) (HERV-K(HML-2.HOM) envelope protein) (HERV-K108 envelope protein) (HERV-K_7p22.1 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-6 ERVK6 |
Q7LDI9 | GAK6_HUMAN | Endogenous retrovirus group K member 6 Gag polyprotein (HERV-K(C7) Gag protein) (HERV-K(HML-2.HOM) Gag protein) (HERV-K108 Gag protein) (HERV-K_7p22.1 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-6 ERVK6 |
Q9BXR3 | POK6_HUMAN | Endogenous retrovirus group K member 6 Pol protein (HERV-K(C7) Pol protein) (HERV-K(HML-2.HOM) Pol protein) (HERV-K108 Pol protein) (HERV-K_7p22.1 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-6 ERVK6 |
Q9Y6I0 | VPK6_HUMAN | Endogenous retrovirus group K member 6 Pro protein (HERV-K(C7) Pro protein) (HERV-K(HML-2.HOM) Pro protein) (HERV-K108 Pro protein) (HERV-K_7p22.1 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-6 ERVK6 |
Q69383 | REC6_HUMAN | Endogenous retrovirus group K member 6 Rec protein (Central open reading frame) (c-orf) (cORF) (Endogenous retrovirus K protein 6) (HERV-K(C7) Rec protein) (HERV-K(HML-2.HOM) Rec protein) (HERV-K108 Rec protein) (HERV-K_7p22.1 provirus Rec protein) (K-Rev) (Rev-like protein) (Rev/Rex homolog) | ERVK-6 ERVK6 |
P61567 | ENK7_HUMAN | Endogenous retrovirus group K member 7 Env polyprotein (Envelope polyprotein) (HERV-K(III) envelope protein) (HERV-K102 envelope protein) (HERV-K_1q22 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-7 |
P63130 | GAK7_HUMAN | Endogenous retrovirus group K member 7 Gag polyprotein (HERV-K(III) Gag protein) (HERV-K102 Gag protein) (HERV-K_1q22 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-7 |
P61582 | NP7_HUMAN | Endogenous retrovirus group K member 7 Np9 protein (HERV-K(III) Np9 protein) (HERV-K102 Np9 protein) (HERV-K_1q22 provirus Np9 protein) | ERVK-7 |
P63135 | POK7_HUMAN | Endogenous retrovirus group K member 7 Pol protein (HERV-K(III) Pol protein) (HERV-K102 Pol protein) (HERV-K_1q22 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-7 |
P63131 | VPK7_HUMAN | Endogenous retrovirus group K member 7 Pro protein (HERV-K(III) Pro protein) (HERV-K102 Pro protein) (HERV-K_1q22 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-7 |
Q902F8 | ENK8_HUMAN | Endogenous retrovirus group K member 8 Env polyprotein (EnvK6 protein) (Envelope polyprotein) (HERV-K115 envelope protein) (HERV-K_8p23.1 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-8 |
P62685 | GAK8_HUMAN | Endogenous retrovirus group K member 8 Gag polyprotein (HERV-K115 Gag protein) (HERV-K_8p23.1 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-8 |
P63133 | POK8_HUMAN | Endogenous retrovirus group K member 8 Pol protein (HERV-K115 Pol protein) (HERV-K_8p23.1 provirus ancestral Pol protein) [Includes: Reverse transcriptase (RT) (EC 2.7.7.49); Ribonuclease H (RNase H) (EC 3.1.26.4); Integrase (IN)] | ERVK-8 |
P63122 | VPK8_HUMAN | Endogenous retrovirus group K member 8 Pro protein (HERV-K115 Pro protein) (HERV-K_8p23.1 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-8 |
P61575 | RECK8_HUMAN | Endogenous retrovirus group K member 8 Rec protein (HERV-K115 Rec protein) (HERV-K_8p23.1 provirus Rec protein) | ERVK-8 |
Q9UKH3 | ENK9_HUMAN | Endogenous retrovirus group K member 9 Env polyprotein (EnvK4 protein) (Envelope polyprotein) (HERV-K(C6) envelope protein) (HERV-K109 envelope protein) (HERV-K_6q14.1 provirus ancestral Env polyprotein) [Cleaved into: Surface protein (SU); Transmembrane protein (TM)] | ERVK-9 |
P63126 | GAK9_HUMAN | Endogenous retrovirus group K member 9 Gag polyprotein (HERV-K(C6) Gag protein) (HERV-K109 Gag protein) (HERV-K_6q14.1 provirus ancestral Gag polyprotein) (Gag polyprotein) | ERVK-9 |
P63128 | POK9_HUMAN | Endogenous retrovirus group K member 9 Pol protein (HERV-K(C6) Gag-Pol protein) (HERV-K109 Gag-Pol protein) (HERV-K_6q14.1 provirus ancestral Gag-Pol polyprotein) [Includes: Protease (EC 3.4.23.50) (PR) (Retropepsin); Reverse transcriptase/ribonuclease H (EC 2.7.7.49) (EC 2.7.7.7) (EC 3.1.26.4) (p66 RT)] | ERVK-9 |
P63127 | VPK9_HUMAN | Endogenous retrovirus group K member 9 Pro protein (HERV-K(C6) Pro protein) (HERV-K109 Pro protein) (HERV-K_6q14.1 provirus ancestral Pro protein) (EC 3.4.23.50) (Protease) (Proteinase) (PR) | ERVK-9 |
P61573 | REC9_HUMAN | Endogenous retrovirus group K member 9 Rec protein (HERV-K(C6) Rec protein) (HERV-K109 Rec protein) (HERV-K_6q14.1 provirus Rec protein) | ERVK-9 |
Q9H9K5 | MER34_HUMAN | Endogenous retrovirus group MER34 member 1 Env polyprotein (HERV-MER_4q12 provirus ancestral Env polyprotein) | ERVMER34-1 LP9056 |
P60509 | ERB1_HUMAN | Endogenous retrovirus group PABLB member 1 Env polyprotein (Endogenous retrovirus group PABLB member 1) (Envelope polyprotein) (HERV-R(b) Env protein) (HERV-R(b)_3p24.3 provirus ancestral Env polyprotein) [Includes: Surface protein domain (SU); Transmembrane protein domain (TM)] | ERVPABLB-1 |
P61550 | ENVT1_HUMAN | Endogenous retrovirus group S71 member 1 Env polyprotein (Envelope polyprotein) (HERV-T Env protein) (HERV-T_19q13.11 provirus ancestral Env polyprotein) [Includes: Surface protein (SU); Transmembrane protein (TM)] | ERVS71-1 |
B6SEH8 | ERVV1_HUMAN | Endogenous retrovirus group V member 1 Env polyprotein (HERV-V_19q13.41 provirus ancestral Env polyprotein 1) | ERVV-1 ENVV1 |
B6SEH9 | ERVV2_HUMAN | Endogenous retrovirus group V member 2 Env polyprotein (HERV-V_19q13.41 provirus ancestral Env polyprotein 2) | ERVV-2 ENVV2 |
@ValWood should these ones also go? e.g. http://www.uniprot.org/uniprot/Q96MW7
how about http://www.uniprot.org/uniprot/Q9P215 ?
and http://www.uniprot.org/uniprot/Q6P3X8
(there are a few of these types)
yes ideally we should exclude transposon derived, but not those which have evolved_from transposons. if this is too tricky, we can leave them in...
especially if you are doing this manually, because you will only be deleting the unknown ones (I guess), but not the slimmed ones. Ideally ask UniProt if there is a list of human proteins which excludes transposons (presumably they maintain this list, one would hope?)
uniprot suggested removing anything annotated to http://www.uniprot.org/keywords/KW-0814
That removes 71 entries. Most look fine (e.g. the "Endogenous retrovirus group K member..." entries)
however these are also removed, do you want them in? http://www.uniprot.org/uniprot/Q9UQF0 http://www.uniprot.org/uniprot/P60508 http://www.uniprot.org/uniprot/M5A8F1
it does NOT remove these 13 genes (include or exclude?): O00370 P0CF75 Q17RP2 Q4W5G0 Q53EQ6 Q5T7N2 Q6B0B8 Q6NT04 Q6P2I7 Q8IY51 Q96MW7 Q9NXP7 Q9UN81
You can see the full list here: http://www.uniprot.org/uniprot/?query=NOT+existence%3Auncertain+AND+keyword%3A%22Transposable+element+%5BKW-0814%5D%22+AND+reviewed%3Ayes+AND+organism%3A%22Homo+sapiens+%28Human%29+%5B9606%5D%22+AND+proteome%3Aup000005640&sort=score
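For reference, a small sketch of how that query URL can be assembled from its clauses with the Python standard library. It only builds the string and makes no request; the base URL and parameter names are taken from the link above, and whether that address format still resolves is not checked here.

```python
from urllib.parse import urlencode

# Base URL as it appears in the link above.
BASE_URL = "http://www.uniprot.org/uniprot/"

# The individual clauses of the search, joined with AND.
clauses = [
    "NOT existence:uncertain",
    'keyword:"Transposable element [KW-0814]"',
    "reviewed:yes",
    'organism:"Homo sapiens (Human) [9606]"',
    "proteome:up000005640",
]
query = " AND ".join(clauses)

# urlencode percent-encodes the query and turns spaces into "+".
url = BASE_URL + "?" + urlencode({"query": query, "sort": "score"})
print(url)
```

Keeping the clauses in a list makes it easy to switch the keyword clause to its `NOT` form when the goal is the list without transposable elements.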
I wonder if these are called things like "endogenous coat protein family" because they also exist in viruses (but the virus is mirroring the human proteins, so the naming is unfortunate!) OR if they are really the actual retroviral component... difficult to know unless the annotation is clear. I would ask UniProt how to get the definitive list (if it is possible), and point out any inconsistencies using the current method.
But for our current purposes it does not matter so much, I think just use the filter they suggested...but ask the question for future refinement.....may as well get the ball rolling in the right direction...
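One way to collect the inconsistencies to report is to cross-check the accessions the keyword filter leaves in against the ones deleted by hand. The sketch below uses the 13 accessions listed above and, as a stated simplification, only the first six rows of the "Deleted" table rather than all 71; variable names are hypothetical.

```python
# The 13 accessions that the KW-0814 filter does NOT remove (listed above).
not_removed_by_keyword = {
    "O00370", "P0CF75", "Q17RP2", "Q4W5G0", "Q53EQ6", "Q5T7N2", "Q6B0B8",
    "Q6NT04", "Q6P2I7", "Q8IY51", "Q96MW7", "Q9NXP7", "Q9UN81",
}

# Simplification: only the first six rows of the "Deleted" table.
manually_deleted_sample = {
    "Q9UN81", "O00370", "Q5T7N2", "Q9NXP7", "P0CF75", "Q6P2I7",
}

# Deleted by hand but kept by the keyword filter.
deleted_but_kept_by_filter = sorted(
    not_removed_by_keyword & manually_deleted_sample
)

# Kept by the keyword filter and not in the hand-deleted sample.
kept_by_both = sorted(not_removed_by_keyword - manually_deleted_sample)

print("deleted manually, kept by keyword filter:", deleted_but_kept_by_filter)
print("kept by both:", kept_by_both)
```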
how are you getting on with final update. do you need to wait for a fix?
I need to wait until the annotation set is through to GO term mapper. I didn't have chance to check yet but I will do so before the end of this week.
which ticket are the current figures in? I can't find them?
yep I realised it was in the closed ticket. I was looking at open...
OK,
pombe: 5070 genes
1 ambiguous
16 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim (class as unknown for our purposes; some are from PAINT)
662 identifiers had no non-root annotation (this is because PAINT maps some, I guess, so it's a bit lower; will check this)
cerevisiae: 5915 genes
1 identifier was found to be unannotated: YCL054W-A (reported to SGD)
168 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim (I have checked these over 3 times now; they are all "unknown process", either functions or PAINT issues)
765 identifiers had no non-root annotations
All using the same slim, using GO term mapper today. I will put the slim, the gene sets, and the term mapper outputs in a Google Docs folder.
human: 19700 genes
2771 identifiers were found to be unannotated
614 identifiers were not annotated in the slim, but they had non-root annotations that were not in the slim (I think we are happy that these are all really 'process unknown' from previous checks? phosphorylation, response to and the like)
219 identifiers had no non-root annotations
also you wrote human 19700 genes - should be 19690
NOT existence:uncertain NOT keyword:"Transposable element [KW-0814]" AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640
I'm confused by there being 19700 human genes. That's not what you get when retrieving genes with the filter NOT existence:uncertain NOT keyword:"Transposable element [KW-0814]" AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640
I tried to rerun the slim but the tool isn't working?
Weren't there 19700 in the list you gave me?
I also need to wait to get the submitted gaf without the PAINT data from Midori. You can't filter the slim by evidence code...
but if you are happy to go with your numbers we can.
could you clarify the number of known / unknown (I don't understand what ambiguous means)? Is it:
pombe unknowns 1+16+662 = 679
cerevisiae unknowns 1+168+765 = 934
human 2771 + 614 + 219 = 3604 ?
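If the three categories are simply summed, as the question suggests, the totals and the resulting unknown fractions can be checked with a short sketch like the one below. The class and field names are hypothetical, and the human total of 19700 is the figure as written in the thread, which is disputed in the surrounding comments.

```python
from dataclasses import dataclass


@dataclass
class SlimSummary:
    organism: str
    total_genes: int
    unannotated_or_ambiguous: int
    non_root_outside_slim: int
    no_non_root: int

    @property
    def unknown(self):
        """Sum of the three categories treated as 'unknown' in the question."""
        return (
            self.unannotated_or_ambiguous
            + self.non_root_outside_slim
            + self.no_non_root
        )

    @property
    def unknown_percent(self):
        return 100 * self.unknown / self.total_genes


summaries = [
    SlimSummary("pombe", 5070, 1, 16, 662),
    SlimSummary("cerevisiae", 5915, 1, 168, 765),
    # Assumption: 19700 as written in the thread (19690 and 19730 also appear).
    SlimSummary("human", 19700, 2771, 614, 219),
]

for s in summaries:
    print(f"{s.organism}: {s.unknown} unknown ({s.unknown_percent:.1f}%)")
```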
should have been 19690 human genes NOT existence:uncertain NOT keyword:"Transposable element [KW-0814]" AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640
The human list I have has 19730.
I'm putting everything in the Google drive directory....
ok well then I don't know what is in your list, if you want the genes included in the human proteome excluding transposons you should have a list of 19690 (which you get if you search uniprot for NOT existence:uncertain NOT keyword:"Transposable element [KW-0814]" AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640)
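A sketch of how two accession lists of different sizes could be reconciled: parse each list, flag duplicates, and report what is unique to each side. The function names and the toy accessions are illustrative only; the real lists would be the one in the Google Drive folder and the download from the query above.

```python
from collections import Counter


def parse_accessions(lines):
    """Strip whitespace and drop blank lines from a one-accession-per-line list."""
    return [line.strip() for line in lines if line.strip()]


def compare_lists(mine, reference):
    """Return duplicates in 'mine' and the accessions unique to each list."""
    duplicates = sorted(acc for acc, n in Counter(mine).items() if n > 1)
    only_mine = sorted(set(mine) - set(reference))
    only_reference = sorted(set(reference) - set(mine))
    return duplicates, only_mine, only_reference


# Toy input standing in for the two real lists (illustrative only).
my_list = parse_accessions(["P04053\n", "Q9P2P1\n", "Q9QC07\n", "P04053\n", "\n"])
reference_list = parse_accessions(["P04053\n", "Q9P2P1\n", "Q6UWI2\n"])

duplicates, only_mine, only_reference = compare_lists(my_list, reference_list)
print("duplicated in my list:", duplicates)
print("only in my list:", only_mine)
print("only in reference list:", only_reference)
```

Checking for duplicates first matters because a repeated identifier inflates the list length without showing up in a set difference.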
48 comments on this one, closing. I'll open a new ticket for final stuff
I didn't open the final ticket- I'm doing that now. This is the final figure for the paper. I need to get it to Steve early next week. I might send without the current version of this figure.
follow on from https://github.com/pombase/curation/issues/1831
We should add
~defense response: we should maybe include this (or a descendant) because it seems to be independent of the immune response... I need to look into this one further~
Good job !