Need to follow up on fixing the GO IDs in the 'with' field (UniProt would like us to use IntAct complex IDs instead).
There are some other things in here which could be added to consistency checks...
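
A minimal sketch of how such consistency checks might look, assuming the rejection reasons in the quoted UniProt message below are taken as the rules. The function name `check_with`, the patterns, and the message strings are hypothetical and not part of any existing GO or GeneDB tooling. The rules cover only the cases raised in that message.

```python
import re

# Hypothetical patterns based only on the rejection reasons in the quoted
# message; this is not an official GO specification.
GO_ID = re.compile(r"GO:\d{7}")
SGD_ID = re.compile(r"SGD:S\d{9}")
SGD_STYLE_GENEDB = re.compile(r"GeneDB_Spombe:S\d{9}")


def check_with(evidence, with_field):
    """Return a list of problems found in one 'with' field (empty list = OK).

    Pipe-separated values are checked one by one.
    """
    problems = []
    for value in with_field.split("|"):
        is_go = GO_ID.fullmatch(value) is not None
        if evidence == "IEP" and value:
            problems.append("with column should not be filled when using IEP")
        if evidence == "IC" and not is_go:
            problems.append("with column for IC should be filled with a GO ID")
        if evidence in ("IMP", "IPI") and is_go:
            problems.append("GO ID should not be used in the with column for " + evidence)
        if evidence == "IMP" and value.startswith("PMID:"):
            problems.append("PMID should not be used in the with column for IMP")
        if value.startswith("SGD:") and not SGD_ID.fullmatch(value):
            problems.append("SGD identifiers should be 'S' followed by 9 digits")
        if SGD_STYLE_GENEDB.fullmatch(value):
            problems.append("identifier looks like an SGD identifier, not GeneDB")
        # UniProtKB entry names contain an underscore; accessions do not.
        if value.startswith("UniProtKB:") and "_" in value:
            problems.append("UniProtKB entry name used where an accession is expected")
    return problems
```

For example, `check_with("IPI", "GO:0031011|GO:0000812")` reports one problem per GO ID, while `check_with("IEP", "")` reports none.
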
These are all fixed except for the GO IDs in the 'with' field for some of the IPI mappings.
I will look into converting these to IntAct complex IDs, or creating specific complex binding terms.
The others will filter through when we do our next GO update, which will probably be in a few weeks.
Cheers
val
On 07/03/2011 14:22, Rachael Huntley wrote:
> Hi Val,
>
> Would you be able to help us with some questions on your gene association file, please?
>
> We're currently trying to improve the GO annotation we integrate into UniProtKB from external MODs, by looking at including the MOD identifiers used in the 'with' field of externally-generated annotations; at the moment we ignore any 'with' field data that doesn't use a GO identifier or UniProtKB accession, and so integrate these annotations into our set with an empty 'with' field, which obviously is not ideal. Therefore we would very much like to include S. pombe identifiers that match the following regular expression:
> (GeneDB_Spombe):(SP(\d|\w)+.(\d|\w)+)
>
> Does this look reasonable to you?
>
> In addition, as our database schema does not allow more than one value in the 'with' field, we are 'unwrapping' lists of identifiers that are separated by a pipe, to generate multiple annotation rows that differ solely by the contents of the 'with' field. We feel that this should be a reasonable way of treating such data for IPI and IMP annotations because, as I understand the pipe usage, it should be interpreted as separating two gene products that have each been shown independently (but with data from the same paper and the same type of evidence) to interact with the annotation object, supporting the annotation of the same GO term. However, please let me know if your interpretation is different, as we can't find any GO documentation on the correct usage of pipes in the GAF format!
>
> Currently, there are a number of exceptions for the use of the with column generated from your file, which I have listed below together with the reasons why they have been rejected. If you feel that any of these are using a check that is too stringent, please do let us know.
>
> Rejected:
> [IC UniProtKB:O42870]
> Reason: The with column for an IC annotation should be filled with a GO ID
>
> Rejected:
> [IEP GeneDB_Spombe:SPAC25G10.03]
> [IEP GO:0016592]
> [IEP PMID:12161753]
> Reason: The with column should not be filled when using IEP
>
> Rejected:
> [IGI GeneDB_Spombe:S000000807]
> Reason: The identifier used is an SGD identifier, not GeneDB
>
> Rejected:
> [IGI SGD:000003904]
> Reason: The identifier is missing an 'S' at the beginning
>
> Rejected:
> [IGI SGD:S00000268]
> [IGI SGD:S00000550]
> [IGI SGD:S00003295]
> Reason: SGD identifiers should be 'S' followed by 9 digits
>
> Rejected:
> [IMP GO:0004660]
> [IMP GO:0005681]
> [IMP GO:0008990]
> [IMP GO:0046557]
> [IMP GO:0047657]
> Reason: A GO ID should not be used in the with column of an IMP annotation
>
> Rejected:
> [IMP PMID:11679064]
> [IMP PMID:12193640]
> [IMP PMID:14623292]
> [IMP PMID:16738311]
> Reason: A PMID should not be used in the with column of an IMP annotation
>
> In addition, we are trying to be quite strict in only importing annotations that apply a MOD identifier, rather than gene symbols. Therefore would you be willing to convert the following 'with' contents in your GO file into UniProtKB accessions?
>
> [IGI UniProtKB:BIN3_HUMAN]
> [IGI UniProtKB:ECC1_HUMAN]
> [IGI UniProtKB:MK01_HUMAN]
> [IGI UniProtKB:PIGW_HUMAN]
> [IGI UniProtKB:PRS6B_HUMAN]
> [IGI UniProtKB:PYRF_ECOLI]
>
> We've also noticed that you have several IPI annotations that have a GO ID for a complex in the 'with' column. We are planning to use IntAct complex IDs to cover this type of information, so we will continue to reject annotations that have a GO ID in the 'with' field for IPI. Supplying an IntAct complex ID in the 'with' field instead of the GO ID would be a more accurate representation of the data, since the GO complex terms are not defined particularly well with regard to the composition of the complex in different species.
> We have been working with IntAct on making protein complex IDs more visible in QuickGO (e.g. see http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005680#info=2) and they have been very responsive to our requests. If you would like to supply IntAct complex IDs in the 'with' field, I am sure IntAct would be more than willing to create any complex IDs that are missing.
>
> Rejected:
> [IPI GO:0000812]
> [IPI GO:0005680]
> [IPI GO:0005681]
> [IPI GO:0005685]
> [IPI GO:0005832]
> [IPI GO:0005884]
> [IPI GO:0008180]
> [IPI GO:0016575]
> [IPI GO:0016592]
> [IPI GO:0031011]
> [IPI GO:0031011|GO:0000812]
> [IPI GO:0031511]
> [IPI GO:0031533]
> [IPI GO:0032221]
> [IPI GO:0033186]
> [IPI GO:0034967]
> [IPI GO:0035267]
> [IPI GO:0070209]
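
The identifier pattern and the pipe 'unwrapping' described in the quoted message could be sketched roughly as follows. This is an illustration under stated assumptions, not UniProt's actual import code. `QUOTED` is the regular expression exactly as written in the message. `STRICT` is a hypothetical variant that escapes the dot, on the assumption that a literal '.' was intended (in the quoted form the unescaped '.' matches any character, and `\d|\w` is equivalent to `\w`). The row layout and the name `unwrap_with` are also made up for the example.

```python
import re

# Pattern as quoted in the message.
QUOTED = re.compile(r"(GeneDB_Spombe):(SP(\d|\w)+.(\d|\w)+)")
# Assumed intent: a literal dot between the two parts of the systematic ID.
STRICT = re.compile(r"(GeneDB_Spombe):(SP\w+\.\w+)")


def unwrap_with(row):
    """Yield one copy of `row` per pipe-separated value in its 'with' field.

    `row` is a dict with a 'with' key (hypothetical layout); the copies
    differ only in the contents of 'with'.
    """
    for value in row["with"].split("|"):
        yield {**row, "with": value}


# Example using a piped value that appears in the rejected list above.
rows = list(unwrap_with({"evidence": "IPI", "with": "GO:0031011|GO:0000812"}))
```

Here `rows` holds two IPI rows, one per GO ID. Both patterns accept `GeneDB_Spombe:SPAC25G10.03` when used with `fullmatch`, but only `QUOTED` also accepts the dot-less string `GeneDB_Spombe:SPAC25G1003`.
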
Original comment by: ValWood