opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

ETL Input - Target step, Gene Ontology 'eco' dataset bug report to GO #3092

Open mbdebian opened 1 year ago

mbdebian commented 1 year ago

This dataset is collected by PIS according to this configuration block:

- uri: https://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gpa.gz
  output_filename: goa_human_eco.gpa.gz
  path: target-inputs/go

It is used by the ETL Target step as input according to this configuration:

gene-ontology-eco {
  format = "csv"
  path = ${common.input}"/target-inputs/go/goa_human_eco.gpa.gz"
  options = [
    {k: "sep", v: "\\t"}
    {k: "comment", v: "!"}
  ]
}
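
As a rough illustration of what the two reader options above imply (tab as separator, `!` as comment marker), here is a minimal Python sketch. It is not the ETL's code: the function name `read_gpa_rows` is hypothetical, and the sample row assumes the fields in the file are separated by single tabs, with empty strings for absent optional columns.

```python
import csv
import io


def read_gpa_rows(stream, sep="\t", comment="!"):
    """Yield data rows as lists of strings, skipping comment and blank lines."""
    data_lines = (
        line for line in stream
        if line.strip() and not line.startswith(comment)
    )
    # QUOTE_NONE: treat any quote characters as ordinary data.
    yield from csv.reader(data_lines, delimiter=sep, quoting=csv.QUOTE_NONE)


# Two header lines and one annotation row (tab layout is an assumption).
sample = (
    "!gpa-version: 1.1\n"
    "!\n"
    "UniProtKB\tA0A024RBG1\tenables\tGO:0000298\tPMID:21873635\tECO:0000318\t"
    "PANTHER:PTN000290327|SGD:S000005689\t\t20170228\tGO_Central\t\tgo_evidence=IBA\n"
)

rows = list(read_gpa_rows(io.StringIO(sample)))
print(len(rows), "data row(s),", len(rows[0]), "fields in the first row")
```

For the real gzipped file, the same function could be fed a text stream opened with `gzip.open(path, "rt")` from the standard library.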

Bug to report back to GO

The file header declares GPA format version 1.1, but the column metadata it lists does not match the format definition for that version, e.g. regarding which columns are required.

See the content sample below:

!gpa-version: 1.1
!
!The set of protein accessions included in this file is based on UniProt reference proteomes, which provide one protein per gene.
!They include the protein sequences annotated in Swiss-Prot or the longest TrEMBL transcript if there is no Swiss-Prot record.
!If a particular protein accession is not annotated with GO, then it will not appear in this file.
!
!Note that the annotation set in this file is filtered in order to reduce redundancy; the full, unfiltered set can be found in
!ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gpa.gz
!
!Columns:
!
!   name                         required? cardinality             GAF column #
!   DB                           yes       1                         1
!   DB_Object_ID                 yes       1                         2 / 17
!   Qualifier                    no        0 or greater              4
!   GO ID                        yes       1                         5
!   DB:Reference                 yes       1 or greater              6
!   ECO evidence code            yes       1                         7 + 6 (GO evidence code + reference)
!   With/From                    no        0 or greater              8
!   Interacting taxon ID         no        0 or 1                   13
!   Date                         yes       1                        14
!   Assigned_by                  yes       1                        15
!   Annotation Extension         no        0 or greater             16
!   Annotation Properties        no        0 or 1                   n/a
!
!Generated: 2023-03-15 09:23
!GO-version: http://purl.obolibrary.org/obo/go/releases/2023-03-12/extensions/go-plus.owl
!
UniProtKB       A0A024RBG1      enables GO:0000298      PMID:21873635   ECO:0000318     PANTHER:PTN000290327|SGD:S000005689             20170228        GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      enables GO:0003723      GO_REF:0000043  ECO:0007322     UniProtKB-KW:KW-0694            20230306        UniProt         go_evidence=IEA
UniProtKB       A0A024RBG1      enables GO:0008486      PMID:21873635   ECO:0000318     FB:FBgn0036111|MGI:MGI:1930957|MGI:MGI:2147931|PANTHER:PTN000290327|SGD:S000005689|UniProtKB:O95989|UniProtKB:Q9NZJ9            20230110          GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      enables GO:0034431      PMID:21873635   ECO:0000318     FB:FBgn0036111|PANTHER:PTN000290327|PomBase:SPAC13G6.14|SGD:S000005689          20210530        GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      enables GO:0034432      PMID:21873635   ECO:0000318     PANTHER:PTN000290327|PomBase:SPAC13G6.14|SGD:S000005689         20170228        GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      enables GO:0046872      GO_REF:0000043  ECO:0007322     UniProtKB-KW:KW-0479            20230306        UniProt         go_evidence=IEA
UniProtKB       A0A024RBG1      involved_in     GO:0071543      PMID:21873635   ECO:0000318     PANTHER:PTN000290327|UniProtKB:O95989           20230110        GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      involved_in     GO:1901907      PMID:21873635   ECO:0000318     PANTHER:PTN000290327|PomBase:SPAC13G6.14                20170228        GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      involved_in     GO:1901909      PMID:21873635   ECO:0000318     PANTHER:PTN000290327|PomBase:SPAC13G6.14                20170228        GO_Central              go_evidence=IBA
UniProtKB       A0A024RBG1      involved_in     GO:1901911      PMID:21873635   ECO:0000318     PANTHER:PTN000290327|PomBase:SPAC13G6.14                20170228        GO_Central              go_evidence=IBA
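
To make the report to GO easier to back up, the column table in the header can be turned into structured data and checked against the data rows. The sketch below is only an assumption about how one might do that: `parse_declared_columns` and `missing_required` are hypothetical helpers, the header text is copied from the sample above, and the row assumes single-tab separation. It checks the file against its own header, not against the GPA specification, which would have to be supplied separately.

```python
import re

HEADER = """\
!gpa-version: 1.1
!Columns:
!
!   name                         required? cardinality             GAF column #
!   DB                           yes       1                         1
!   DB_Object_ID                 yes       1                         2 / 17
!   Qualifier                    no        0 or greater              4
!   GO ID                        yes       1                         5
!   DB:Reference                 yes       1 or greater              6
!   ECO evidence code            yes       1                         7 + 6 (GO evidence code + reference)
!   With/From                    no        0 or greater              8
!   Interacting taxon ID         no        0 or 1                   13
!   Date                         yes       1                        14
!   Assigned_by                  yes       1                        15
!   Annotation Extension         no        0 or greater             16
!   Annotation Properties        no        0 or 1                   n/a
!
!Generated: 2023-03-15 09:23
"""


def parse_declared_columns(header_lines):
    """Return (name, required, cardinality) tuples from the '!Columns:' table."""
    columns = []
    in_table = False
    for line in header_lines:
        text = line.lstrip("!").strip()
        if text == "Columns:":
            in_table = True
            continue
        if not in_table:
            continue
        # Fields are separated by runs of two or more spaces; names may
        # contain single spaces (e.g. "GO ID").
        parts = re.split(r"\s{2,}", text)
        if len(parts) >= 3 and parts[1] in ("yes", "no"):
            columns.append((parts[0], parts[1] == "yes", parts[2]))
    return columns


def missing_required(columns, row):
    """Names of columns declared as required whose value in `row` is empty."""
    return [
        name
        for (name, required, _), value in zip(columns, row)
        if required and not value
    ]


columns = parse_declared_columns(HEADER.splitlines())
row = (
    "UniProtKB\tA0A024RBG1\tenables\tGO:0000298\tPMID:21873635\tECO:0000318\t"
    "PANTHER:PTN000290327|SGD:S000005689\t\t20170228\tGO_Central\t\tgo_evidence=IBA"
).split("\t")

for name, required, cardinality in columns:
    print(f"{name:24} required={required!s:5} cardinality={cardinality}")
print("missing required values:", missing_required(columns, row))
```

The parsed list can then be compared by hand, or in code, with the required/optional flags given in the GPA 1.1 format definition to pinpoint the mismatching columns.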

This issue is a reminder to report this back to GO.

prashantuniyal02 commented 1 month ago

We need to contact GO about the GPA file to resolve this.