icgc-argo / roadmap

Place to review/request new features and new tools on ICGC-ARGO's roadmap

Complete Data Release 2 #575

Closed rosibaj closed 3 years ago

rosibaj commented 4 years ago
blabadi commented 4 years ago

Blocked by https://github.com/overture-stack/arranger/issues/615 and https://github.com/icgc-argo/roadmap/issues/558

rosibaj commented 3 years ago

The index looks good, but we have had a request to remove some documents for release:

Exclude No. 1 - 404 No Data variant timing metrics:

query ($filters: JSON){
  file {
    aggregations (filters:$filters){
      data_type{
        buckets{
          key
          doc_count
        }
      }
    }
  }
}
{
    "filters": {
        "content": [{
            "content": {
                "field": "data_category",
                "value": "__missing__"
            },
            "op": "in"
        }],
        "op": "and"
    }
}
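
The query and variables pairs in this comment can be assembled into a request body programmatically. This is a minimal sketch, assuming the usual GraphQL-over-HTTP JSON body with `query` and `variables` keys. The helper names `in_filter` and `build_payload` are hypothetical, and no endpoint URL is assumed.

```python
import json

# Aggregation query copied from the comment above.
AGG_QUERY = """
query ($filters: JSON) {
  file {
    aggregations(filters: $filters) {
      data_type {
        buckets {
          key
          doc_count
        }
      }
    }
  }
}
"""


def in_filter(field, value):
    """Build one 'in' clause; the value is passed through exactly as given."""
    return {"content": {"field": field, "value": value}, "op": "in"}


def build_payload(clauses):
    """Wrap the clauses in a top-level 'and' and pair them with the query."""
    filters = {"content": list(clauses), "op": "and"}
    return {"query": AGG_QUERY, "variables": {"filters": filters}}


# Reproduces the variables for Exclude No. 1.
payload = build_payload([in_filter("data_category", "__missing__")])
print(json.dumps(payload["variables"], indent=2))
```

The resulting `payload` could be sent as the JSON body of a POST request to whichever GraphQL endpoint serves the file index.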

Exclude No. 2 - Mutect2 Variant Calling Data that is being used for testing:

query ($filters: JSON){
  file {
    aggregations (filters:$filters){
      data_type{
        buckets{
          key
          doc_count
        }
      }
    }
  }
}
{
    "filters": {
        "content": [{
            "content": {
                "field": "analysis.workflow.workflow_name",
                "value": ["GATK Mutect2 Variant Calling"]
            },
            "op": "in"
        }],
        "op": "and"
    }
}
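
Each of these queries returns per-`data_type` bucket counts, which is how the number of documents affected by an exclusion can be checked. This is a minimal sketch for totalling them, assuming the response uses the standard GraphQL `data` envelope and mirrors the shape of the query. The sample response, its keys, and its counts are made up for illustration and are not real release data.

```python
def bucket_counts(response):
    """Map each data_type bucket key to its doc_count."""
    buckets = response["data"]["file"]["aggregations"]["data_type"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical response: keys and counts are illustrative only.
sample_response = {
    "data": {
        "file": {
            "aggregations": {
                "data_type": {
                    "buckets": [
                        {"key": "type_a", "doc_count": 3},
                        {"key": "type_b", "doc_count": 2},
                    ]
                }
            }
        }
    }
}

counts = bucket_counts(sample_response)
total = sum(counts.values())
print(counts, total)
```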

Exclude No. 3 - PACA-CA WXS Donors with only reads, not variant calls:

query ($filters: JSON){
  file {
    aggregations (filters:$filters){
      data_type{
        buckets{
          key
          doc_count
        }
      }
    }
  }
}
{
    "filters": {
        "content": [{
            "content": {
                "field": "analysis.experiment.experimental_strategy",
                "value": "WXS"
            },
            "op": "in"
        }, {
            "content": {
                "field": "donors.donor_id",
                "value": ["DO35442", "DO35098", "DO35330"]
            },
            "op": "in"
        }],
        "op": "and"
    }
}

Exclude No. 4 - 2 PTC-SA Donors with only bad reads that had issues when variant calling was run:

query ($filters: JSON){
  file {
    aggregations (filters:$filters){
      data_type{
        buckets{
          key
          doc_count
        }
      }
    }
  }
}
{
    "filters": {
        "content": [{
            "content": {
                "field": "donors.donor_id",
                "value": ["DO231311", "DO231493"]
            },
            "op": "in"
        }],
        "op": "and"
    }
}
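
One way to preview what would remain after all four exclusions is to negate their union in a single filter. This is a sketch only, assuming the index accepts `or` and `not` as combination operators alongside `and`. This thread does not show them in use, and the documents may well have been removed by other means. The helper names `in_clause` and `combine` are hypothetical.

```python
import json


def in_clause(field, value):
    """Build one 'in' clause with the value exactly as written in the thread."""
    return {"content": {"field": field, "value": value}, "op": "in"}


def combine(op, clauses):
    """Join clauses under a combination operator such as 'and', 'or' or 'not'."""
    return {"content": list(clauses), "op": op}


# The four exclusion filters listed above, one entry per exclusion.
exclusions = [
    combine("and", [in_clause("data_category", "__missing__")]),
    combine("and", [in_clause("analysis.workflow.workflow_name",
                              ["GATK Mutect2 Variant Calling"])]),
    combine("and", [
        in_clause("analysis.experiment.experimental_strategy", "WXS"),
        in_clause("donors.donor_id", ["DO35442", "DO35098", "DO35330"]),
    ]),
    combine("and", [in_clause("donors.donor_id", ["DO231311", "DO231493"])]),
]

# Assumption: 'not' around an 'or' keeps documents that match none of the four.
remaining = combine("not", [combine("or", exclusions)])
print(json.dumps(remaining, indent=2))
```

Passing `remaining` as the `filters` variable of the same aggregation query would then report the `data_type` counts for the documents left in the release, if the assumption about the operators holds.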

rosibaj commented 3 years ago

Closing as this is done! https://docs.icgc-argo.org/docs/release-notes/data-releases#data-release-20