icgc-argo / workflow-roadmap

Roadmap and management for genomic data processing
GNU Affero General Public License v3.0

Create benchmark dataset in object storage for QC working group #208

Closed akachru-github closed 1 year ago

akachru-github commented 3 years ago
lindaxiang commented 3 years ago

Benchmark datasets have been downloaded to the collab VM (142.1.177.110) with the following folder structure:

Texas/
├── RNA
│   └── TCRBOA7-WGS
│       ├── TCRBOA7-T-RNA.read1.fastq.gz -> ../../data/TCRBOA7-T-RNA.read1.fastq.gz
│       └── TCRBOA7-T-RNA.read2.fastq.gz -> ../../data/TCRBOA7-T-RNA.read2.fastq.gz
├── WES
│   ├── TCRBOA7-WES
│   │   ├── TCRBOA7-T-WEX.read1.fastq.gz -> ../../data/TCRBOA7-T-WEX.read1.fastq.gz
│   │   └── TCRBOA7-T-WEX.read2.fastq.gz -> ../../data/TCRBOA7-T-WEX.read2.fastq.gz
│   └── TCRBOA7-WES_normal
│       ├── TCRBOA7-N-WEX.read1.fastq.gz -> ../../data/TCRBOA7-N-WEX.read1.fastq.gz
│       └── TCRBOA7-N-WEX.read2.fastq.gz -> ../../data/TCRBOA7-N-WEX.read2.fastq.gz
├── WGS
│   ├── TCRBOA7-WGS
│   │   ├── TCRBOA7-T-WGS.lane1.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane1.read1.fastq.gz
│   │   ├── TCRBOA7-T-WGS.lane1.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane1.read2.fastq.gz
│   │   ├── TCRBOA7-T-WGS.lane2.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane2.read1.fastq.gz
│   │   ├── TCRBOA7-T-WGS.lane2.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane2.read2.fastq.gz
│   │   ├── TCRBOA7-T-WGS.lane3.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane3.read1.fastq.gz
│   │   ├── TCRBOA7-T-WGS.lane3.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane3.read2.fastq.gz
│   │   ├── TCRBOA7-T-WGS.lane4.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane4.read1.fastq.gz
│   │   └── TCRBOA7-T-WGS.lane4.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane4.read2.fastq.gz
│   └── TCRBOA7-WGS_normal
│       ├── TCRBOA7-N-WGS.lane1.read1.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane1.read1.fastq.gz
│       ├── TCRBOA7-N-WGS.lane1.read2.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane1.read2.fastq.gz
│       ├── TCRBOA7-N-WGS.lane2.read1.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane2.read1.fastq.gz
│       └── TCRBOA7-N-WGS.lane2.read2.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane2.read2.fastq.gz
├── data
│   ├── TCRBOA7-N-WEX.read1.fastq.gz
│   ├── TCRBOA7-N-WEX.read2.fastq.gz
│   ├── TCRBOA7-N-WGS.lane1.read1.fastq.gz
│   ├── TCRBOA7-N-WGS.lane1.read2.fastq.gz
│   ├── TCRBOA7-N-WGS.lane2.read1.fastq.gz
│   ├── TCRBOA7-N-WGS.lane2.read2.fastq.gz
│   ├── TCRBOA7-T-RNA.read1.fastq.gz
│   ├── TCRBOA7-T-RNA.read2.fastq.gz
│   ├── TCRBOA7-T-WEX.read1.fastq.gz
│   ├── TCRBOA7-T-WEX.read2.fastq.gz
│   ├── TCRBOA7-T-WGS.lane1.read1.fastq.gz
│   ├── TCRBOA7-T-WGS.lane1.read2.fastq.gz
│   ├── TCRBOA7-T-WGS.lane2.read1.fastq.gz
│   ├── TCRBOA7-T-WGS.lane2.read2.fastq.gz
│   ├── TCRBOA7-T-WGS.lane3.read1.fastq.gz
│   ├── TCRBOA7-T-WGS.lane3.read2.fastq.gz
│   ├── TCRBOA7-T-WGS.lane4.read1.fastq.gz
│   └── TCRBOA7-T-WGS.lane4.read2.fastq.gz
└── options
    └── reference
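
Since every per-sample file in the listing is a relative symlink into `data/`, it may be worth confirming that none of the links dangle before the files are submitted anywhere. The following is only a minimal sketch of such a check, not a tool used in this ticket; the function name `find_broken_links` is hypothetical, and `Texas` is assumed to be the top-level folder shown above.

```python
import os
from pathlib import Path


def find_broken_links(root):
    """Return symlinks under root whose targets do not exist, sorted by path."""
    broken = []
    # os.walk does not descend into symlinked directories by default,
    # but it still reports them, so both name lists are checked.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # exists() follows the link, so it is False for a dangling link.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    # "Texas" is the top-level folder from the listing (assumed location).
    for link in find_broken_links("Texas"):
        print(f"dangling: {link} -> {os.readlink(link)}")
```

Run from the directory that contains `Texas/`, this prints one line per dangling link and prints nothing if every link resolves.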
akachru-github commented 3 years ago

Since the upload to the object store will be done multiple times, this ticket has been created to build a tool for it:

lindaxiang commented 3 years ago

The WGS raw sequencing_experiment datasets have been submitted to both the S3 bucket and RDPC score.

The alignment jobs are running.

akachru-github commented 2 years ago

Sanger WG is the final set remaining. It has failed to run in RDPC QA in the past, possibly due to a lack of resources. Will retest when RDPC QA becomes available.

akachru-github commented 2 years ago

Will be retrying WG now that the RDPC QA environment is available.

b-f-chan commented 2 years ago

Sanger WF still didn't work for this benchmark dataset; the suspected cause is a QA resource issue.

If the benchmark dataset is not needed right now, this can be left alone (it was generated by the Somatic Working Group, but they may not need it at the moment).

Leave the ticket open for now, low priority.

puneet-oicr commented 1 year ago

Completed on a different ticket.