akachru-github closed this issue 1 year ago
Benchmark datasets have been downloaded onto the Collab VM (142.1.177.110) with the following folder structure:
Texas/
├── RNA
│ └── TCRBOA7-WGS
│ ├── TCRBOA7-T-RNA.read1.fastq.gz -> ../../data/TCRBOA7-T-RNA.read1.fastq.gz
│ └── TCRBOA7-T-RNA.read2.fastq.gz -> ../../data/TCRBOA7-T-RNA.read2.fastq.gz
├── WES
│ ├── TCRBOA7-WES
│ │ ├── TCRBOA7-T-WEX.read1.fastq.gz -> ../../data/TCRBOA7-T-WEX.read1.fastq.gz
│ │ └── TCRBOA7-T-WEX.read2.fastq.gz -> ../../data/TCRBOA7-T-WEX.read2.fastq.gz
│ └── TCRBOA7-WES_normal
│ ├── TCRBOA7-N-WEX.read1.fastq.gz -> ../../data/TCRBOA7-N-WEX.read1.fastq.gz
│ └── TCRBOA7-N-WEX.read2.fastq.gz -> ../../data/TCRBOA7-N-WEX.read2.fastq.gz
├── WGS
│ ├── TCRBOA7-WGS
│ │ ├── TCRBOA7-T-WGS.lane1.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane1.read1.fastq.gz
│ │ ├── TCRBOA7-T-WGS.lane1.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane1.read2.fastq.gz
│ │ ├── TCRBOA7-T-WGS.lane2.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane2.read1.fastq.gz
│ │ ├── TCRBOA7-T-WGS.lane2.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane2.read2.fastq.gz
│ │ ├── TCRBOA7-T-WGS.lane3.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane3.read1.fastq.gz
│ │ ├── TCRBOA7-T-WGS.lane3.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane3.read2.fastq.gz
│ │ ├── TCRBOA7-T-WGS.lane4.read1.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane4.read1.fastq.gz
│ │ └── TCRBOA7-T-WGS.lane4.read2.fastq.gz -> ../../data/TCRBOA7-T-WGS.lane4.read2.fastq.gz
│ └── TCRBOA7-WGS_normal
│ ├── TCRBOA7-N-WGS.lane1.read1.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane1.read1.fastq.gz
│ ├── TCRBOA7-N-WGS.lane1.read2.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane1.read2.fastq.gz
│ ├── TCRBOA7-N-WGS.lane2.read1.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane2.read1.fastq.gz
│ └── TCRBOA7-N-WGS.lane2.read2.fastq.gz -> ../../data/TCRBOA7-N-WGS.lane2.read2.fastq.gz
├── data
│ ├── TCRBOA7-N-WEX.read1.fastq.gz
│ ├── TCRBOA7-N-WEX.read2.fastq.gz
│ ├── TCRBOA7-N-WGS.lane1.read1.fastq.gz
│ ├── TCRBOA7-N-WGS.lane1.read2.fastq.gz
│ ├── TCRBOA7-N-WGS.lane2.read1.fastq.gz
│ ├── TCRBOA7-N-WGS.lane2.read2.fastq.gz
│ ├── TCRBOA7-T-RNA.read1.fastq.gz
│ ├── TCRBOA7-T-RNA.read2.fastq.gz
│ ├── TCRBOA7-T-WEX.read1.fastq.gz
│ ├── TCRBOA7-T-WEX.read2.fastq.gz
│ ├── TCRBOA7-T-WGS.lane1.read1.fastq.gz
│ ├── TCRBOA7-T-WGS.lane1.read2.fastq.gz
│ ├── TCRBOA7-T-WGS.lane2.read1.fastq.gz
│ ├── TCRBOA7-T-WGS.lane2.read2.fastq.gz
│ ├── TCRBOA7-T-WGS.lane3.read1.fastq.gz
│ ├── TCRBOA7-T-WGS.lane3.read2.fastq.gz
│ ├── TCRBOA7-T-WGS.lane4.read1.fastq.gz
│ └── TCRBOA7-T-WGS.lane4.read2.fastq.gz
└── options
└── reference
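The listing above keeps every FASTQ file in `data/` and exposes it through relative symlinks grouped by experiment type. A minimal sketch of how such a layout could be regenerated is shown below. The `LAYOUT` table is copied from the listing (including the `RNA/TCRBOA7-WGS` directory name as it appears there); the function names `plan_links` and `create_links` are hypothetical and not part of any existing tool mentioned in this ticket.

```python
import os
from pathlib import Path

# Filename tag -> directory relative to the Texas/ root, as shown in the listing.
LAYOUT = {
    "T-RNA": "RNA/TCRBOA7-WGS",
    "T-WEX": "WES/TCRBOA7-WES",
    "N-WEX": "WES/TCRBOA7-WES_normal",
    "T-WGS": "WGS/TCRBOA7-WGS",
    "N-WGS": "WGS/TCRBOA7-WGS_normal",
}


def plan_links(filenames):
    """Return {link path relative to root: symlink target} for recognised files."""
    plan = {}
    for name in sorted(filenames):
        # Names look like TCRBOA7-T-WGS.lane1.read1.fastq.gz
        stem = name.split(".", 1)[0]                        # e.g. TCRBOA7-T-WGS
        tag = stem.split("-", 1)[1] if "-" in stem else ""  # e.g. T-WGS
        subdir = LAYOUT.get(tag)
        if subdir is None:
            continue  # skip files that do not match the naming scheme
        # Links sit two levels below the root, hence the ../../ prefix.
        plan[f"{subdir}/{name}"] = f"../../data/{name}"
    return plan


def create_links(root):
    """Create the symlinks under `root` for every file found in root/data."""
    root = Path(root)
    names = [p.name for p in (root / "data").iterdir() if p.is_file()]
    for link, target in plan_links(names).items():
        link_path = root / link
        link_path.parent.mkdir(parents=True, exist_ok=True)
        # Leave existing links or files untouched so the script can be re-run.
        if not link_path.is_symlink() and not link_path.exists():
            os.symlink(target, link_path)
```

Calling `create_links("Texas")` on the VM would rebuild the links without copying any data. Keeping the targets relative means the whole `Texas/` tree can be moved or mounted elsewhere without breaking them.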
Since the upload to the object store will be done multiple times, this ticket has been created to build a tool for it:
WGS raw sequencing_experiment datasets have been submitted to both the S3 bucket and RDPC score.
The alignment jobs are running.
Sanger WG is the final set remaining. This has failed to run in RDPC QA in the past, possibly due to a lack of resources. Will retest when RDPC QA becomes available.
Will be retrying WG now that the RDPC QA environment is available.
Sanger WF still didn't work for this benchmark data set; the suspicion is a QA resource issue.
If the benchmark data set is not needed right now, this can be left alone for now (it was generated by the Somatic Working Group, but they may not need it right now).
Leave the ticket open for now, low priority.
Completed on a different ticket
Open access data can be used by the QC WG. McGill has downloaded this data and done some processing, and thinks other folks can use it.
We will stage this data in object storage in Collab. We want to be able to give WG members access to this data.
Linda already has access to object storage and can stage this.
[x] Download data from source
[x] Stage data in OpenStack object store and RDPC QA
[x] Make data accessible to QC WG members
[x] Run ARGO alignment on the datasets
[x] Upload the alignment results to both OpenStack object store and RDPC QA