Closed lindaxiang closed 1 year ago
Considering that Azure and AWS may also be retired in mid-2024, EGA would be left as the only long-term repository. We decided to copy all files of the PCAWG ICGC portion that do not have an EGA copy to OICR Isilon. These will include:
To get these into the PCAWG release folder, we need to copy the files to the OICR Isilon storage directory (Instructions: https://wiki.oicr.on.ca/display/icgcargotech/Copying+Files+to+Isilon).
The following controlled tier files are ready to be transferred from `dcc-proxy` to the Portal:
Source files indicated in `path on dcc-proxy` column.
Destination: `path on portal` column.
[x] 14 ESAD-UK RNA-Seq BAMs and their index files
Will need to create a new subfolder (`rnaseq_aligned_bams`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/rnaseq_aligned_bams | https://dcc.icgc.org/releases/PCAWG/rnaseq_aligned_bams |
add | PCAWG.RNA-Seq.icgc.aligned_bam.metadata.txt | /nfs/hadoop/workspace/pcawg/rnaseq_aligned_bams | https://dcc.icgc.org/releases/PCAWG/rnaseq_aligned_bams |
add | PCAWG.RNA-Seq.ESAD-UK.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/rnaseq_aligned_bams | https://dcc.icgc.org/releases/PCAWG/rnaseq_aligned_bams |

Will need to create a new subfolder (`broad_calls`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |
add | PCAWG.WGS.icgc.broad.metadata.txt | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |
add | PCAWG_BROAD.germline.indel.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |
add | PCAWG_BROAD.germline.sv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |
add | PCAWG_BROAD.somatic.indel.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |
add | PCAWG_BROAD.somatic.snv_mnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |
add | PCAWG_BROAD.somatic.sv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/broad_calls | https://dcc.icgc.org/releases/PCAWG/broad_calls |

Will need to create a new subfolder (`dkfz_embl_calls`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG.WGS.icgc.dkfz_embl.metadata.txt | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.germline.indel.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.germline.snv_mnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.germline.sv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.somatic.cnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.somatic.indel.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.somatic.snv_mnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |
add | PCAWG_DKFZ_EMBL.somatic.sv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/dkfz_embl_calls | https://dcc.icgc.org/releases/PCAWG/dkfz_embl_calls |

Will need to create a new subfolder (`sanger_calls`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/sanger_calls | https://dcc.icgc.org/releases/PCAWG/sanger_calls |
add | PCAWG.WGS.icgc.sanger.metadata.txt | /nfs/hadoop/workspace/pcawg/sanger_calls | https://dcc.icgc.org/releases/PCAWG/sanger_calls |
add | PCAWG_SANGER.somatic.cnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/sanger_calls | https://dcc.icgc.org/releases/PCAWG/sanger_calls |
add | PCAWG_SANGER.somatic.indel.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/sanger_calls | https://dcc.icgc.org/releases/PCAWG/sanger_calls |
add | PCAWG_SANGER.somatic.snv_mnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/sanger_calls | https://dcc.icgc.org/releases/PCAWG/sanger_calls |
add | PCAWG_SANGER.somatic.sv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/sanger_calls | https://dcc.icgc.org/releases/PCAWG/sanger_calls |

Will need to create a new subfolder (`muse_calls`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/muse_calls | https://dcc.icgc.org/releases/PCAWG/muse_calls |
add | PCAWG.WGS.icgc.muse.metadata.txt | /nfs/hadoop/workspace/pcawg/muse_calls | https://dcc.icgc.org/releases/PCAWG/muse_calls |
add | PCAWG_MUSE.somatic.snv_mnv.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/muse_calls | https://dcc.icgc.org/releases/PCAWG/muse_calls |

Will need to create a new subfolder (`pilot50_calls`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/pilot50_calls | https://dcc.icgc.org/releases/PCAWG/pilot50_calls |
add | PCAWG.Pilot50.icgc.vcf.metadata.txt | /nfs/hadoop/workspace/pcawg/pilot50_calls | https://dcc.icgc.org/releases/PCAWG/pilot50_calls |
add | PCAWG_Pilot50.somatic.mutation.icgc.controlled.tgz | /nfs/hadoop/workspace/pcawg/pilot50_calls | https://dcc.icgc.org/releases/PCAWG/pilot50_calls |

Will need to create a new subfolder (`validation_bams`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/validation_bams | https://dcc.icgc.org/releases/PCAWG/validation_bams |
add | PCAWG.Validation.icgc.aligned_bam.metadata.txt | /nfs/hadoop/workspace/pcawg/validation_bams | https://dcc.icgc.org/releases/PCAWG/validation_bams |
add | PCAWG.Pilot50.validation_bam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/validation_bams | https://dcc.icgc.org/releases/PCAWG/validation_bams |

Will need to create a new subfolder (`wgs_aligned_bams`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/wgs_aligned_bams | https://dcc.icgc.org/releases/PCAWG/wgs_aligned_bams |
add | PCAWG.WGS.icgc.aligned_bam.metadata.txt | /nfs/hadoop/workspace/pcawg/wgs_aligned_bams | https://dcc.icgc.org/releases/PCAWG/wgs_aligned_bams |

Will need to create a new subfolder (`minibams`) under https://dcc.icgc.org/releases/PCAWG

operation | file/folder | path on dcc-proxy | path on portal |
---|---|---|---|
add | README.md | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PCAWG.WGS.icgc.minibam.metadata.txt | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | BOCA-UK.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | BRCA-EU.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | BRCA-UK.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | BTCA-SG.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | CLLE-ES.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | CMDI-UK.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | EOPC-DE.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | ESAD-UK.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | GACA-CN.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | LAML-KR.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | LICA-FR.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | LINC-JP.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | LIRI-JP.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | MALY-DE.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | MELA-AU.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | ORCA-IN.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | OV-AU.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PACA-AU.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PACA-CA.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PAEN-AU.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PAEN-IT.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PBCA-DE.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PRAD-CA.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | PRAD-UK.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
add | RECA-EU.minibam.icgc.controlled.access (folder) | /nfs/hadoop/workspace/pcawg/minibams | https://dcc.icgc.org/releases/PCAWG/minibams |
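
The tables above pair each file or folder with a source directory on dcc-proxy and a destination folder on the portal. As a rough sketch of how such a table could be turned into explicit source/destination pairs for scripting, the snippet below parses the pipe-delimited rows. The `ManifestEntry` class and `parse_manifest` function are hypothetical names introduced here for illustration; they are not part of any existing ICGC tooling.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    operation: str    # e.g. "add"
    name: str         # file or folder name, without the "(folder)" marker
    is_folder: bool
    source: str       # full path on dcc-proxy
    destination: str  # full URL on the portal


def parse_manifest(table_text: str) -> list:
    """Parse rows of the 'operation | file/folder | path on dcc-proxy | path on portal |' tables."""
    entries = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines, the header row, and the ---|--- separator row.
        if len(cells) != 4 or cells[0] == "operation" or set(cells[0]) <= {"-"}:
            continue
        operation, name, src_dir, dest_url = cells
        is_folder = name.endswith("(folder)")
        if is_folder:
            name = name[: -len("(folder)")].strip()
        entries.append(
            ManifestEntry(
                operation=operation,
                name=name,
                is_folder=is_folder,
                source=posixpath.join(src_dir, name),
                destination=dest_url.rstrip("/") + "/" + name,
            )
        )
    return entries
```

Each resulting entry carries the full source path and destination URL, so a transfer script (not shown, and dependent on the Isilon copy instructions linked above) could iterate over them.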
Hit a blocker: the staging area at `dcc-proxy.res.oicr.on.ca:/nfs/hadoop/workspace/pcawg` ran out of space.
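
A pre-flight check along the following lines might catch a full staging area before a transfer starts. This is only a sketch: `has_room` and its `headroom` parameter are hypothetical, and it assumes the staging directory is an ordinary mounted filesystem. It would not see a separate quota enforced by the Hadoop filesystem.

```python
import shutil


def has_room(staging_dir: str, required_bytes: int, headroom: float = 0.05) -> bool:
    """Return True if staging_dir can hold required_bytes while keeping a free reserve.

    headroom is the fraction of total capacity to leave untouched (an assumed
    safety margin, not a value taken from the ticket).
    """
    usage = shutil.disk_usage(staging_dir)
    reserve = int(usage.total * headroom)
    return usage.free - reserve >= required_bytes
```

For example, `has_room("/nfs/hadoop/workspace/pcawg", size_of_next_batch)` could gate each batch, where `size_of_next_batch` is whatever byte count the caller has computed for the files about to be staged.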
Thanks to Jared's kind help, the blocker was removed.
Hit another blocker. This time the quota on the hadoop fs was tripped at 10 TB. I've asked IT to increase this by 2.5 TB. Will resume transfers then.
All data has been copied to hadoop and is available at https://dcc.icgc.org/releases/PCAWG/ . Please validate the copy when you can.
Double-checked the copied files in the PCAWG release folders. All look good to me. Thanks to Jared! The ticket can be closed.
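
One way such a validation could be scripted is to compare checksums between the staging tree and the copied tree. The sketch below is an assumption about how this might be done, not a record of how the copy was actually checked; `file_sha256` and `compare_trees` are hypothetical helpers, and both trees are assumed to be readable as local directories.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large BAMs and tarballs are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, copy_root) -> dict:
    """Report files that are missing from, different in, or extra in the copy."""
    source_root, copy_root = Path(source_root), Path(copy_root)
    source_files = {p.relative_to(source_root) for p in source_root.rglob("*") if p.is_file()}
    copy_files = {p.relative_to(copy_root) for p in copy_root.rglob("*") if p.is_file()}

    report = {"missing": [], "mismatched": [], "extra": []}
    for rel in sorted(source_files):
        if rel not in copy_files:
            report["missing"].append(str(rel))
        elif file_sha256(source_root / rel) != file_sha256(copy_root / rel):
            report["mismatched"].append(str(rel))
    report["extra"] = sorted(str(rel) for rel in copy_files - source_files)
    return report
```

A copy would be considered clean when all three lists in the report are empty.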
Closing!
As noted in the ticket, we have audited all the ICGC25k data in Collab.
This is the summary of ICGC25k-PCAWG Non-US data across the various repositories: https://docs.google.com/spreadsheets/d/184EVudu9H59RD14zDwQLt2sx1rMHtfvTAXc5sLAv4p8/edit#gid=942384972
Note: The file status only considers the data of PCAWG donors from the Whitelist and Graylist in the latest release (May 2016).
With Collaboratory shutting down, we will migrate all Collab-only Non-US PCAWG files to OICR Isilon storage.
The PCAWG Non-US files that need to be copied are: