icgc-dcc / dcc-portal

Data portal for exploring and accessing data
https://dcc.icgc.org/

Bug - Downloading large dataset from ICGC portal stuck at 2GB #755

Closed lindaxiang closed 6 months ago

lindaxiang commented 2 years ago

There are a couple of helpdesk tickets reporting the same issue: large dataset downloads get stuck at 2GB.
https://extsd.oicr.on.ca/projects/ICGCSD/queues/custom/109/ICGCSD-6020
https://extsd.oicr.on.ca/projects/ICGCSD/queues/custom/109/ICGCSD-6026
https://extsd.oicr.on.ca/projects/ICGCSD/queues/custom/11/ICGCSD-6056

Steps To Reproduce

Steps to reproduce the behaviour:

  1. Go to 'https://dcc.icgc.org/releases/current/Projects/LGG-US'
  2. Click on 'meth_array.LGG-US.tsv.gz'
  3. The download starts but stops when the downloaded size reaches almost 2GB
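
The steps above can also be reproduced outside a browser, which makes it easier to record the exact byte count at which the transfer stops. The following is a minimal sketch, not the portal team's tooling: `count_streamed_bytes` and `probe_download` are hypothetical helper names, the 1 MiB chunk size and 60 second timeout are arbitrary, and it assumes the server sends a `Content-Length` header to compare against.

```python
import http.client
import io
import urllib.request

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (arbitrary choice)


def count_streamed_bytes(stream, chunk_size=CHUNK_SIZE, on_progress=None):
    """Read a binary stream to its end and return the number of bytes received."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        total += len(chunk)
        if on_progress is not None:
            on_progress(total)
    return total


def probe_download(url, timeout=60):
    """Stream `url`, discard the body, and report advertised vs. received bytes.

    Hypothetical helper; it needs network access, so it is not called here.
    """
    progress = {"received": 0}

    def remember(total):
        progress["received"] = total

    result = {"advertised": None, "received": 0, "error": None}
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            result["advertised"] = int(length) if length else None
            count_streamed_bytes(response, on_progress=remember)
    except (OSError, http.client.HTTPException) as exc:
        # Timeouts, resets and truncated bodies end up here; keep the byte
        # count reached so far so the stall point is still reported.
        result["error"] = repr(exc)
    result["received"] = progress["received"]
    return result


# Offline check of the counting logic with an in-memory stream.
print(count_streamed_bytes(io.BytesIO(b"x" * 2500), chunk_size=1000))  # prints 2500
```

Calling `probe_download("https://dcc.icgc.org/api/v1/download?fn=/current/Projects/LGG-US/meth_array.LGG-US.tsv.gz")` would then return a dictionary whose `received` value can be compared with `advertised` to see how far the transfer got.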

edsu7 commented 2 years ago

Found two more instances of this behaviour, with these files:
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/BOCA-FR/meth_seq.BOCA-FR.tsv.gz
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/SKCM-US/meth_array.SKCM-US.tsv.gz

edsu7 commented 2 years ago

Additional help desk tickets: https://extsd.oicr.on.ca/browse/ICGCSD-6080

akachru-github commented 2 years ago

@joneubank, do you know if this is expected behaviour of the ICGC25k portal? Perhaps there is a cap for downloading data from the browser?

@lindaxiang and @edsu7, can users get this data using the Score client?

akachru-github commented 2 years ago

Cannot download from the Score client; this is a Hadoop download.

akachru-github commented 2 years ago

Question: Do we have any files that download successfully?

akachru-github commented 2 years ago

@edsu7 to try a browser other than Chrome to see whether the behaviour is consistent.

joneubank commented 2 years ago

All browsers exhibit the same behaviour (Safari, Edge, Firefox).

Also testing other files for any exceptions; none found so far. Failing:
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/HNSC-US/meth_array.HNSC-US.tsv.gz
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/KIRC-US/meth_array.KIRC-US.tsv.gz
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/LGG-US/meth_array.LGG-US.tsv.gz
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/LUAD-US/meth_array.LUAD-US.tsv.gz
https://dcc.icgc.org/api/v1/download?fn=/current/Projects/MALY-DE/meth_seq.MALY-DE.tsv.gz

edsu7 commented 2 years ago

Found 30 files >= 2GB. All failed to get past the 2GB download point (some stopped even earlier, at around 1.1-1.5GB).

/PCAWG/networks/string_cosmic_neighbourhood_min900_shell3_20160527.tsv.xz       3.96GB
/PCAWG/reference_data/pcawg-gatk-cocleaning/genome.fa   3.19GB
/PCAWG/reference_data/pcawg-dkfz/dkfz-workflow-dependencies_150318_0951.tar.gz  23.04GB
/PCAWG/reference_data/data_for_testing/testdata.tar.gz  4.54GB
/PCAWG/reference_data/data_for_testing/HCC1143_ds/HCC1143_BL.bam        4.43GB
/PCAWG/reference_data/data_for_testing/HCC1143_ds/HCC1143.bam   5.24GB
/PCAWG/reference_data/pcawg-sanger/GRCh37d5_CGP_refBundle.tar.gz        4.62GB
/PCAWG/reference_data/pcawg-broad/pcawg_broad_public_refs_full.tar.gz   7.78GB
/PCAWG/reference_data/pcawg-bwa-mem/genome.fa.gz.64.bwt 3.14GB
/PCAWG/drivers/p-values/p-values.zip    2.96GB
/PCAWG/consensus_snv_indel/wig_files/coverage_wig_files.tar     45.64GB
/PCAWG/cell_lines/HCC1143_BL/b50ceff5cf365343616d724324dcb445.bam       163.91GB
/PCAWG/cell_lines/HCC1143/257235f3926b2be84e8a9e80acdfb345.bam  141.61GB
/PCAWG/cell_lines/HCC1954/GL15512.HCC1954.1.bam.bai     279.15GB
/PCAWG/cell_lines/HCC1954/GL15512.HCC1954.1.bam 279.15GB
/PCAWG/cell_lines/HCC1954_BL/G15512.HCC1954_BL.1.bam    338.5GB
/current/Projects/BOCA-FR/meth_seq.BOCA-FR.tsv.gz       3.05GB
/current/Projects/BRCA-US/meth_array.BRCA-US.tsv.gz     4.65GB
/current/Projects/HNSC-US/meth_array.HNSC-US.tsv.gz     3.04GB
/current/Projects/KIRC-US/meth_array.KIRC-US.tsv.gz     2.76GB
/current/Projects/LGG-US/meth_array.LGG-US.tsv.gz       2.51GB
/current/Projects/LUAD-US/meth_array.LUAD-US.tsv.gz     2.57GB
/current/Projects/MALY-DE/meth_seq.MALY-DE.tsv.gz       6.96GB
/current/Projects/PBCA-DE/meth_seq.PBCA-DE.tsv.gz       11.61GB
/current/Projects/PRAD-CA/meth_array.PRAD-CA.tsv.gz     3.49GB
/current/Projects/PRAD-US/meth_array.PRAD-US.tsv.gz     2.29GB
/current/Projects/SKCM-US/meth_array.SKCM-US.tsv.gz     2.42GB
/current/Projects/STAD-US/meth_array.STAD-US.tsv.gz     2.24GB
/current/Projects/THCA-US/meth_array.THCA-US.tsv.gz     3.15GB
/current/Projects/UCEC-US/meth_array.UCEC-US.tsv.gz     2.49GB
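
A listing like the one above can be turned into a set of download URLs for retesting. This is a rough sketch under stated assumptions: `parse_listing`, `at_least` and `download_url` are hypothetical helper names, the endpoint is copied from the `/api/v1/download?fn=` URLs quoted earlier in this thread, and it is only an assumption that the `/PCAWG/...` paths are served by the same endpoint as the `/current/Projects/...` ones.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the failing URLs in this thread.
DOWNLOAD_ENDPOINT = "https://dcc.icgc.org/api/v1/download"


def parse_listing(text):
    """Parse lines of the form '<path> <size>GB' into (path, size_gb) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not '<path> <number>GB'.
        if len(parts) != 2 or not parts[1].endswith("GB"):
            continue
        entries.append((parts[0], float(parts[1][:-2])))
    return entries


def at_least(entries, threshold_gb=2.0):
    """Keep the entries whose listed size is at or above the threshold."""
    return [entry for entry in entries if entry[1] >= threshold_gb]


def download_url(path):
    """Build a download URL for a path, leaving '/' unescaped as in the thread."""
    return DOWNLOAD_ENDPOINT + "?" + urlencode({"fn": path}, safe="/")


# Three lines copied from the listing above.
SAMPLE = """\
/PCAWG/drivers/p-values/p-values.zip    2.96GB
/current/Projects/LGG-US/meth_array.LGG-US.tsv.gz       2.51GB
/current/Projects/STAD-US/meth_array.STAD-US.tsv.gz     2.24GB
"""

for path, size_gb in at_least(parse_listing(SAMPLE)):
    print(f"{size_gb:.2f}GB  {download_url(path)}")
```

Feeding the full listing into `parse_listing` instead of `SAMPLE` gives one URL per file, which can be retested once a fix is deployed.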

b-f-chan commented 2 years ago

Let's hold off on closing this and make sure the other 3 parties respond first.

justincorrigible commented 6 months ago

Closing as Done