ropensci / UCSCXenaTools

An R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. https://cran.r-project.org/web/packages/UCSCXenaTools/
https://docs.ropensci.org/UCSCXenaTools
GNU General Public License v3.0

issue: couldn't download pancanAtlas data #35

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hi authors, I tried to download the pancanAtlas dataset through UCSCXenaTools, but it failed. The code and its output are pasted below. I also tried opening the URL shown in the output directly, but it doesn't return the data. Could you help me with this? Thank you!

> pcA_cohort = XenaData %>% 
+     filter(XenaHostNames == "pancanAtlasHub") # select pancanAtlas Hub
> cli_query = pcA_cohort %>% 
+     filter(DataSubtype == "gene expression RNAseq") %>%  # select RNAseq data
+     XenaGenerate() %>%  # generate a XenaHub object
+     XenaQuery() %>% 
+     XenaDownload()
This will check url status, please be patient.
All downloaded files will under directory /var/folders/k2/zhwq4hld003_vbl84g1qvxcr0000gn/T//RtmpAjrRSW.
The 'trans_slash' option is FALSE, keep same directory structure as Xena.
Creating directories for datasets...
Downloading EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
==> Trying #2
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
==> Trying #3
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
Can not find fileEB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz, this file maybe not compressed.
Try downloading fileEB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena...
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
==> Trying #2
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
==> Trying #3
trying URL 'https://pancanatlas.xenahubs.net/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena'
Your network is bad (try again) or the data source is invalid (report to the developer).
Warning messages:
1: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
2: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
3: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz': HTTP status was '403 Forbidden'
4: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
5: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'
6: In download.file(url, destfile, ...) :
  cannot open URL 'https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com:443/download/EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena': HTTP status was '403 Forbidden'

github-actions[bot] commented 3 years ago

Thanks for reporting. Shixiang will reply as soon as possible :)

ShixiangWang commented 3 years ago

It seems that UCSC Xena changed some URLs, so I need to update the package metadata. In the meantime, you can download the file directly from https://tcga-pancan-atlas-hub.s3.us-east-1.amazonaws.com/download/EB%2B%2BAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz.
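
The visible difference between the failing requests in the log above and the direct link is how `+` in the file name is written: the failing URLs contain a literal `EB++...`, while the direct link uses the percent-encoded form `EB%2B%2B...`. The snippet below is a minimal sketch of that encoding step in base R with `utils::URLencode()`. It only illustrates the idea and is not taken from the UCSCXenaTools source; the variable names are hypothetical.

```r
# Hypothetical illustration: percent-encode a dataset file name before
# appending it to a hub download URL. Not the package's actual code.
hub_url   <- "https://pancanatlas.xenahubs.net/download"
file_name <- "EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz"

# reserved = TRUE also encodes reserved characters such as "+" (as "%2B");
# letters, digits, ".", "_", "-" and "~" are left unchanged.
encoded_name <- utils::URLencode(file_name, reserved = TRUE)
download_url <- paste(hub_url, encoded_name, sep = "/")

print(download_url)
```

Encoding only the file name, rather than the whole URL, keeps the `:` and `/` separators of the hub address intact.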

ShixiangWang commented 3 years ago

@Morphy123 Hi, I fixed this bug. Could you install the latest version from GitHub and try again?

ShixiangWang commented 3 years ago

# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")

ghost commented 3 years ago

I still get the same error message even though I updated the package.

In reply to Shixiang Wang's message of Tue, May 25, 2021 at 9:50 PM.

ShixiangWang commented 3 years ago

Could you restart your R session? It works fine for me.

library(UCSCXenaTools)
library(dplyr)

pcA_cohort = XenaData %>% 
    filter(XenaHostNames == "pancanAtlasHub") # select pancanAtlas Hub
cli_query = pcA_cohort %>% 
    filter(DataSubtype == "gene expression RNAseq") %>%  # select RNAseq data
    XenaGenerate() %>%  # generate a XenaHub object
    XenaQuery() %>% 
    XenaDownload()

See the output:

> library(UCSCXenaTools)
> library(dplyr)
> pcA_cohort = XenaData %>% 
+     filter(XenaHostNames == "pancanAtlasHub") # select pancanAtlas Hub
> cli_query = pcA_cohort %>% 
+     filter(DataSubtype == "gene expression RNAseq") %>%  # select RNAseq data
+     XenaGenerate() %>%  # generate a XenaHub object
+     XenaQuery() %>% 
+     XenaDownload()
This will check url status, please be patient.
All downloaded files will under directory /var/folders/bj/nw1w4g1j37ddpgb6zmh3sfh80000gn/T//RtmpCHIe56.
The 'trans_slash' option is FALSE, keep same directory structure as Xena.
Creating directories for datasets...
Downloading EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz
trying URL 'https://pancanatlas.xenahubs.net/download/EB%2B%2BAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena.gz'
Content type 'binary/octet-stream' length 331000731 bytes (315.7 MB)
==================================================
downloaded 315.7 MB

ghost commented 3 years ago

It works! Thx!

In reply to Shixiang Wang's message of Tue, May 25, 2021 at 10:10 PM.