Open sunta3iouxos opened 3 weeks ago
Thanks for reporting, Shixiang will reply as soon as possible:)
Hi, for simple datasets, you can find the count data in the gdc hub, and transform it into TPM format.
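A minimal sketch of that conversion, not an official recipe. It assumes the GDC hub counts are stored as log2(count + 1) and that you supply gene lengths from the annotation used for quantification. The helper `counts_to_tpm`, the toy matrix, and `gene_length_bp` are hypothetical and not part of UCSCXenaTools:

```r
# Hypothetical helper (not part of UCSCXenaTools): convert a genes x samples
# matrix of raw counts to TPM, given one length in base pairs per gene.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  rpk <- counts / (gene_length_bp / 1000)  # reads per kilobase, applied row-wise
  sweep(rpk, 2, colSums(rpk), "/") * 1e6   # scale each sample to sum to one million
}

# Toy data standing in for a downloaded htseq_counts table.
# ASSUMPTION: values are stored as log2(count + 1), so undo that first.
log2_counts <- matrix(
  log2(c(10, 20, 30, 40, 50, 60) + 1),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
raw_counts <- 2^log2_counts - 1

# ASSUMPTION: illustrative gene lengths; real ones must match the annotation
# that was used to produce the counts.
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 3000)

tpm <- counts_to_tpm(raw_counts, gene_length_bp[rownames(raw_counts)])
log2_tpm <- log2(tpm + 1)  # log scale, comparable to other log2(TPM + 1) data
```

Each column of `tpm` sums to one million, which is a quick sanity check before combining it with your own TPM values.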
Thank you for this, but it seems that I can not download the counts:
library(UCSCXenaTools)
XE <- XenaGenerate(subset = XenaHostNames == "gdcHub")
XE %>% XenaFilter(filterDatasets = "clinical") -> XE_clinical
XE %>% XenaFilter(filterDatasets = "htseq_counts") -> XE_rna_counts
# Download from the GDC hub
# Download the clinical information; this one works
XE_clinical.query <- XenaQuery(XE_clinical)
XE_clinical.download <- XenaDownload(XE_clinical.query,
  destdir = "UCSC_Xena/TCGA/counts_Clinical", trans_slash = TRUE, force = TRUE
)
# Try to download the counts
XE_rna_counts.query <- XenaQuery(XE_rna_counts)
XE_rna_counts.download <- XenaDownload(XE_rna_counts.query,
  destdir = "UCSC_Xena/TCGA/counts_RNAseq", trans_slash = TRUE
)
if (!dir.exists("UCSC_Xena")) {
  XE_clinical.query <- XenaQuery(XE_clinical)
  XE_clinical.download <- XenaDownload(XE_clinical.query,
    destdir = "UCSC_Xena/TCGA/counts_Clinical", trans_slash = TRUE
  )
  XE_rna_pancan.query <- XenaQuery(XE_rna_pancan)
  XE_rna_pancan.download <- XenaDownload(XE_rna_pancan.query,
    destdir = "UCSC_Xena/TCGA/counts_RNAseq", trans_slash = TRUE
  )
}
Downloading all of the GDC counts fails:
Downloading TCGA-LAML.htseq_counts.tsv.gz
trying URL 'https://gdc.xenahubs.net/download/TCGA-LAML.htseq_counts.tsv.gz'
==> Trying #2
trying URL 'https://gdc.xenahubs.net/download/TCGA-LAML.htseq_counts.tsv.gz'
==> Trying #3
trying URL 'https://gdc.xenahubs.net/download/TCGA-LAML.htseq_counts.tsv.gz'
Tried 3 times but failed, please check your internet connection!
This is what the query looks like:
> head(XE_rna_pancan.download)
hosts datasets
1 https://gdc.xenahubs.net TCGA-BLCA.htseq_counts.tsv
2 https://gdc.xenahubs.net TCGA-LUSC.htseq_counts.tsv
3 https://gdc.xenahubs.net TCGA-ESCA.htseq_counts.tsv
4 https://gdc.xenahubs.net TARGET-RT.htseq_counts.tsv
5 https://gdc.xenahubs.net MMRF-COMMPASS.htseq_counts.tsv
6 https://gdc.xenahubs.net TCGA-MESO.htseq_counts.tsv
url fileNames
1 https://gdc.xenahubs.net/download/TCGA-BLCA.htseq_counts.tsv.gz TCGA-BLCA.htseq_counts.tsv.gz
2 https://gdc.xenahubs.net/download/TCGA-LUSC.htseq_counts.tsv.gz TCGA-LUSC.htseq_counts.tsv.gz
3 https://gdc.xenahubs.net/download/TCGA-ESCA.htseq_counts.tsv.gz TCGA-ESCA.htseq_counts.tsv.gz
4 https://gdc.xenahubs.net/download/TARGET-RT.htseq_counts.tsv.gz TARGET-RT.htseq_counts.tsv.gz
5 https://gdc.xenahubs.net/download/MMRF-COMMPASS.htseq_counts.tsv.gz MMRF-COMMPASS.htseq_counts.tsv.gz
6 https://gdc.xenahubs.net/download/TCGA-MESO.htseq_counts.tsv.gz TCGA-MESO.htseq_counts.tsv.gz
destfiles
1 UCSC_Xena/TCGA/counts_RNAseq/TCGA-BLCA.htseq_counts.tsv.gz
2 UCSC_Xena/TCGA/counts_RNAseq/TCGA-LUSC.htseq_counts.tsv.gz
3 UCSC_Xena/TCGA/counts_RNAseq/TCGA-ESCA.htseq_counts.tsv.gz
4 UCSC_Xena/TCGA/counts_RNAseq/TARGET-RT.htseq_counts.tsv.gz
5 UCSC_Xena/TCGA/counts_RNAseq/MMRF-COMMPASS.htseq_counts.tsv.gz
6 UCSC_Xena/TCGA/counts_RNAseq/TCGA-MESO.htseq_counts.tsv.gz
How can I get those counts using the Xena tools?
This is what I am looking for, RSEM and log(tpm+1): https://xenabrowser.net/datapages/?dataset=tcga_RSEM_gene_tpm&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Hi @sunta3iouxos , please rerun the code with the latest version from GitHub
remotes::install_github("ropensci/UCSCXenaTools")
Also change XE <- XenaGenerate(subset = XenaHostNames == "gdcHub")
to XE <- XenaGenerate(subset = XenaHostNames == "gdcHubV18"),
as UCSC Xena has updated the data source.
I will do that and report back.
This one works. Could you please help with this:
"For comparing data within independent cohort (like TCGA-LUAD), we recommend to use the "gene expression RNAseq" dataset. For questions regarding the gene expression of this particular cohort in relation to other types tumors, you can use the pancan normalized version of the "gene expression RNAseq" data. For comparing with data outside TCGA, we recommend using the percentile version if the non-TCGA data is normalized by percentile ranking. For more information, please see our Data FAQ: here."
I understand that TCGA's way of normalising the data to avoid batch effects is to use the EB++ algorithm, but they also state that, if you need to add your own dataset, it may be better to normalise by percentile ranking. Any clues on how to do this? I have never normalised data using that approach.
Is this approach something related to this: https://www.nature.com/articles/s41598-020-72664-6#Sec2
Check https://www.r-bloggers.com/2024/03/mastering-quantile-normalization-in-r-a-step-by-step-guide/ and see more at https://www.google.com/search?q=percentile+normalization+in+r
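For illustration, a minimal sketch of per-sample percentile ranking in base R, which is one plausible reading of "normalized by percentile ranking". Note that this is not the same procedure as the quantile normalization described in the linked guide. The helper `percentile_rank` and the toy matrix are hypothetical, and the tie handling is an assumption; Xena's exact procedure may differ:

```r
# Hypothetical helper: rank genes within each sample and rescale the ranks
# to 0-100. Needs at least two non-missing genes per sample.
# ASSUMPTION: ties get the average rank.
percentile_rank <- function(expr) {
  apply(expr, 2, function(x) {
    r <- rank(x, ties.method = "average", na.last = "keep")
    100 * (r - 1) / (sum(!is.na(x)) - 1)
  })
}

# Toy genes x samples matrix: two samples on very different scales
# but with the same gene ordering.
expr <- matrix(
  c(5, 1, 9, 3,
    200, 50, 800, 100),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("tcga_sample", "own_sample"))
)

pct <- percentile_rank(expr)
```

Because only the within-sample ordering is kept, the two toy samples end up with identical percentile values even though their raw scales differ, which is why this form is suggested for comparing against non-TCGA data.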
Thank you for this tool. I am a novice with TCGA data, but I am looking to do some analysis, and I wanted to download TPM-normalised values so that I can combine them with my own RNA-seq data. For my needs (I want to run GSVA or singscore), I think TPM should be more appropriate than percentile ranking. Following some tutorials, I got values that look more scaled than TPM-normalised ones. Is there a way to accomplish this with UCSCXenaTools? This is the code (taken from https://github.com/XSLiuLab/tumor-immunogenicity-score):
The author of the code mentions:
The RNASeq data we downloaded are pancan normalized. For comparing data within independent cohort (like TCGA-LUAD), we recommend to use the "gene expression RNAseq" dataset. For questions regarding the gene expression of this particular cohort in relation to other types tumors, you can use the pancan normalized version of the "gene expression RNAseq" data. For comparing with data outside TCGA, we recommend using the percentile version if the non-TCGA data is normalized by percentile ranking. For more information, please see our Data FAQ: [here](https://docs.google.com/document/d/1q-7Tkzd7pci4Rz-_IswASRMRzYrbgx1FTTfAWOyHbmk/edit?usp=sharing)
Do you have any recommendations on this? Theodoros