Closed darencard closed 9 months ago
Hi Daren, @darencard
Thank you for the reproducible example.
It looks like the endpoint has no data for some molecularProfileIds (e.g., luad_tcga_pan_can_atlas_2018_structural_variants). The data may be at a different endpoint (possibly https://www.cbioportal.org/api/structuralvariant-genes/).
@inodb Ino, is there an official endpoint to access the structural variants data?
PS. For now, I have enabled a stop_for_status check on the http requests in the package.
Daren, note that you should use the luad_tcga_pan_can_atlas_2018_all sampleListId. See sampleLists(api = cbio, studyId = "luad_tcga_pan_can_atlas_2018").
Hi @LiNk-NY
Thanks so much for the prompt reply and helpful guidance! I wondered if I was setting sampleListId incorrectly.
I have re-run my above example with the luad_tcga_pan_can_atlas_2018_all list, as you recommended, and it is working better now.
# gather the different molecular profile options from TCGA Pan Cancer Atlas for lung adenocarcinoma
# I want to use getDataByGenes to load each of these datasets for a given set of genes
datasets <- molecularProfiles(cbio, "luad_tcga_pan_can_atlas_2018")[["molecularProfileId"]]
# loop through datasets and call getDataByGenes on each
for (i in datasets) {
  print(i)
  # run command on each for 10 entrez genes
  try(getDataByGenes(cbio,
                     studyId = "luad_tcga_pan_can_atlas_2018",
                     genes = 1:10,
                     molecularProfileId = i,
                     sampleListId = "luad_tcga_pan_can_atlas_2018_all")
  )
}
Here is what that looks like.
[1] "luad_tcga_pan_can_atlas_2018_rppa"
[1] "luad_tcga_pan_can_atlas_2018_rppa_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_gistic"
[1] "luad_tcga_pan_can_atlas_2018_log2CNA"
[1] "luad_tcga_pan_can_atlas_2018_armlevel_cna"
[1] "luad_tcga_pan_can_atlas_2018_mutations"
[1] "luad_tcga_pan_can_atlas_2018_structural_variants"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_methylation_hm27_hm450_merge"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_microbiome_signature"
[1] "luad_tcga_pan_can_atlas_2018_genetic_ancestry"
However, I noticed that even though some of the above function calls are not producing errors/warnings, the resulting dataset may still be empty. I'm guessing certain molecular profiles are not available. Here is what I see if I slightly modify my above for loop.
# loop through datasets and call getDataByGenes on each
# save to 'test' and print
for (i in datasets) {
  print(i)
  # run command on each for 10 entrez genes
  try(test <- getDataByGenes(cbio,
                             studyId = "luad_tcga_pan_can_atlas_2018",
                             genes = 1:10,
                             molecularProfileId = i,
                             sampleListId = "luad_tcga_pan_can_atlas_2018_all")
  )
  print(test)
}
And here is the output of that loop.
[1] "luad_tcga_pan_can_atlas_2018_rppa"
named list()
[1] "luad_tcga_pan_can_atlas_2018_rppa_Zscores"
named list()
[1] "luad_tcga_pan_can_atlas_2018_gistic"
$luad_tcga_pan_can_atlas_2018_gistic
# A tibble: 2,555 × 10
uniqueSampleKey uniquePatientKey entrezGeneId molecularProfileId sampleId
<chr> <chr> <int> <chr> <chr>
1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 3 luad_tcga_pan_can… TCGA-05…
4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
5 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 3 luad_tcga_pan_can… TCGA-05…
9 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
# ℹ 2,545 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <int>,
# hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_log2CNA"
$luad_tcga_pan_can_atlas_2018_log2CNA
# A tibble: 2,555 × 10
uniqueSampleKey uniquePatientKey entrezGeneId molecularProfileId sampleId
<chr> <chr> <int> <chr> <chr>
1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 3 luad_tcga_pan_can… TCGA-05…
4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
5 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 3 luad_tcga_pan_can… TCGA-05…
9 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
# ℹ 2,545 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
# hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_armlevel_cna"
named list()
[1] "luad_tcga_pan_can_atlas_2018_mutations"
$luad_tcga_pan_can_atlas_2018_mutations
# A tibble: 40 × 29
uniqueSampleKey uniquePatientKey molecularProfileId sampleId patientId
<chr> <chr> <chr> <chr> <chr>
1 VENHQS0wNS00MjUwLTAxO… VENHQS0wNS00MjU… luad_tcga_pan_can… TCGA-05… TCGA-05-…
2 VENHQS0wNS00MzgyLTAxO… VENHQS0wNS00Mzg… luad_tcga_pan_can… TCGA-05… TCGA-05-…
3 VENHQS0wNS00NDAyLTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
4 VENHQS0wNS00NDA1LTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
5 VENHQS0wNS00NDI3LTAxO… VENHQS0wNS00NDI… luad_tcga_pan_can… TCGA-05… TCGA-05-…
6 VENHQS0zOC00NjMxLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
7 VENHQS0zOC00NjMyLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
8 VENHQS00NC02Nzc4LTAxO… VENHQS00NC02Nzc… luad_tcga_pan_can… TCGA-44… TCGA-44-…
9 VENHQS00NC04MTE5LTAxO… VENHQS00NC04MTE… luad_tcga_pan_can… TCGA-44… TCGA-44-…
10 VENHQS00OS1BQVI5LTAxO… VENHQS00OS1BQVI… luad_tcga_pan_can… TCGA-49… TCGA-49-…
# ℹ 30 more rows
# ℹ 24 more variables: entrezGeneId <int>, studyId <chr>, center <chr>,
# mutationStatus <chr>, validationStatus <chr>, tumorAltCount <int>,
# tumorRefCount <int>, normalAltCount <int>, normalRefCount <int>,
# startPosition <int>, endPosition <int>, referenceAllele <chr>,
# proteinChange <chr>, mutationType <chr>, ncbiBuild <chr>,
# variantType <chr>, keyword <chr>, chr <chr>, variantAllele <chr>, …
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_structural_variants"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
group length is 0 but data length > 0
$luad_tcga_pan_can_atlas_2018_mutations
# A tibble: 40 × 29
uniqueSampleKey uniquePatientKey molecularProfileId sampleId patientId
<chr> <chr> <chr> <chr> <chr>
1 VENHQS0wNS00MjUwLTAxO… VENHQS0wNS00MjU… luad_tcga_pan_can… TCGA-05… TCGA-05-…
2 VENHQS0wNS00MzgyLTAxO… VENHQS0wNS00Mzg… luad_tcga_pan_can… TCGA-05… TCGA-05-…
3 VENHQS0wNS00NDAyLTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
4 VENHQS0wNS00NDA1LTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
5 VENHQS0wNS00NDI3LTAxO… VENHQS0wNS00NDI… luad_tcga_pan_can… TCGA-05… TCGA-05-…
6 VENHQS0zOC00NjMxLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
7 VENHQS0zOC00NjMyLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
8 VENHQS00NC02Nzc4LTAxO… VENHQS00NC02Nzc… luad_tcga_pan_can… TCGA-44… TCGA-44-…
9 VENHQS00NC04MTE5LTAxO… VENHQS00NC04MTE… luad_tcga_pan_can… TCGA-44… TCGA-44-…
10 VENHQS00OS1BQVI5LTAxO… VENHQS00OS1BQVI… luad_tcga_pan_can… TCGA-49… TCGA-49-…
# ℹ 30 more rows
# ℹ 24 more variables: entrezGeneId <int>, studyId <chr>, center <chr>,
# mutationStatus <chr>, validationStatus <chr>, tumorAltCount <int>,
# tumorRefCount <int>, normalAltCount <int>, normalRefCount <int>,
# startPosition <int>, endPosition <int>, referenceAllele <chr>,
# proteinChange <chr>, mutationType <chr>, ncbiBuild <chr>,
# variantType <chr>, keyword <chr>, chr <chr>, variantAllele <chr>, …
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_methylation_hm27_hm450_merge"
named list()
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna
# A tibble: 2,040 × 10
uniqueSampleKey uniquePatientKey entrezGeneId molecularProfileId sampleId
<chr> <chr> <int> <chr> <chr>
1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
# hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores
# A tibble: 2,040 × 10
uniqueSampleKey uniquePatientKey entrezGeneId molecularProfileId sampleId
<chr> <chr> <int> <chr> <chr>
1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
# hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores
# A tibble: 2,040 × 10
uniqueSampleKey uniquePatientKey entrezGeneId molecularProfileId sampleId
<chr> <chr> <int> <chr> <chr>
1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
# hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores
# A tibble: 2,040 × 10
uniqueSampleKey uniquePatientKey entrezGeneId molecularProfileId sampleId
<chr> <chr> <int> <chr> <chr>
1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 1 luad_tcga_pan_can… TCGA-05…
6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 2 luad_tcga_pan_can… TCGA-05…
7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 9 luad_tcga_pan_can… TCGA-05…
8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ… 10 luad_tcga_pan_can… TCGA-05…
9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU… 2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
# hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows
[1] "luad_tcga_pan_can_atlas_2018_microbiome_signature"
named list()
[1] "luad_tcga_pan_can_atlas_2018_genetic_ancestry"
named list()
The named list() outputs for some of the function calls are empty lists.
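One way to tell the three outcomes apart (error, empty list, data) is to classify each call explicitly rather than printing a variable that may still hold the previous iteration's value, which is why the mutations table appears a second time after the structural variants error above. The following is only a minimal sketch: `classify_profiles` and `fake_fetch` are hypothetical names, and the stand-in fetcher replaces the real network call so the example runs offline.

```r
# Classify the outcome of one fetch per profile ID.
# `fetch` is any function that takes a single profile ID and returns a
# named list (the shape printed by getDataByGenes() above).
classify_profiles <- function(profile_ids, fetch) {
  status <- vapply(profile_ids, function(id) {
    # tryCatch yields a fresh value on every iteration, so a failed call
    # cannot leave the previous profile's data behind.
    res <- tryCatch(fetch(id), error = function(e) e)
    if (inherits(res, "error")) {
      "error"
    } else if (length(res) == 0L) {
      "empty"
    } else {
      "data"
    }
  }, character(1))
  data.frame(molecularProfileId = profile_ids,
             status = unname(status),
             stringsAsFactors = FALSE)
}

# Offline demonstration with a stand-in fetcher (hypothetical; no network).
fake_fetch <- function(id) {
  if (grepl("structural_variants$", id)) {
    stop("group length is 0 but data length > 0")
  }
  if (grepl("rppa$", id)) {
    return(setNames(list(), character(0)))  # prints as: named list()
  }
  setNames(list(data.frame(entrezGeneId = 1:3)), id)
}

ids <- c("demo_rppa", "demo_gistic", "demo_structural_variants")
result <- classify_profiles(ids, fake_fetch)
print(result)
```

With the objects from this thread, `fetch` could be a wrapper such as `function(id) getDataByGenes(cbio, studyId = "luad_tcga_pan_can_atlas_2018", genes = 1:10, molecularProfileId = id, sampleListId = "luad_tcga_pan_can_atlas_2018_all")`, and `profile_ids` would be `datasets`.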
Perhaps this is helpful in addressing any issues you may have noticed. Your prior recommendation solved my immediate problem, so we can probably close this issue, but I will leave it open for now.
One more tangential question: I'm hoping to extract data for all protein-coding genes instead of just the 10 genes in the example above. Beyond 1000 genes, I seem to run out of memory and the function calls never complete (R crashes and restarts). Do you have any suggestions for retrieving such data on a genome-wide scale instead of for a subset of targeted genes?
Thanks again for the help! Daren Card
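For large gene sets, one generic approach is to request genes in batches and reduce each batch before moving on, so that only one batch of raw results is held at a time. The sketch below is an assumption-laden illustration, not something tested against cBioPortal: `chunk_ids` and `fetch_in_batches` are hypothetical helpers, the batch size of 500 is arbitrary, and whether batching avoids the crash described above is unverified. The fetch and summarise functions are stand-ins so the example runs offline.

```r
# Split a vector of IDs into consecutive batches of at most `size` elements.
chunk_ids <- function(ids, size = 500L) {
  stopifnot(size >= 1L)
  if (length(ids) == 0L) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Fetch each batch, reduce it immediately, and keep only the reduced result.
fetch_in_batches <- function(ids, fetch, summarise, size = 500L) {
  lapply(chunk_ids(ids, size), function(batch) {
    res <- fetch(batch)  # raw result for this batch only
    summarise(res)       # keep the summary; the raw batch can be collected
  })
}

# Offline demonstration with stand-in functions (hypothetical; no network).
gene_ids <- 1:1200
batch_sizes <- fetch_in_batches(
  gene_ids,
  fetch = function(batch) data.frame(entrezGeneId = batch),
  summarise = nrow,
  size = 500L
)
print(unlist(batch_sizes))
```

In practice `fetch` would wrap a call like the `getDataByGenes()` one shown earlier with `genes = batch`, and `summarise` might write each batch to disk instead of returning it.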
Hi Daren, @darencard Sorry for the late reply. It seems that the structural variants data has moved or is not available:
cBioPortalData(
    api = cbio,
    studyId = "luad_tcga_pan_can_atlas_2018",
    molecularProfileIds = "luad_tcga_pan_can_atlas_2018_structural_variants",
    genes = 1:10,
    by = "entrezGeneId"
)
# Error in .invoke_fun(api, name, use_cache, ...) : Not Found (HTTP 404).
Please use the cBioDataPack function to get data from all measured genes.
Okay - thanks for the update! I appreciate your suggestion of cBioDataPack, which I will investigate further. I will close this issue, since my needs are now met. Thanks again!
Hello,
Thanks for this great tool! However, I am having an issue downloading certain datasets using the getDataByGenes function. Here is the full error message I am receiving.
And here is an example with some comments on what I'm trying to do.
As you can see in the following output, most times, this command fails with the above error message.
I first observed this issue with an earlier version of cBioPortalData and it persists with the most recent version as well.
I appreciate any help that can be provided troubleshooting and overcoming this issue.
Thanks in advance! Daren Card