waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

Data splitting error 'group length is 0 but data length > 0' #73

Closed darencard closed 9 months ago

darencard commented 11 months ago

Hello,

Thanks for this great tool! However, I am having an issue downloading certain datasets using the getDataByGenes function. Here is the full error message I am receiving.

Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0

And here is an example with some comments on what I'm trying to do.

# load libraries
library(cBioPortalData)
library(AnVIL)
library(tidyverse)

# load the data object
(cbio <- cBioPortal())

# gather the different molecular profile options from TCGA Pan Cancer Atlas for lung adenocarcinoma
# I want to use getDataByGenes to load each of these datasets for a given set of genes
datasets <- molecularProfiles(cbio, "luad_tcga_pan_can_atlas_2018")[["molecularProfileId"]]

# loop through datasets and call getDataByGenes on each
for (i in datasets) {
  print(i)

  # run command on each for 10 entrez genes
  try(getDataByGenes(cbio,
                     studyId = "luad_tcga_pan_can_atlas_2018",
                     genes = 1:10,
                     molecularProfileId = i,
                     sampleListId = i)
  )
}

As you can see in the following output, this command fails with the above error message for most profiles.

[1] "luad_tcga_pan_can_atlas_2018_rppa"
[1] "luad_tcga_pan_can_atlas_2018_rppa_Zscores"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_gistic"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_log2CNA"
[1] "luad_tcga_pan_can_atlas_2018_armlevel_cna"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_mutations"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_structural_variants"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_methylation_hm27_hm450_merge"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_microbiome_signature"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_genetic_ancestry"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0

I first observed this issue with an earlier version of cBioPortalData and it persists with the most recent version as well.

package.version("cBioPortalData")
[1] "2.13.9"

I appreciate any help that can be provided troubleshooting and overcoming this issue.

Thanks in advance! Daren Card

LiNk-NY commented 11 months ago

Hi Daren, @darencard

Thank you for the reproducible example.

It looks like the endpoint does not return any data for some molecularProfileIds (e.g., luad_tcga_pan_can_atlas_2018_structural_variants). The data may live at a different endpoint (possibly https://www.cbioportal.org/api/structuralvariant-genes/).
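As a rough illustration of how an empty or unexpected response could surface as this particular error, the base R sketch below splits a data frame by a grouping column that is missing. This is an assumption about the failure mode, not the package's actual internals; the data frame `rows` and its columns are made up for the example.

```r
# Base R only; this mimics the failure mode, not the package's internal code.
# `rows` is made-up data standing in for a parsed API response.
rows <- data.frame(sampleId = c("s1", "s2", "s3"), value = c(0.1, 0.2, 0.3))

# The grouping column is absent, so `f` is NULL (a zero-length grouping).
f <- rows[["molecularProfileId"]]

# Splitting non-empty data by a zero-length grouping raises an error;
# capture its message instead of stopping the script.
msg <- tryCatch(
  {
    split(rows, f)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With zero rows there is nothing to group, so split() does not error
# and returns an empty list.
empty <- split(rows[0, ], f)
print(length(empty))
```

If the package behaves similarly, a response with rows but without the expected grouping column would raise the error, while a response with no rows at all would pass through quietly as an empty list.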

@inodb Ino, is there an official endpoint to access the structural variants data?

PS. For now, I have enabled a stop_for_status check on the HTTP requests in the package.

Daren, note that you should use the luad_tcga_pan_can_atlas_2018_all sampleListId. See sampleLists(api = cbio, studyId = "luad_tcga_pan_can_atlas_2018").

darencard commented 11 months ago

Hi @LiNk-NY

Thanks so much for the prompt reply and helpful guidance! I wondered if I was setting sampleListId incorrectly.

I have re-run my above example with the luad_tcga_pan_can_atlas_2018_all list, as you recommended, and it is working better now.

# gather the different molecular profile options from TCGA Pan Cancer Atlas for lung adenocarcinoma
# I want to use getDataByGenes to load each of these datasets for a given set of genes
datasets <- molecularProfiles(cbio, "luad_tcga_pan_can_atlas_2018")[["molecularProfileId"]]

# loop through datasets and call getDataByGenes on each
for (i in datasets) {
  print(i)

  # run command on each for 10 entrez genes
  try(getDataByGenes(cbio,
                     studyId = "luad_tcga_pan_can_atlas_2018",
                     genes = 1:10,
                     molecularProfileId = i,
                     sampleListId = "luad_tcga_pan_can_atlas_2018_all")
  )
}

Here is what that looks like.

[1] "luad_tcga_pan_can_atlas_2018_rppa"
[1] "luad_tcga_pan_can_atlas_2018_rppa_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_gistic"
[1] "luad_tcga_pan_can_atlas_2018_log2CNA"
[1] "luad_tcga_pan_can_atlas_2018_armlevel_cna"
[1] "luad_tcga_pan_can_atlas_2018_mutations"
[1] "luad_tcga_pan_can_atlas_2018_structural_variants"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
[1] "luad_tcga_pan_can_atlas_2018_methylation_hm27_hm450_merge"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores"
[1] "luad_tcga_pan_can_atlas_2018_microbiome_signature"
[1] "luad_tcga_pan_can_atlas_2018_genetic_ancestry"

However, I noticed that even though some of the above function calls are not producing errors/warnings, the resulting dataset may still be empty. I'm guessing certain molecular profiles are not available. Here is what I see if I slightly modify my above for loop.

# loop through datasets and call getDataByGenes on each
# save to 'test' and print
for (i in datasets) {
  print(i)

  # run command on each for 10 entrez genes
  try(test <- getDataByGenes(cbio,
                             studyId = "luad_tcga_pan_can_atlas_2018",
                             genes = 1:10,
                             molecularProfileId = i,
                             sampleListId = "luad_tcga_pan_can_atlas_2018_all")
  )

  print(test)
}

And here is the output of that loop.

[1] "luad_tcga_pan_can_atlas_2018_rppa"
named list()
[1] "luad_tcga_pan_can_atlas_2018_rppa_Zscores"
named list()
[1] "luad_tcga_pan_can_atlas_2018_gistic"
$luad_tcga_pan_can_atlas_2018_gistic
# A tibble: 2,555 × 10
   uniqueSampleKey     uniquePatientKey entrezGeneId molecularProfileId sampleId
   <chr>               <chr>                   <int> <chr>              <chr>   
 1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            3 luad_tcga_pan_can… TCGA-05…
 4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 5 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            3 luad_tcga_pan_can… TCGA-05…
 9 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
# ℹ 2,545 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <int>,
#   hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_log2CNA"
$luad_tcga_pan_can_atlas_2018_log2CNA
# A tibble: 2,555 × 10
   uniqueSampleKey     uniquePatientKey entrezGeneId molecularProfileId sampleId
   <chr>               <chr>                   <int> <chr>              <chr>   
 1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            3 luad_tcga_pan_can… TCGA-05…
 4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 5 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            3 luad_tcga_pan_can… TCGA-05…
 9 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
# ℹ 2,545 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
#   hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_armlevel_cna"
named list()
[1] "luad_tcga_pan_can_atlas_2018_mutations"
$luad_tcga_pan_can_atlas_2018_mutations
# A tibble: 40 × 29
   uniqueSampleKey        uniquePatientKey molecularProfileId sampleId patientId
   <chr>                  <chr>            <chr>              <chr>    <chr>    
 1 VENHQS0wNS00MjUwLTAxO… VENHQS0wNS00MjU… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 2 VENHQS0wNS00MzgyLTAxO… VENHQS0wNS00Mzg… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 3 VENHQS0wNS00NDAyLTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 4 VENHQS0wNS00NDA1LTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 5 VENHQS0wNS00NDI3LTAxO… VENHQS0wNS00NDI… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 6 VENHQS0zOC00NjMxLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
 7 VENHQS0zOC00NjMyLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
 8 VENHQS00NC02Nzc4LTAxO… VENHQS00NC02Nzc… luad_tcga_pan_can… TCGA-44… TCGA-44-…
 9 VENHQS00NC04MTE5LTAxO… VENHQS00NC04MTE… luad_tcga_pan_can… TCGA-44… TCGA-44-…
10 VENHQS00OS1BQVI5LTAxO… VENHQS00OS1BQVI… luad_tcga_pan_can… TCGA-49… TCGA-49-…
# ℹ 30 more rows
# ℹ 24 more variables: entrezGeneId <int>, studyId <chr>, center <chr>,
#   mutationStatus <chr>, validationStatus <chr>, tumorAltCount <int>,
#   tumorRefCount <int>, normalAltCount <int>, normalRefCount <int>,
#   startPosition <int>, endPosition <int>, referenceAllele <chr>,
#   proteinChange <chr>, mutationType <chr>, ncbiBuild <chr>,
#   variantType <chr>, keyword <chr>, chr <chr>, variantAllele <chr>, …
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_structural_variants"
Error in split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) : 
  group length is 0 but data length > 0
$luad_tcga_pan_can_atlas_2018_mutations
# A tibble: 40 × 29
   uniqueSampleKey        uniquePatientKey molecularProfileId sampleId patientId
   <chr>                  <chr>            <chr>              <chr>    <chr>    
 1 VENHQS0wNS00MjUwLTAxO… VENHQS0wNS00MjU… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 2 VENHQS0wNS00MzgyLTAxO… VENHQS0wNS00Mzg… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 3 VENHQS0wNS00NDAyLTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 4 VENHQS0wNS00NDA1LTAxO… VENHQS0wNS00NDA… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 5 VENHQS0wNS00NDI3LTAxO… VENHQS0wNS00NDI… luad_tcga_pan_can… TCGA-05… TCGA-05-…
 6 VENHQS0zOC00NjMxLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
 7 VENHQS0zOC00NjMyLTAxO… VENHQS0zOC00NjM… luad_tcga_pan_can… TCGA-38… TCGA-38-…
 8 VENHQS00NC02Nzc4LTAxO… VENHQS00NC02Nzc… luad_tcga_pan_can… TCGA-44… TCGA-44-…
 9 VENHQS00NC04MTE5LTAxO… VENHQS00NC04MTE… luad_tcga_pan_can… TCGA-44… TCGA-44-…
10 VENHQS00OS1BQVI5LTAxO… VENHQS00OS1BQVI… luad_tcga_pan_can… TCGA-49… TCGA-49-…
# ℹ 30 more rows
# ℹ 24 more variables: entrezGeneId <int>, studyId <chr>, center <chr>,
#   mutationStatus <chr>, validationStatus <chr>, tumorAltCount <int>,
#   tumorRefCount <int>, normalAltCount <int>, normalRefCount <int>,
#   startPosition <int>, endPosition <int>, referenceAllele <chr>,
#   proteinChange <chr>, mutationType <chr>, ncbiBuild <chr>,
#   variantType <chr>, keyword <chr>, chr <chr>, variantAllele <chr>, …
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_methylation_hm27_hm450_merge"
named list()
[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna
# A tibble: 2,040 × 10
   uniqueSampleKey     uniquePatientKey entrezGeneId molecularProfileId sampleId
   <chr>               <chr>                   <int> <chr>              <chr>   
 1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
#   hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_Zscores
# A tibble: 2,040 × 10
   uniqueSampleKey     uniquePatientKey entrezGeneId molecularProfileId sampleId
   <chr>               <chr>                   <int> <chr>              <chr>   
 1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
#   hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores
# A tibble: 2,040 × 10
   uniqueSampleKey     uniquePatientKey entrezGeneId molecularProfileId sampleId
   <chr>               <chr>                   <int> <chr>              <chr>   
 1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
#   hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores"
$luad_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_ref_normal_Zscores
# A tibble: 2,040 × 10
   uniqueSampleKey     uniquePatientKey entrezGeneId molecularProfileId sampleId
   <chr>               <chr>                   <int> <chr>              <chr>   
 1 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 2 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 3 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 4 VENHQS0wNS00MjQ0LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 5 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            1 luad_tcga_pan_can… TCGA-05…
 6 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            2 luad_tcga_pan_can… TCGA-05…
 7 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…            9 luad_tcga_pan_can… TCGA-05…
 8 VENHQS0wNS00MjQ5LT… VENHQS0wNS00MjQ…           10 luad_tcga_pan_can… TCGA-05…
 9 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            1 luad_tcga_pan_can… TCGA-05…
10 VENHQS0wNS00MjUwLT… VENHQS0wNS00MjU…            2 luad_tcga_pan_can… TCGA-05…
# ℹ 2,030 more rows
# ℹ 5 more variables: patientId <chr>, studyId <chr>, value <dbl>,
#   hugoGeneSymbol <chr>, type <chr>
# ℹ Use `print(n = ...)` to see more rows

[1] "luad_tcga_pan_can_atlas_2018_microbiome_signature"
named list()
[1] "luad_tcga_pan_can_atlas_2018_genetic_ancestry"
named list()

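The named list() outputs are empty lists, meaning that no data came back for those profiles. Note that the mutations tibble printed right after the structural_variants error is not new data: the failed call never reassigned test, so print(test) showed the value left over from the previous iteration.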
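One way to keep a failed call from reusing an earlier result, and to tell empty results apart from errors, is to classify each call explicitly. The sketch below is a minimal base R version under stated assumptions: `summarise_profiles` and `fake_fetch` are hypothetical names, and the stub only imitates the three outcomes seen in the output above rather than calling the real API.

```r
# Hypothetical helper: `fetch` is any function that takes one profile ID.
# In the thread's setting it would wrap getDataByGenes(); here it is a stub.
summarise_profiles <- function(profile_ids, fetch) {
  vapply(profile_ids, function(id) {
    # A fresh result per iteration, so a failure cannot reuse old data.
    result <- tryCatch(fetch(id), error = function(e) e)
    if (inherits(result, "error")) {
      "error"
    } else if (length(result) == 0L) {
      "empty"
    } else {
      "ok"
    }
  }, character(1))
}

# Stub standing in for the API call (made-up behaviour for illustration).
fake_fetch <- function(id) {
  if (id == "structural_variants") stop("Not Found (HTTP 404).")
  if (id == "rppa") return(setNames(list(), character(0)))
  setNames(list(data.frame(value = 1:2)), id)
}

status <- summarise_profiles(
  c("rppa", "gistic", "structural_variants"),
  fake_fetch
)
print(status)
```

To use this against the portal, `fetch` could be a small function that calls getDataByGenes with the studyId, genes, and sampleListId from the loop above and passes the profile ID through as molecularProfileId.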

Perhaps this is helpful in addressing any issues you may have noticed. Your prior recommendation solved my immediate problem, so we can probably close this issue, but I will leave it open for now.

One more tangential question: I'm hoping to extract data for all protein-coding genes instead of just the 10 genes in the example above. Beyond about 1000 genes, I seem to run out of memory and the function calls never complete (R crashes and restarts). Do you have any suggestions for retrieving such data on a genome-wide scale rather than for a targeted subset of genes?

Thanks again for the help! Daren Card

LiNk-NY commented 9 months ago

Hi Daren, @darencard Sorry for the late reply. It seems that the structural variants data has moved or is not available:

cBioPortalData(
    api = cbio,
    studyId = "luad_tcga_pan_can_atlas_2018",
    molecularProfileIds = "luad_tcga_pan_can_atlas_2018_structural_variants",
    genes = 1:10,
    by = "entrezGeneId"
)
# Error in .invoke_fun(api, name, use_cache, ...) : Not Found (HTTP 404).

Please use the cBioDataPack function to get data from all measured genes.
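For reference, a call along those lines might look like the sketch below. The `run_download` switch is an addition for this example, not something from the thread, so that nothing is fetched unless it is turned on. Whether this study is available as a data pack, and how large the download is, has not been checked here.

```r
# Study ID taken from the thread above.
study_id <- "luad_tcga_pan_can_atlas_2018"

# Illustrative guard (not part of the package): set to TRUE to download.
# The download needs network access and may be large.
run_download <- FALSE

if (run_download && requireNamespace("cBioPortalData", quietly = TRUE)) {
  # cBioDataPack() fetches the pre-packaged study files and returns a
  # MultiAssayExperiment; ask = FALSE skips the interactive cache prompt.
  luad <- cBioPortalData::cBioDataPack(study_id, ask = FALSE)
  print(luad)
}
```

Because the data pack is not requested gene by gene, this route avoids passing a long genes vector to the API.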

darencard commented 9 months ago

Okay - thanks for the update! I appreciate your suggestion of cBioDataPack, which I will investigate further. I will close this issue, since my needs are now met. Thanks again!