Closed jebard closed 4 years ago
Thanks for the report, @jebard. It's been a bit busy prepping for BioC2020, but we'll get to this as soon as possible. As a workaround in the meantime, you might try the API method; here's some example code. I have to admit that I waited a while on the last call, but not long enough to get a result back from the API. This is a big study, and that call might be asking for too much at once.
library(cBioPortalData)
cbio <- cBioPortal()
View(genePanels(cbio))
View(geneTable(cbio))
(mp <- molecularProfiles(cbio, studyId = "cesc_tcga_pan_can_atlas_2018"))
(sl <- sampleLists(cbio, studyId = "cesc_tcga_pan_can_atlas_2018"))
samples <- samplesInSampleLists(cbio, sampleListIds = sl$sampleListId)
molecularData(
    api = cbio,
    molecularProfileId = "cesc_tcga_pan_can_atlas_2018_rna_seq_v2_mrna",
    sampleIds = samples$cesc_tcga_pan_can_atlas_2018_all,
    entrezGeneIds = c(1, 2)
)
clinicalData(cbio, studyId = "cesc_tcga_pan_can_atlas_2018")
res <- cBioPortalData(cbio,
    genePanelId = "grail_cfdna_508",
    studyId = "cesc_tcga_pan_can_atlas_2018")
Thanks very much for following up. I'll give the code snippet a try. I traced the issue a little on my end and think it's isolated to the way the data tables are read in, specifically this line within the function cBioDataPack():
dat <- as.data.frame(readr::read_tsv(fname, comment = "#"),check.names = FALSE)
To get around it temporarily, I swapped this line out for: dat <- read.table(fname, header = TRUE, fill = TRUE)
That seems to return reasonable results, though it's hard to validate whether all of the tables are processed appropriately when I do this. I have a feeling it's some sort of oddity with that particular study.
Thanks!
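One way to check a read.table() workaround like the one above is to compare it against a tab-aware read on a small file. The sketch below is only an illustration: the toy file and its column values are made up, and it assumes the study tables are tab-delimited with free-text fields that may contain spaces. With read.table()'s default sep = "", any whitespace separates fields, so such a field is split and the columns no longer line up with the header.

```r
# Hypothetical two-column, tab-delimited table with spaces inside a field.
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "Hugo_Symbol\tConsequence",
  "TP53\tmissense variant",
  "KRAS\tstop gained"
), toy)

# Workaround as posted: the default sep = "" splits on any whitespace.
ws <- read.table(toy, header = TRUE, fill = TRUE, stringsAsFactors = FALSE)

# Tab-aware read: "missense variant" stays a single field.
tab <- read.delim(toy, stringsAsFactors = FALSE)

# Compare the two results before trusting the workaround.
print(ws)
print(tab)
identical(as.character(ws$Consequence), tab$Consequence)
```

If the comparison is not TRUE for a given table, the whitespace-based read has probably shifted or split some fields.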
Hi Jonathan, @jebard
I've had a look into this and made a change in the underlying code. https://github.com/waldronlab/cBioPortalData/commit/80fc587158b9ff52096f122fd41ae634132ef5e7
I've made a couple of changes:
read.delim(sep = "\t")
The source of the issue was that readr::read_tsv converts the chromosome X values into NA.
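To illustrate this failure mode on a small scale, here is a minimal base R sketch. The toy file, gene names, and positions are made up, and the as.integer() step is an assumption for illustration: it only emulates a reader that has already committed to a numeric type for the column, and it does not call readr or reproduce its type-guessing logic.

```r
# Hypothetical miniature mutation table with a leading comment line.
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "#version 2.4",
  "Hugo_Symbol\tChromosome\tStart_Position",
  "GENE1\t17\t100",
  "GENE2\t12\t200",
  "GENE3\tX\t300"
), toy)

# read.delim() converts types after seeing the whole column,
# so a column mixing numbers and "X" stays character.
dat <- read.delim(toy, comment.char = "#", stringsAsFactors = FALSE)
table(dat$Chromosome, useNA = "always")

# Emulate a numeric column type: "X" cannot be parsed and becomes NA.
as_numeric <- suppressWarnings(as.integer(dat$Chromosome))
table(as_numeric, useNA = "always")

# A cheap guard to run before building genomic ranges from the table.
stopifnot(!anyNA(dat$Chromosome))
```

An NA in the chromosome column is what later surfaces as the "NAs are not allowed in subscripted assignments" error when sequence levels are assigned.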
library(cBioPortalData)
library(readr)  # provides cols(), col_character(), and type_convert()
cesc_pan_2018 <- cBioDataPack("cesc_tcga_pan_can_atlas_2018")
## download and unpack the study so the raw tables can be read directly
tarloc <- downloadStudy("cesc_tcga_pan_can_atlas_2018")
outdir <- file.path(tempdir(), "cesc")
dir.create(outdir)
studyloc <- untarStudy(tarloc, exdir = outdir)
## currently in use
a <- readr::read_delim(file.path(studyloc, "data_mutations_extended.txt"),
    comment = "#", delim = "\t")
table(a$Chromosome, useNA="always")
## possible alternative
b <- readr::read_tsv(file.path(studyloc, "data_mutations_extended.txt"),
    comment = "#", col_types = cols(.default = col_character()))
## necessary step to convert actual numeric columns to numeric
bb <- type_convert(b)
table(bb$Chromosome, useNA="always")
## proposed change
c <- read.delim(file.path(studyloc, "data_mutations_extended.txt"),
    comment.char = "#")
table(c$Chromosome, useNA="always")
## timing of the readr alternative (read as character, then convert)
system.time({
    a <- readr::read_tsv(file.path(studyloc, "data_mutations_extended.txt"),
        comment = "#", col_types = cols(.default = col_character()))
    type_convert(a)
})
## faster
system.time({
    c <- read.delim(file.path(studyloc, "data_mutations_extended.txt"),
        comment.char = "#")
})
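For comparison, the "read everything as character, then convert" idea from the readr alternative above can also be sketched in base R. This is an illustrative assumption only, not what the linked commit does, and the toy file contents are made up. It relies on colClasses = "character" in read.delim() and on type.convert() with as.is = TRUE.

```r
# Hypothetical miniature table; the real file is much larger.
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "Hugo_Symbol\tChromosome\tStart_Position",
  "GENE1\t17\t100",
  "GENE2\tX\t200"
), toy)

# Step 1: read every column as character so nothing is coerced to NA.
raw <- read.delim(toy, comment.char = "#", colClasses = "character")

# Step 2: convert columns that are entirely numeric; mixed columns
# such as Chromosome remain character.
conv <- type.convert(raw, as.is = TRUE)
str(conv)
```

The two-step approach costs an extra pass over the data, which is in line with the timing difference reported above for the readr variant.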
Perfect! Thanks for taking a look, appreciate it!
Hello! I was wondering if anyone has encountered this problem when loading the study "cesc_tcga_pan_can_atlas_2018".
Is there a known workaround?
cesc_pan_2018 <- cBioDataPack("cesc_tcga_pan_can_atlas_2018")
Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Error in seqlevels[rankSeqlevels(seqlevels)] <- seqlevels :
  NAs are not allowed in subscripted assignments.