danhtruong opened 3 years ago
I changed the URLs for the data and metadata files so they point to the correct locations.
```r
target_huex_se = function() {
  genedat = "https://target-data.nci.nih.gov/Public/OS/gene_expression_array/L3/gene_core_rma_summary_annot.txt"
  sdrf = "https://target-data.nci.nih.gov/Public/OS/gene_expression_array/METADATA/TARGET_OS_GeneExpressionArray_20160812.sdrf.txt"
  dat = readr::read_tsv(genedat)
  dat2 = suppressWarnings(readr::read_tsv(sdrf))
  # map CEL file names (SDRF "Array Data File") to TARGET sample names;
  # target_usi_to_samplename() is a helper defined elsewhere in the package
  sample_map = as.vector(target_usi_to_samplename(dat2[[1]]))
  names(sample_map) = dat2$`Array Data File`
  # Make the assay matrix (column 1 = probeset ID, column 2 = gene assignment)
  assay_mat = as.matrix(dat[,-c(1:2)])
  rownames(assay_mat) = dat[[1]]
  colnames(assay_mat) = unname(sample_map[match(colnames(assay_mat),names(sample_map))])
  # split transcripts, symbols, and pick most common symbol
  genes = stringr::str_split(dat[[2]],' // ')
  tx_list = lapply(genes,function(g) {
    if(length(g)<2) return(NA)
    return(g[seq(1,length(g),2)])
  })
  symbol_list = lapply(genes,function(g) {
    if(length(g)<2) return(NA)
    return(unique(g[seq(2,length(g),2)]))
  })
  symbol = unlist(lapply(genes,function(g) {
    if(length(g)<2) return(NA)
    tb = sort(table(g[seq(2,length(g),2)]),decreasing = TRUE)
    return(unlist(names(tb)[1]))
  }))
  # clinical/coldata; target_load_clinical() is another package helper
  cdata = target_load_clinical()
  cdata = as.data.frame(cdata)
  cdata[[1]] = target_usi_to_samplename(cdata[[1]])
  rownames(cdata) = make.unique(cdata[[1]])
  cdata = cdata[colnames(assay_mat),]
  # construct rowdata
  rowdata = S4Vectors::DataFrame(symbol = symbol,
                                 tx_list = S4Vectors::SimpleList(tx_list),
                                 symbol_list = S4Vectors::SimpleList(symbol_list),
                                 row.names = dat[[1]])
  return(SummarizedExperiment::SummarizedExperiment(assays = list(exprs = assay_mat),
                                                    rowData = rowdata,
                                                    colData = cdata))
}
```
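To make the annotation-parsing step easier to follow, here is a small base-R sketch of what the three `lapply()` calls do to a single `gene_assignment_final` value. It assumes the field alternates `transcript // symbol // transcript // symbol ...`, which is the layout the `seq(1, ..., 2)` / `seq(2, ..., 2)` indexing implies. `parse_assignment()` and the example strings are hypothetical, and `strsplit(fixed = TRUE)` stands in for `stringr::str_split()`; both give the same result here because `" // "` contains no regex metacharacters.

```r
# Hypothetical helper mirroring the per-row logic in target_huex_se():
# split on " // ", treat odd fields as transcript IDs and even fields as
# gene symbols, then keep the most frequent symbol.
parse_assignment <- function(x) {
  g <- strsplit(x, " // ", fixed = TRUE)[[1]]
  # fewer than two fields means there is no transcript/symbol pair
  if (length(g) < 2) {
    return(list(tx = NA, symbols = NA, symbol = NA))
  }
  tx  <- g[seq(1, length(g), 2)]   # odd positions: transcript IDs
  sym <- g[seq(2, length(g), 2)]   # even positions: gene symbols
  tb  <- sort(table(sym), decreasing = TRUE)
  list(tx = tx, symbols = unique(sym), symbol = names(tb)[1])
}

# Made-up annotation string in the alternating layout the function assumes
res <- parse_assignment("NM_0001 // GENEA // NM_0002 // GENEB // NM_0003 // GENEA")
str(res)

# A probeset with no assignment (a single field) falls through to NA
empty <- parse_assignment("---")
```

Here `res$symbol` is `"GENEA"` because it appears twice, while `res$symbols` keeps both distinct symbols, matching the `symbol` and `symbol_list` columns the function stores in `rowData`.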
Here is the output from running `target_huex_se()`. I haven't checked the other data-loading functions.
```
target_os <- target_huex_se()
Rows: 22011 Columns: 91
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): gene_assignment_final
dbl (90): probeset_id, AE248-HuEx-1_0-st-v2-01-1_(PATKSS-01A-01R).CEL, AE249-HuEx-1_0-st-v2-01-1_(PAUTWB-01A-01R).C...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:
* `Material Type` -> `Material Type...3`
* `Term Source REF` -> `Term Source REF...4`
* `Term Source REF` -> `Term Source REF...6`
* `Term Source REF` -> `Term Source REF...8`
* `Material Type` -> `Material Type...11`
* ...
Rows: 180 Columns: 44
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (38): Source Name, Provider, Material Type...3, Term Source REF...4, Characteristics[Organism], Term Source RE...
dbl (4): Comment[Scanning Station No], Comment[OCG Data Level]...37, Comment[OCG Data Level]...40, Comment[OCG Da...
lgl (1): Comment[Array Lot No]
date (1): Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning in length.out : closing unused connection 5 (ftp://caftpd.nci.nih.gov/pub/OCG-DCC/TARGET/OS/gene_expression_array/L3/gene_core_rma_summary_annot.txt)
```
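The column-specification and `New names:` lines in that output are messages from readr, not warnings, so the `suppressWarnings()` wrapper around the SDRF read does not hide them. A minimal sketch, assuming readr >= 2.0 (where `show_col_types` exists) and a tiny made-up SDRF-like file, showing how repeated SDRF headers get renamed and how the chatter could be quieted:

```r
library(readr)

# Tiny hypothetical SDRF-like table with a repeated header, like the real SDRF
tsv <- tempfile(fileext = ".txt")
writeLines(c(
  "Source Name\tTerm Source REF\tTerm Source REF",
  "S1\tEFO\tMO"
), tsv)

# show_col_types = FALSE drops the column-spec message; suppressMessages()
# also hides the "New names:" name-repair message (a message, not a warning)
sdrf <- suppressMessages(read_tsv(tsv, show_col_types = FALSE))

# Duplicated headers get a "...<position>" suffix, as in the log above
print(names(sdrf))
```

The positional suffixes are why later code should look up SDRF columns by a unique header (as `target_huex_se()` does with `` `Array Data File` ``) rather than by a header that appears more than once.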
Thanks, @danhtruong. Looks like I need to do some cleanup. I'm really glad to see someone is using this!