Closed JinsyWang closed 5 months ago
Solved. Error was found in the rbindlist function while running runFeatureCount.R.
`> info_parsed[3035:3040] [[1]] gene_id gene_name 1: ENSMUSG00000101940 Akp-ps1
[[2]] gene_id 1: ENSMUSG00000121351
[[3]] gene_id gene_name 1: ENSMUSG00000102817 Gm6185
[[4]] gene_id 1: ENSMUSG00000121369
[[5]] gene_id gene_name 1: ENSMUSG00000096964 Gm6345
[[6]] gene_id gene_name 1: ENSMUSG00000121407 Gm6345
info_parsed <- rbindlist(info_parsed) Error in rbindlist(info_parsed) : Item 3036 has 1 columns, inconsistent with item 1 which has 2 columns. To fill missing columns use fill=TRUE.`
After modifying the source code, it can run normally.
.get_gene_names <- function(gtf, threads){ gtf.dt <- fread(gtf, sep="\t",header=F) ge <- gtf.dt[V3 == "gene"] gtf_info <- ge$V9 info_parsed <- parallel::mclapply(gtf_info, function(x){ dat <- data.table(V1=unlist(strsplit(x,"; "))) dat[,c("name","value") := tstrsplit(V1, " ")][ ,V1 := NULL][ ,value := gsub(pattern = """, replacement = "", x = value)] dat <- dat[name %in% c("gene_id","gene_name")] dat <- dcast(dat, .~name, value.var = "value") dat[,"." := NULL] }, mc.cores = threads) info_parsed <- rbindlist(info_parsed, fill=TRUE) info_parsed$gene_name[is.na(info_parsed$gene_name)] <- info_parsed$gene_id[is.na(info_parsed$gene_name)] return( info_parsed[! (is.na(gene_name) | is.na(gene_id))] ) }
Hello team!I hope this message finds you well. I've encountered an issue while working with the zUMIs output files. It appears that there is no file named
gene_names.txt
included in the zUMIs output directory. This omission is causing a problem for me because I need to create downstream analysis objects using gene names rather than ENSEMBL IDs. Could you please advise on how to resolve this issue or provide guidance on accessing gene names from the available output files?[zUMIs_output/expression] $ls
retry.dgecounts.rds retry.readcount_internal.exon.all.loom retry.readcount_internal.intron.downsampled_.loom retry.umicount.exon.downsampled_.loom retry.readcount.exon.all.loom retry.readcount_internal.exon.downsampled_.loom retry.readcount.intron.all.loom retry.umicount.inex.all.loom retry.readcount.exon.downsampled_.loom retry.readcount_internal.inex.all.loom retry.readcount.intron.downsampled_.loom retry.umicount.inex.downsampled_.loom retry.readcount.inex.all.loom retry.readcount_internal.inex.downsampled_.loom retry.rpkm.exon.all.loom retry.umicount.intron.all.loom retry.readcount.inex.downsampled_.loom retry.readcount_internal.intron.all.loom retry.umicount.exon.all.loom retry.umicount.intron.downsampled_.loom
Here's my bash script /home/wangjx/software/zUMIs/zUMIs.sh -y /home/wangjx/single_cell/abko/scripts/zUMIs_test.yaml -c Here's my run_log: `Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz' Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR' Warning: YAML file doesn't include 'Rscript_exec' option; setting to 'Rscript' Using miniconda environment for zUMIs! note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters: YAML file: /home/wangjx/single_cell/abko/scripts/zUMIs_test.yaml zUMIs directory: /home/wangjx/software/zUMIs STAR executable STAR samtools executable samtools pigz executable pigz Rscript executable Rscript RAM limit: 30 zUMIs version 2.9.7e
Thu May 30 16:17:38 CST 2024 WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a. Filtering... Thu May 30 16:44:51 CST 2024 [1] "Warning! Adaptive BC selection gave < 10 cells so I will try to use the top 100 cells!" [1] "Less than 100 barcodes present, will continue with all barcodes..." [1] " reads were assigned to barcodes that do not correspond to intact cells." Mapping... [1] "2024-05-30 16:44:58 CST" May 30 16:45:00 ..... started STAR run May 30 16:45:03 ..... loading genome May 30 16:45:50 ..... processing annotations GTF May 30 16:45:58 ..... inserting junctions into the genome indices May 30 16:47:13 ..... started 1st pass mapping May 30 17:32:32 ..... finished 1st pass mapping May 30 17:32:33 ..... inserting junctions into the genome indices May 30 17:34:00 ..... started mapping May 30 18:40:20 ..... finished mapping May 30 18:40:21 ..... finished successfully Thu May 30 18:40:22 CST 2024 Counting... [1] "2024-05-30 18:40:35 CST" [1] "1.35e+08 Reads per chunk" [1] "Loading reference annotation from:" [1] "/home/wangjx/single_cell/abko/data/20230609-prestincreer_H2BGFP_P3P4P9/zUMIs_240530/retry.final_annot.gtf" [1] "Annotation loaded!" Warning message:
as_quosure()
requires an explicit environment as of rlang 0.3.0. Please supplyenv
. This warning is displayed once per session. [1] "Assigning reads to features (ex)"\===================== http://subread.sourceforge.net/ ======================//
\===================== http://subread.sourceforge.net/ ======================//
[1] "Assigning reads to features (in)"
\===================== http://subread.sourceforge.net/ ======================//
\===================== http://subread.sourceforge.net/ ======================//
[1] "2024-05-30 18:49:23 CST" [1] "Coordinate sorting final bam file..." [bam_sort_core] merging from 24 files and 8 in-memory blocks... [1] "2024-05-30 18:57:05 CST" [1] "Here are the detected subsampling options:" [1] "Automatic downsampling" [1] "Working on barcode chunk 1 out of 2" [1] "Processing 9 barcodes in this chunk..." [1] "Working on barcode chunk 2 out of 2" [1] "Processing 1 barcodes in this chunk..." [1] "2024-05-30 19:06:17 CST" [1] "I am done!! Look what I produced.../home/wangjx/single_cell/abko/data/20230609-prestincreer_H2BGFP_P3P4P9/zUMIs_240530/zUMIs_output/" used (Mb) gc trigger (Mb) max used (Mb) Ncells 7419309 396.3 24839887 1326.6 24839887 1326.6 Vcells 57604968 439.5 685054084 5226.6 1274637199 9724.8 Thu May 30 19:06:18 CST 2024 Loading required package: yaml Loading required package: Matrix [1] "loomR found" Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Thu May 30 19:06:29 CST 2024 [1] "2024-05-30 19:06:30 CST" [1] "Preparing bam file for velocyto..." [1] "2024-05-30 19:25:11 CST" [1] "Attempting to run RNA velocity..." [1] "RNA velocity done!" [1] "2024-05-31 00:26:26 CST" Descriptive statistics... [1] "I am loading useful packages for plotting..." [1] "2024-05-31 00:26:27 CST" [1] "Counting UMI fragments..." notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. [1] "1.35e+08 Reads per chunk" [1] "Extracting reads from bam file(s)..." [1] "Working on chunk 1" [1] "Working on chunk 2" used (Mb) gc trigger (Mb) max used (Mb) Ncells 4432338 236.8 8421155 449.8 8421155 449.8 Vcells 253142810 1931.4 1724228442 13154.9 2153933376 16433.3 Fri May 31 00:43:27 CST 2024`
Here's my YAML file to reproduce `project: retry sequence_files: file1: name: /home/wangjx/single_cell/abko/data/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/all_fastqs/reads_for_zUMIs.R1.fastq.gz base_definition: