sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0
273 stars 67 forks source link

zUMIs didn't create gene_names.txt in the output folder #396

Closed JinsyWang closed 4 months ago

JinsyWang commented 4 months ago

Hello team!I hope this message finds you well. I've encountered an issue while working with the zUMIs output files. It appears that there is no file named gene_names.txt included in the zUMIs output directory. This omission is causing a problem for me because I need to create downstream analysis objects using gene names rather than ENSEMBL IDs. Could you please advise on how to resolve this issue or provide guidance on accessing gene names from the available output files?

[zUMIs_output/expression] $ls retry.dgecounts.rds retry.readcount_internal.exon.all.loom retry.readcount_internal.intron.downsampled_.loom retry.umicount.exon.downsampled_.loom retry.readcount.exon.all.loom retry.readcount_internal.exon.downsampled_.loom retry.readcount.intron.all.loom retry.umicount.inex.all.loom retry.readcount.exon.downsampled_.loom retry.readcount_internal.inex.all.loom retry.readcount.intron.downsampled_.loom retry.umicount.inex.downsampled_.loom retry.readcount.inex.all.loom retry.readcount_internal.inex.downsampled_.loom retry.rpkm.exon.all.loom retry.umicount.intron.all.loom retry.readcount.inex.downsampled_.loom retry.readcount_internal.intron.all.loom retry.umicount.exon.all.loom retry.umicount.intron.downsampled_.loom

Here's my bash script /home/wangjx/software/zUMIs/zUMIs.sh -y /home/wangjx/single_cell/abko/scripts/zUMIs_test.yaml -c Here's my run_log: `Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz' Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR' Warning: YAML file doesn't include 'Rscript_exec' option; setting to 'Rscript' Using miniconda environment for zUMIs! note: internal executables will be used instead of those specified in the YAML file!

You provided these parameters: YAML file: /home/wangjx/single_cell/abko/scripts/zUMIs_test.yaml zUMIs directory: /home/wangjx/software/zUMIs STAR executable STAR samtools executable samtools pigz executable pigz Rscript executable Rscript RAM limit: 30 zUMIs version 2.9.7e

Thu May 30 16:17:38 CST 2024 WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a. Filtering... Thu May 30 16:44:51 CST 2024 [1] "Warning! Adaptive BC selection gave < 10 cells so I will try to use the top 100 cells!" [1] "Less than 100 barcodes present, will continue with all barcodes..." [1] " reads were assigned to barcodes that do not correspond to intact cells." Mapping... [1] "2024-05-30 16:44:58 CST" May 30 16:45:00 ..... started STAR run May 30 16:45:03 ..... loading genome May 30 16:45:50 ..... processing annotations GTF May 30 16:45:58 ..... inserting junctions into the genome indices May 30 16:47:13 ..... started 1st pass mapping May 30 17:32:32 ..... finished 1st pass mapping May 30 17:32:33 ..... inserting junctions into the genome indices May 30 17:34:00 ..... started mapping May 30 18:40:20 ..... finished mapping May 30 18:40:21 ..... finished successfully Thu May 30 18:40:22 CST 2024 Counting... [1] "2024-05-30 18:40:35 CST" [1] "1.35e+08 Reads per chunk" [1] "Loading reference annotation from:" [1] "/home/wangjx/single_cell/abko/data/20230609-prestincreer_H2BGFP_P3P4P9/zUMIs_240530/retry.final_annot.gtf" [1] "Annotation loaded!" Warning message: as_quosure() requires an explicit environment as of rlang 0.3.0. Please supply env. This warning is displayed once per session. [1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\ Input files : 1 BAM file P retry.filtered.tagged.Aligned.out.bam
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 8
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\ Load annotation file .Rsubread_UserProvidedAnnotation_pid18197 ... Features : 295108 Meta-features : 56941 Chromosomes/contigs : 39
Process BAM file retry.filtered.tagged.Aligned.out.bam...
Paired-end reads are included.
Assign alignments (paired-end) to features...
WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).
Pairing up the read pairs.
Total alignments : 142440667
Successfully assigned alignments : 105921632 (74.4%)
Running time : 3.78 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\ Input files : 1 BAM file P retry.filtered.tagged.Aligned.out.bam.ex.f ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 8
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\ Load annotation file .Rsubread_UserProvidedAnnotation_pid18197 ... Features : 221006 Meta-features : 28989 Chromosomes/contigs : 32
Process BAM file retry.filtered.tagged.Aligned.out.bam.ex.featureCount ...
Paired-end reads are included.
Assign alignments (paired-end) to features...
WARNING: reads from the same pair were found not adjacent to each
other in the input (due to read sorting by location or
reporting of multi-mapping read pairs).
Pairing up the read pairs.
Total alignments : 142440667
Successfully assigned alignments : 19380315 (13.6%)
Running time : 3.83 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2024-05-30 18:49:23 CST" [1] "Coordinate sorting final bam file..." [bam_sort_core] merging from 24 files and 8 in-memory blocks... [1] "2024-05-30 18:57:05 CST" [1] "Here are the detected subsampling options:" [1] "Automatic downsampling" [1] "Working on barcode chunk 1 out of 2" [1] "Processing 9 barcodes in this chunk..." [1] "Working on barcode chunk 2 out of 2" [1] "Processing 1 barcodes in this chunk..." [1] "2024-05-30 19:06:17 CST" [1] "I am done!! Look what I produced.../home/wangjx/single_cell/abko/data/20230609-prestincreer_H2BGFP_P3P4P9/zUMIs_240530/zUMIs_output/" used (Mb) gc trigger (Mb) max used (Mb) Ncells 7419309 396.3 24839887 1326.6 24839887 1326.6 Vcells 57604968 439.5 685054084 5226.6 1274637199 9724.8 Thu May 30 19:06:18 CST 2024 Loading required package: yaml Loading required package: Matrix [1] "loomR found" Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns This is to maintain compatibility with other loom tools |======================================================================| 100%Thu May 30 19:06:29 CST 2024 [1] "2024-05-30 19:06:30 CST" [1] "Preparing bam file for velocyto..." [1] "2024-05-30 19:25:11 CST" [1] "Attempting to run RNA velocity..." [1] "RNA velocity done!" [1] "2024-05-31 00:26:26 CST" Descriptive statistics... [1] "I am loading useful packages for plotting..." [1] "2024-05-31 00:26:27 CST" [1] "Counting UMI fragments..." notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. notch went outside hinges. Try setting notch=FALSE. [1] "1.35e+08 Reads per chunk" [1] "Extracting reads from bam file(s)..." [1] "Working on chunk 1" [1] "Working on chunk 2" used (Mb) gc trigger (Mb) max used (Mb) Ncells 4432338 236.8 8421155 449.8 8421155 449.8 Vcells 253142810 1931.4 1724228442 13154.9 2153933376 16433.3 Fri May 31 00:43:27 CST 2024`

Here's my YAML file to reproduce `project: retry sequence_files: file1: name: /home/wangjx/single_cell/abko/data/20230609-prestincreer_H2BGFP_P3P4P9/00.CleanData/all_fastqs/reads_for_zUMIs.R1.fastq.gz base_definition:

JinsyWang commented 4 months ago

Solved. Error was found in the rbindlist function while running runFeatureCount.R.

`> info_parsed[3035:3040] [[1]] gene_id gene_name 1: ENSMUSG00000101940 Akp-ps1

[[2]] gene_id 1: ENSMUSG00000121351

[[3]] gene_id gene_name 1: ENSMUSG00000102817 Gm6185

[[4]] gene_id 1: ENSMUSG00000121369

[[5]] gene_id gene_name 1: ENSMUSG00000096964 Gm6345

[[6]] gene_id gene_name 1: ENSMUSG00000121407 Gm6345

info_parsed <- rbindlist(info_parsed) Error in rbindlist(info_parsed) : Item 3036 has 1 columns, inconsistent with item 1 which has 2 columns. To fill missing columns use fill=TRUE.`

After modifying the source code, it can run normally.

.get_gene_names <- function(gtf, threads){ gtf.dt <- fread(gtf, sep="\t",header=F) ge <- gtf.dt[V3 == "gene"] gtf_info <- ge$V9 info_parsed <- parallel::mclapply(gtf_info, function(x){ dat <- data.table(V1=unlist(strsplit(x,"; "))) dat[,c("name","value") := tstrsplit(V1, " ")][ ,V1 := NULL][ ,value := gsub(pattern = """, replacement = "", x = value)] dat <- dat[name %in% c("gene_id","gene_name")] dat <- dcast(dat, .~name, value.var = "value") dat[,"." := NULL] }, mc.cores = threads) info_parsed <- rbindlist(info_parsed, fill=TRUE) info_parsed$gene_name[is.na(info_parsed$gene_name)] <- info_parsed$gene_id[is.na(info_parsed$gene_name)] return( info_parsed[! (is.na(gene_name) | is.na(gene_id))] ) }