sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Execution problems when running the example data #315

Closed RFLiu2021 closed 2 years ago

RFLiu2021 commented 2 years ago

Hi zUMIs team,

I installed and ran the pipeline several times, but always hit the same problem. I installed it using 'git clone https://github.com/sdparekh/zUMIs.git', then ran the example data with the internal environment: './zUMIs.sh -c -y zUMIz_example.yaml'. The system is CentOS 7.6.

The printed output is as follows:

%-----------------------------------------
[root@localhost zUMIs]# ./zUMIs.sh -c -y zUMIz_example.yaml
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!

You provided these parameters:
 YAML file: zUMIz_example.yaml
 zUMIs directory: /root/Application/zUMIs/zUMIs
 STAR executable STAR
 samtools executable samtools
 pigz executable pigz
 Rscript executable Rscript
 RAM limit: null
 zUMIs version 2.9.7

Tue Apr 19 14:38:38 CST 2022
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Tue Apr 19 14:38:43 CST 2022
Error in data.table::fread(bccount_file, header = FALSE, col.names = c("XC", :
  File './outputDir/Example.BCstats.txt' does not exist or is non-readable. getwd()=='/root/Application/zUMIs/zUMIs/outputDir'
Calls: cellBC ->
Execution halted
Mapping...
[1] "2022-04-19 14:38:44 CST"
Warning message:
NAs introduced by coercion
Apr 19 14:38:45 ..... started STAR run
Apr 19 14:38:45 ..... loading genome
Apr 19 14:38:45 ..... started STAR run
Apr 19 14:38:45 ..... loading genome
Apr 19 14:38:45 ..... started STAR run
Apr 19 14:38:45 ..... loading genome
Apr 19 14:38:45 ..... started STAR run
Apr 19 14:38:45 ..... loading genome
Apr 19 14:38:45 ..... processing annotations GTF
Apr 19 14:38:45 ..... processing annotations GTF
Apr 19 14:38:45 ..... processing annotations GTF
Apr 19 14:38:45 ..... processing annotations GTF
Apr 19 14:38:45 ..... inserting junctions into the genome indices
Apr 19 14:38:45 ..... inserting junctions into the genome indices
Apr 19 14:38:45 ..... inserting junctions into the genome indices
Apr 19 14:38:45 ..... inserting junctions into the genome indices
Apr 19 14:38:48 ..... started 1st pass mapping
Apr 19 14:38:48 ..... started 1st pass mapping
Apr 19 14:38:48 ..... started 1st pass mapping
Apr 19 14:38:48 ..... started 1st pass mapping
Apr 19 14:39:25 ..... finished 1st pass mapping
Apr 19 14:39:25 ..... inserting junctions into the genome indices
Apr 19 14:39:27 ..... started mapping
Apr 19 14:39:54 ..... finished 1st pass mapping
Apr 19 14:39:54 ..... inserting junctions into the genome indices
Apr 19 14:39:56 ..... finished 1st pass mapping
Apr 19 14:39:56 ..... inserting junctions into the genome indices
Apr 19 14:39:56 ..... started mapping
Apr 19 14:39:57 ..... finished 1st pass mapping
Apr 19 14:39:57 ..... inserting junctions into the genome indices
Apr 19 14:39:59 ..... started mapping
Apr 19 14:40:00 ..... started mapping
Apr 19 14:40:06 ..... finished mapping
Apr 19 14:40:06 ..... finished successfully
Apr 19 14:41:06 ..... finished mapping
Apr 19 14:41:06 ..... finished successfully
Apr 19 14:41:08 ..... finished mapping
Apr 19 14:41:08 ..... finished successfully
Apr 19 14:41:12 ..... finished mapping
Apr 19 14:41:12 ..... finished successfully
Tue Apr 19 14:41:13 CST 2022
Counting...
[1] "2022-04-19 14:41:20 CST"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
  File './outputDir/zUMIs_output/Examplekept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/root/Application/zUMIs/zUMIs/outputDir'
Execution halted
Tue Apr 19 14:41:20 CST 2022
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file './outputDir/zUMIs_output/expression/Example.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Apr 19 14:41:22 CST 2022
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2022-04-19 14:41:22 CST"
Error in fread(gtf, select = 1:2, header = F) :
  File './outputDir/Example.final_annot.gtf' does not exist or is non-readable. getwd()=='/root/Application/zUMIs/zUMIs/outputDir'
Calls: getUserSeq -> fread
Execution halted
Tue Apr 19 14:41:26 CST 2022
[root@localhost zUMIs]#
%---------------------------------------------------------

I also checked the first file that was reported missing, './outputDir/Example.BCstats.txt'. The output is as follows:
%---------------------------------------------------------
[root@localhost zUMIs]# head ./outputDir/Example.BCstats.txt
AAACTC 1
GCAGGA 2
AAGCAC 1
ATTGAT 1
GGGCGT 1
TCGTGA 1
GATTGC 1
CCTCGA 2
ACTAAA 500
GAAATT 2
[root@localhost zUMIs]#
%---------------------------------------------------------
So the file does exist. I guess the problem is here?

Thanks anyway.

sdparekh commented 2 years ago

Can you please share your YAML file?

RFLiu2021 commented 2 years ago

The yaml file:

%------------------------
###########################################
#Welcome to zUMIs
#below, please fill the mandatory inputs
#We expect full paths for all files.
###########################################

#define a project name that will be used to name output files
project: Example

#Sequencing File Inputs:
#For each input file, make one list object & define path and barcode ranges
#base definition vocabulary: BC(n) UMI(n) cDNA(n).
#Barcode range definition needs to account for all ranges. You can give several comma-separated ranges for BC & UMI sequences, eg. BC(1-6,20-26)
#you can specify between 1 and 4 input files
sequence_files:
  file1:
    name: ../data_exmple/barcoderead_HEK.1mio.fq.gz
    base_definition: BC(1-6)
  file2:
    name: ../data_exmple/cDNAread_HEK.1mio.fq.gz
    base_definition: cDNA(1-50)

#reference genome setup
reference:
  STAR_index: ../ref_tem/chr22/hg38_chr22_STAR7
  GTF_file: ../ref_tem/chr22/GRCh38.95.chr22.gtf
  exon_extension: no #extend exons by a certain width?
  extension_length: 0 #number of bp to extend exons by
  scaffold_length_min: 0 #minimal scaffold/chromosome length to consider (0 = all)
  additional_files: #Optional parameter. It is possible to give additional reference sequences here, eg ERCC.fa
  additional_STAR_params: #Optional parameter. you may add custom mapping parameters to STAR here

#output directory
out_dir: ./outputDir

###########################################
#below, you may optionally change default parameters
###########################################

#number of processors to use
num_threads: 10
mem_limit: null #Memory limit in Gigabytes, null meaning unlimited RAM usage.

#barcode & UMI filtering options
#number of bases under the base quality cutoff that should be filtered out.
#Phred score base-cutoff for quality control.
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20

#Options for Barcode handling
#You can give either number of top barcodes to use or give an annotation of cell barcodes.
#If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!
barcodes:
  barcode_num: null
  barcode_file: null
  barcode_sharing: null #Optional for combining several barcode sequences per cell (see github wiki)
  automatic: yes #Give yes/no to this option. If the cell barcodes should be detected automatically. If the barcode file is given in combination with automatic barcode detection, the list of given barcodes will be used as whitelist.
  BarcodeBinning: 1 #Hamming distance binning of close cell barcode sequences.
  nReadsperCell: 100 #Keep only the cell barcodes with atleast n number of reads.
  demultiplex: no #produce per-cell demultiplexed bam files.

#Options related to counting of reads towards expression profiles
counting_opts:
  introns: yes #can be set to no for exon-only counting.
  intronProb: no #perform an estimation of how likely intronic reads are to be derived from mRNA by comparing to intergenic counts.
  downsampling: 0 #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g. 10000-20000) Barcodes with less than will not be reported. 0 means adaptive downsampling. Default: 0.
  strand: 0 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded
  Ham_Dist: 0 #Hamming distance collapsing of UMI sequences.
  velocyto: no #Would you like velocyto to do counting of intron-exon spanning reads
  primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels?
  multi_overlap: no #Do you want to assign reads overlapping to multiple features?
  fraction_overlap: 0 #minimum required fraction of the read overlapping with the gene for read assignment to genes
  twoPass: yes #perform basic STAR twoPass mapping

#produce stats files and plots?
make_stats: yes

#Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.
which_Stage: Filtering

#define dependencies program paths
samtools_exec: samtools #samtools executable
Rscript_exec: Rscript #Rscript executable
STAR_exec: STAR #STAR executable
pigz_exec: pigz #pigz executable

#below, fqfilter will add a read_layout flag defining SE or PE
%------------------------

Thanks

cziegenhain commented 2 years ago

Your issue could be due to the use of relative paths. For all files and folders please use a full absolute path.
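To illustrate why a relative out_dir can fail even though the file exists, here is a minimal Python sketch. It uses only the two directories visible in the log above (the shell prompt directory and the getwd() value printed by R). The helper name `resolve` and the dictionary labels are hypothetical, and it assumes the relative paths in the YAML were meant relative to the directory where zUMIs.sh was launched.

```python
import posixpath


def resolve(cwd, path):
    """Return the absolute location that `path` refers to when read from `cwd`.

    Purely lexical (no symlink resolution, no filesystem access).
    """
    return posixpath.normpath(posixpath.join(cwd, path))


# Both directories are copied from the log in this thread.
launch_dir = "/root/Application/zUMIs/zUMIs"            # where ./zUMIs.sh and head were run
r_workdir = "/root/Application/zUMIs/zUMIs/outputDir"   # the getwd() printed in the R errors

rel = "./outputDir/Example.BCstats.txt"
seen_by_shell = resolve(launch_dir, rel)  # the file that head could read
seen_by_r = resolve(r_workdir, rel)       # the location the R error refers to

print("shell:", seen_by_shell)
print("R:    ", seen_by_r)

# Path-like values from the posted YAML; the keys are illustrative labels only.
yaml_paths = {
    "file1": "../data_exmple/barcoderead_HEK.1mio.fq.gz",
    "file2": "../data_exmple/cDNAread_HEK.1mio.fq.gz",
    "STAR_index": "../ref_tem/chr22/hg38_chr22_STAR7",
    "GTF_file": "../ref_tem/chr22/GRCh38.95.chr22.gtf",
    "out_dir": "./outputDir",
}

# Assumption: every relative path was intended relative to launch_dir.
absolute = {key: resolve(launch_dir, value) for key, value in yaml_paths.items()}
for key, value in absolute.items():
    print(f"{key}: {value}")
```

The two printed locations differ by an extra outputDir component, which matches the "does not exist" errors in the log. Writing the absolute values into the YAML removes any dependence on the working directory.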

RFLiu2021 commented 2 years ago

Thanks a lot! That solved it!