sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

ERROR: input is not BAM or CRAM #217

Closed arshiyaakeel closed 4 years ago

arshiyaakeel commented 4 years ago

Hi Christoph,  

To test the zUMIs pipeline locally, I used the FASTQ files from your case study (5 samples, SAMN07206142 to SAMN07206138) and generated the following YAML file.

project: My_YAML
sequence_files:
  file1:
    name: /home/projects/cu_10158/people/shakha/zUMIs/rawData/reads_for_zUMIs.R1.fastq.gz
    base_definition: ()
  file2:
    name: /home/projects/cu_10158/people/shakha/zUMIs/rawData/reads_for_zUMIs.R2.fastq.gz
    base_definition: ()
reference:
  STAR_index: /home/projects/cu_10158/people/shakha/zUMIs/index_output
  GTF_file: /home/projects/cu_10158/people/shakha/cellRanger/opt/refdata-cellranger-GRCh38-3.0.0/fasta/
  additional_STAR_params: ''
  additional_files: ~
out_dir: /home/projects/cu_10158/people/shakha/zUMIs
num_threads: 8
mem_limit: 0
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: ~
  automatic: yes
  BarcodeBinning: 0
  nReadsperCell: 100
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  twoPass: yes
make_stats: yes
which_Stage: Filtering
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz
samtools_exec: samtools

Then I ran the script zUMIs.sh and got the following error:

[shakha@g-12-l0001 ~]$ zUMIs.sh -y /home/projects/cu_10158/people/shakha/zUMIs/My_YAML.yaml


 Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs


/services/tools/zumis/2.9.4/zUMIs.sh: line 171: My_YAML.zUMIs_YAMLerror.log: No space left on device
tail: cannot open ‘My_YAML.zUMIs_YAMLerror.log’ for reading: No such file or directory

 You provided these parameters:
 YAML file: /home/projects/cu_10158/people/shakha/zUMIs/My_YAML.yaml
 zUMIs directory: /services/tools/zumis/2.9.4
 STAR executable STAR
 samtools executable samtools
 pigz executable pigz
 Rscript executable Rscript
 RAM limit:   0
 zUMIs version 2.9.4

Tue Oct  6 09:54:05 CEST 2020
WARNING: The STAR version used for mapping is 2.6.0c and the STAR index was created using the version 20201. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.6.0c.
Filtering...
Tue Oct  6 09:54:59 CEST 2020
Error in uik(bccount$cellindex, bccount$cs/1000) :
  Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Mapping...
[1] "2020-10-06 09:55:00 CEST"
cp: omitting directory ‘/home/projects/cu_10158/people/shakha/cellRanger/opt/refdata-cellranger-GRCh38-3.0.0/fasta/’

EXITING because of fatal PARAMETERS error: pGe.sjdbOverhang <=0 while junctions are inserted on the fly with --sjdbFileChrStartEnd or/and --sjdbGTFfile
SOLUTION: specify pGe.sjdbOverhang>0, ideally readmateLength-1
Oct 06 09:55:00 ...... FATAL ERROR, exiting

EXITING because of fatal PARAMETERS error: pGe.sjdbOverhang <=0 while junctions are inserted on the fly with --sjdbFileChrStartEnd or/and --sjdbGTFfile
SOLUTION: specify pGe.sjdbOverhang>0, ideally readmateLength-1
Oct 06 09:55:00 ...... FATAL ERROR, exiting
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM
Tue Oct  6 09:55:00 CEST 2020
Counting...
[1] "2020-10-06 09:55:08 CEST"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes.txt")) :
  File '/home/projects/cu_10158/people/shakha/zUMIs/zUMIs_output/My_YAMLkept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/projects/cu_10158/people/shakha/zUMIs'
Execution halted
Tue Oct  6 09:55:09 CEST 2020
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/projects/cu_10158/people/shakha/zUMIs/zUMIs_output/expression/My_YAML.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Oct  6 09:55:13 CEST 2020
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2020-10-06 09:55:14 CEST"
Error in fread(gtf, select = 1:2, header = F) :
  File '/home/projects/cu_10158/people/shakha/zUMIs/My_YAML.final_annot.gtf' does not exist or is non-readable. getwd()=='/home/projects/cu_10158/people/shakha/zUMIs'
Calls: getUserSeq -> fread
Execution halted
Tue Oct  6 09:55:18 CEST 2020

Could you please comment on it? Thank you very much in advance.

Best regards,
Arshiya
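The log above also contains a `No space left on device` message, raised when zUMIs.sh tries to write `My_YAML.zUMIs_YAMLerror.log`. Which filesystem was full cannot be told from the log alone. As a minimal sketch, free space on candidate directories (for example the launch directory and `out_dir`) could be checked before a run. The helper names `free_gib` and `low_space_dirs` and the 1 GiB default threshold are assumptions for illustration, not part of zUMIs.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def low_space_dirs(paths, min_gib: float = 1.0):
    """Return the paths whose filesystem has less than `min_gib` GiB free.

    The threshold is an arbitrary illustrative value, not a zUMIs requirement.
    """
    return [p for p in paths if free_gib(p) < min_gib]


if __name__ == "__main__":
    # "." stands in for the directory the pipeline is launched from.
    for path in low_space_dirs(["."]):
        print(f"Low disk space on filesystem containing: {path}")
```

Passing both the working directory and the configured `out_dir` would cover the two places where this run writes files.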

sdparekh commented 4 years ago

Hi Arshiya,

You must specify the base definitions of your fastq files. Please refer to an example yaml file or use the shiny app to create one.

Also, the GTF file is missing. You have to specify the file itself, not just the path to the directory that contains it.

Best, Swati
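The two points in the reply above (non-empty base definitions, and `GTF_file` pointing at a file rather than a directory) can be checked mechanically. The following is only a sketch: `check_zumis_config` is a hypothetical helper, not part of zUMIs, and it assumes the YAML has already been loaded into a plain dictionary (for instance with a YAML parser of your choice). It tests for emptiness only and does not validate the base definition syntax.

```python
import os


def check_zumis_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a loaded zUMIs config."""
    problems = []

    # Every sequence file needs a non-empty base_definition.
    for key, entry in (cfg.get("sequence_files") or {}).items():
        base_def = entry.get("base_definition")
        # Accept either a single string or a list of strings.
        values = base_def if isinstance(base_def, (list, tuple)) else [base_def]
        cleaned = ["" if v is None else str(v).strip() for v in values]
        if not cleaned or any(v in ("", "()") for v in cleaned):
            problems.append(f"{key}: base_definition is empty")

    # GTF_file must be an existing regular file, not a directory.
    gtf = (cfg.get("reference") or {}).get("GTF_file")
    if not gtf or not os.path.isfile(gtf):
        problems.append(f"reference.GTF_file is not an existing file: {gtf!r}")

    return problems
```

With the configuration from the first comment, this sketch would report both problems: `base_definition: ()` for each read file, and a `GTF_file` value that ends in a directory.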

arshiyaakeel commented 4 years ago

Hello Swati, 

Thank you for your response. I regenerated the YAML file with base definitions and with the full path to the GTF file.

project: My_YAML
sequence_files:
  file1:
    name: /home/projects/cu_10158/people/shakha/zUMIs/rawData/reads_for_zUMIs.R1.fastq.gz
    base_definition: cDNA(1-45)
  file2:
    name: /home/projects/cu_10158/people/shakha/zUMIs/rawData/reads_for_zUMIs.R2.fastq.gz
    base_definition: cDNA(1-45)
reference:
  STAR_index: /home/projects/cu_10158/people/shakha/zUMIs/index_output
  GTF_file: /home/projects/cu_10158/people/shakha/cellRanger/opt/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf
  additional_STAR_params: ''
  additional_files: ~
out_dir: /home/projects/cu_10158/people/shakha/zUMIs
num_threads: 8
mem_limit: 0
filter_cutoffs:
  BC_filter:
    num_bases: 1
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: ~
  automatic: yes
  BarcodeBinning: 0
  nReadsperCell: 100
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 0
  velocyto: no
  primaryHit: yes
  twoPass: yes
make_stats: yes
which_Stage: Filtering
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz
samtools_exec: samtools

But I am still not able to run it (error log below).

[shakha@g-12-l0001 ~]$ zUMIs.sh -y /home/projects/cu_10158/people/shakha/zUMIs/My_YAML.yaml


 Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs


rm: write error: No space left on device

 You provided these parameters:
 YAML file: /home/projects/cu_10158/people/shakha/zUMIs/My_YAML.yaml
 zUMIs directory: /services/tools/zumis/2.9.4
 STAR executable STAR
 samtools executable samtools
 pigz executable pigz
 Rscript executable Rscript
 RAM limit:   0
 zUMIs version 2.9.4

Tue Oct  6 15:19:23 CEST 2020
WARNING: The STAR version used for mapping is 2.6.0c and the STAR index was created using the version 20201. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.6.0c.
Filtering...
Tue Oct  6 15:20:23 CEST 2020
Error in uik(bccount$cellindex, bccount$cs/1000) :
  Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Mapping...
[1] "2020-10-06 15:20:24 CEST"
Oct 06 15:20:27 ..... started STAR run
Oct 06 15:20:27 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 is not equal to the value at the genome generation step =100
SOLUTION:

Oct 06 15:20:27 ...... FATAL ERROR, exiting
Oct 06 15:20:27 ..... started STAR run
Oct 06 15:20:27 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 is not equal to the value at the genome generation step =100
SOLUTION:

Oct 06 15:20:27 ...... FATAL ERROR, exiting
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM
Tue Oct  6 15:20:28 CEST 2020
Counting...
[1] "2020-10-06 15:20:36 CEST"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes.txt")) :
  File '/home/projects/cu_10158/people/shakha/zUMIs/zUMIs_output/My_YAMLkept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/projects/cu_10158/people/shakha/zUMIs'
Execution halted
Tue Oct  6 15:20:36 CEST 2020
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/projects/cu_10158/people/shakha/zUMIs/zUMIs_output/expression/My_YAML.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Oct  6 15:20:41 CEST 2020
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2020-10-06 15:20:41 CEST"
Warning message:
In fread(gtf, select = 1:2, header = F) :
  Found and resolved improper quoting in first 100 rows. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project,  :
  File '/home/projects/cu_10158/people/shakha/zUMIs/zUMIs_output/My_YAMLkept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/projects/cu_10158/people/shakha/zUMIs'
Execution halted
Tue Oct  6 15:21:22 CEST 2020

Kindly note that I am testing with the case study (5 samples, SAMN07206142 to SAMN07206138) from your main research article. Could you please check whether any input is still missing?

Thanks in advance.

Kind regards Arshiya
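In the second log, the first fatal error comes from STAR: the run requested `--sjdbOverhang=15`, while the index was generated with a value of 100. The later `input is not BAM or CRAM` and missing-file messages appear after that failure and may simply follow from it. As a small sketch, the two values can be pulled out of a saved log for inspection. The function name `overhang_mismatch` is hypothetical, and the regular expression is written against the exact message text shown in this thread.

```python
import re

# Pattern copied from the STAR message printed in the log of this thread.
OVERHANG_RE = re.compile(
    r"present --sjdbOverhang=(\d+) is not equal to the value "
    r"at the genome generation step =(\d+)"
)


def overhang_mismatch(log_text: str):
    """Return (requested, index_value) from the first mismatch message, or None."""
    match = OVERHANG_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    line = (
        "EXITING because of fatal PARAMETERS error: present --sjdbOverhang=15 "
        "is not equal to the value at the genome generation step =100"
    )
    print(overhang_mismatch(line))
```

A mismatch like this, together with the earlier warning that the index was built with a different STAR version than the 2.6.0c used for mapping, suggests that regenerating the STAR index is worth trying. That is an inference from the log messages, not a fix confirmed in this thread.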