ghoshal commented 6 years ago

Hi,

I am getting an error while running zUMIs on SPLiT-seq data. The program gets stuck at the Filtering step:

bash zUMIs-master.sh -y /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/zUMIs_0078.yaml

You provided these parameters:
YAML file: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/zUMIs_0078.yaml
zUMIs directory: /Users/biba/Downloads/softwares/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: null
zUMIs version 2.0.6

Filtering...
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]
/Users/biba/Downloads/softwares/zUMIs/splitfq.sh: line 41: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge///Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/BC_0078_R1.fastq.gz.listPrefix.txt: No such file or directory
/Users/biba/Downloads/softwares/zUMIs/splitfq.sh: line 41: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge///Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/BC_0078_R1.fastq.gz.listPrefix.txt: No such file or directory
ls: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge/BC_0078_R1.fastq: No such file or directory
ls: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge/BC_0078_R1.fastq: No such file or directory
ls: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge/BC_0078_R1.fastq*: No such file or directory
/Users/biba/Downloads/softwares/zUMIs/mergeBAM.sh: line 14: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge///Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078.bamlist.txt: No such file or directory
/Users/biba/Downloads/softwares/zUMIs/mergeBAM.sh: line 17: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/zUMIs_0078.yaml//Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078.BCstats.txt: Not a directory
head: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078/zUMIs_output/.tmpMerge///Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078.bamlist.txt: No such file or directory

Here is my YAML file.

###########################################
#Welcome to zUMIs
#below, please fill the mandatory inputs
#We expect full paths for all files.
###########################################

#define a project name that will be used to name output files
project: sample_0078

#Sequencing File Inputs:
#For each input file, make one list object & define path and barcode ranges
#base definition vocabulary: BC(n) UMI(n) cDNA(n).
#Barcode range definition needs to account for all ranges. You can give several comma-separated ranges for BC & UMI sequences, eg. BC(1-6,20-26)
#you can specify between 1 and 4 input files
sequence_files:
  file1:
    name: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/BC_0078_R1.fastq.gz
    base_definition:
      - cDNA(1-66)
  file2:
    name: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/BC_0078_R1.fastq.gz
    base_definition:
      - UMI(1-10)
      - BC(11-18, 49-56, 87-94)

#reference genome setup
reference:
  STAR_index: /Users/biba/Downloads/softwares/mm10/star_indices
  GTF_file: /Users/biba/Downloads/softwares/mm10/annotation/Mus_musculus.GRCm38.77.gtf
  additional_files: #Optional parameter. It is possible to give additional reference sequences here, eg ERCC.fa
  additional_STAR_params: #Optional parameter. you may add custom mapping parameters to STAR here

#output directory
out_dir: /Volumes/Promise_Pegasus_New/Single_cell_seq/September_2018/Seda/samples/sample_0078

###########################################
#below, you may optionally change default parameters
###########################################

#number of processors to use
num_threads: 16
mem_limit: null #Memory limit in Gigabytes, null meaning unlimited RAM usage.

#barcode & UMI filtering options
#number of bases under the base quality cutoff that should be filtered out.
#Phred score base-cutoff for quality control.
filter_cutoffs:
  BC_filter:
    num_bases: 2
    phred: 10
  UMI_filter:
    num_bases: 2
    phred: 10

#Options for Barcode handling
#You can give either number of top barcodes to use or give an annotation of cell barcodes.
#If you leave both barcode_num and barcode_file empty, zUMIs will perform automatic cell barcode selection for you!
barcodes:
  barcode_num: null
  barcode_file: null
  BarcodeBinning: 0 #Hamming distance binning of close cell barcode sequences. ATTENTION! This option is currently not implemented!
  nReadsperCell: 50 #Keep only the cell barcodes with at least n number of reads.

#Options related to counting of reads towards expression profiles
counting_opts:
  introns: yes #can be set to no for exon-only counting.
  downsampling: 0 #Number of reads to downsample to. This value can be a fixed number of reads (e.g. 10000) or a desired range (e.g. 10000-20000) Barcodes with less than will not be reported. 0 means adaptive downsampling. Default: 0.
  strand: 0 #Is the library stranded? 0 = unstranded, 1 = positively stranded, 2 = negatively stranded
  Ham_Dist: 0 #Hamming distance collapsing of UMI sequences.
  velocyto: no #Would you like velocyto-compatible counting of intron-exon spanning reads
  primaryHit: yes #Do you want to count the primary Hits of multimapping reads towards gene expression levels?
  twoPass: yes #perform basic STAR twoPass mapping

#produce stats files and plots?
make_stats: yes

#Start zUMIs from stage. Possible TEXT(Filtering, Mapping, Counting, Summarising). Default: Filtering.
which_Stage: Filtering

#below, the fqfilter will add a read_layout flag defining SE or PE
samtools_exec: samtools
pigz_exec: pigz
Rscript_exec: Rscript
STAR_exec: STAR
zUMIs_directory: /Users/biba/Downloads/softwares/zUMIs
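The base_definition entries in this YAML say which 1-based, inclusive read positions hold the cDNA, the UMI and the cell barcode. The Python sketch below shows one way such a definition could be parsed and applied to a read. The function names are hypothetical (this is not zUMIs code), and it assumes, without confirmation from the thread, that several BC ranges are concatenated into a single barcode, which would give a 24-nt barcode for BC(11-18, 49-56, 87-94).

```python
import re

# One token such as "UMI(1-10)" or "BC(11-18, 49-56, 87-94)".
TOKEN = re.compile(r"(BC|UMI|cDNA)\(([^)]*)\)")


def parse_base_definition(definition: str) -> tuple[str, list[tuple[int, int]]]:
    """Split 'BC(11-18, 49-56)' into ('BC', [(11, 18), (49, 56)])."""
    match = TOKEN.fullmatch(definition.strip())
    if match is None:
        raise ValueError(f"unrecognised base definition: {definition!r}")
    ranges = []
    for part in match.group(2).split(","):
        start, end = (int(x) for x in part.strip().split("-"))
        if start < 1 or end < start:
            raise ValueError(f"invalid range {part.strip()!r} in {definition!r}")
        ranges.append((start, end))
    return match.group(1), ranges


def extract(read: str, definition: str) -> str:
    """Concatenate the bases covered by every range (assumed behaviour, hypothetical helper)."""
    _, ranges = parse_base_definition(definition)
    # Ranges are 1-based and inclusive, so position p maps to Python index p - 1.
    return "".join(read[start - 1:end] for start, end in ranges)


# Synthetic 94-nt read: UMI at 1-10, then three 8-nt barcode blocks separated by filler bases.
read = "A" * 10 + "C" * 8 + "G" * 30 + "T" * 8 + "G" * 30 + "T" * 8
print(extract(read, "UMI(1-10)"))                     # AAAAAAAAAA
print(len(extract(read, "BC(11-18, 49-56, 87-94)")))  # 24
```

Keeping the ranges 1-based and inclusive in the parser mirrors how they are written in the YAML; the only off-by-one conversion happens in the slice.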

Any help would be awesome.

Thanks,

Bibaswan
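The filter_cutoffs (num_bases, phred) and nReadsperCell options in the YAML above can be illustrated in the same way. The sketch below is hypothetical and not the zUMIs implementation: it assumes Phred+33 quality encoding, reads "number of bases under the base quality cutoff" as "discard a BC or UMI when num_bases or more of its bases score below phred", and reads nReadsperCell as "keep barcodes seen in at least that many reads".

```python
from collections import Counter


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(quality: str, num_bases: int = 2, phred: int = 10) -> bool:
    """Assumed rule: fail when num_bases or more bases score below phred."""
    low_quality = sum(1 for q in phred_scores(quality) if q < phred)
    return low_quality < num_bases


def keep_cells(barcodes: list[str], n_reads_per_cell: int = 50) -> set[str]:
    """Keep cell barcodes observed in at least n_reads_per_cell reads."""
    counts = Counter(barcodes)
    return {bc for bc, n in counts.items() if n >= n_reads_per_cell}


# Under Phred+33, 'I' decodes to Phred 40 and '#' to Phred 2.
print(passes_filter("IIII#III"))  # True  (one low-quality base)
print(passes_filter("IIII##II"))  # False (two low-quality bases)
print(keep_cells(["AAA"] * 50 + ["CCC"] * 49))  # {'AAA'}
```

Whether the boundary is "below" or "at or below" the cutoff is part of the assumption; the comparison in passes_filter is the one place to change it.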

cziegenhain commented 6 years ago

Hey Bibaswan,

This looks like issue #70 (https://github.com/sdparekh/zUMIs/issues/70). What OS are you trying to run this on?

ghoshal commented 6 years ago

Hi,

I am running the program on macOS Mojave.

Thanks,

Bibaswan



cziegenhain commented 6 years ago

Hey Bibaswan,

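Sorry, macOS is neither tested nor supported by zUMIs. I guess the problem here is that the macOS split command (the BSD version) lacks the functionality zUMIs needs: it rejects options beginning with "--", which is what "split: illegal option -- -" in your log means. Maybe an EC2 instance on AWS could be helpful for you?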

Best, Christoph
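The message "split: illegal option -- -" is what BSD split prints when it is handed an option that starts with "--", which GNU split accepts. A quick way to check which split is on the PATH before starting zUMIs is sketched below. The helper names are hypothetical, and the check assumes GNU split answers --version with a line containing "GNU coreutils".

```python
import shutil
import subprocess


def looks_like_gnu(version_output: str) -> bool:
    """GNU coreutils programs report e.g. 'split (GNU coreutils) 8.32' for --version."""
    return "GNU coreutils" in version_output


def split_is_gnu(executable: str = "split") -> bool:
    """Return True only if `executable --version` succeeds and identifies GNU coreutils."""
    path = shutil.which(executable)
    if path is None:
        return False  # nothing with that name on PATH
    # BSD split treats "--version" as the illegal option "-" and prints its usage instead.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.returncode == 0 and looks_like_gnu(result.stdout)


print("GNU split" if split_is_gnu() else "split on PATH is not GNU coreutils")
```

On the Mojave machine from this thread, the expectation (not verified here) is the second message, matching the usage text in the log.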

ghoshal commented 6 years ago

Hi Christoph,

I will try this in a local cluster environment and see.

Thanks,

Bibaswan

cziegenhain commented 6 years ago

Excellent, let us know if you encounter further issues.

sdparekh / zUMIs

Error running zUMIs #72
