Closed · hmassalha closed this issue 5 years ago
Hey,
For me it looks like things start to go wrong while STAR is loading the reference genome. Could you make sure that you are giving the correct path and have sufficient memory available for loading? The other error messages are just downstream consequences of no usable mapped file being present.
Apart from that, I see that you are using quite an old version. I recommend updating to our newest zUMIs release for many new features and even faster processing :)
Best, Christoph
Hi, thanks for your reply. I did check the paths and the files that I am using for this analysis; they are OK. I looked into the log.out file and found the following: `EXITING because of fatal PARAMETERS error: present --sjdbOverhang=65 is not equal to the value at the genome generation step =100 SOLUTION:
Nov 25 06:47:21 ...... FATAL ERROR, exiting`
Hope this note will give more hints to solve the problem. Thanks, HM
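For readers who hit the same mismatch: the usual way out (a sketch, not taken from this thread; the fasta file name and thread count are placeholders) is either to map with the overhang the index was built with, or to rebuild the STAR index with an overhang matching the read length:

```shell
# Option A: pass the overhang the index was generated with (here 100) at mapping time.
# Option B: rebuild the index for the actual reads (sjdbOverhang = read length - 1):
STAR --runMode genomeGenerate \
     --genomeDir ./GRCm38_STAR_overhang65 \
     --genomeFastaFiles Mus_musculus.GRCm38.dna.primary_assembly.fa \
     --sjdbGTFfile Mus_musculus.GRCm38.84.gtf \
     --sjdbOverhang 65 \
     --runThreadN 8
```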
Hi, I managed to find and solve the problem. Since I am using an old bash script that has worked for me before, the problem turned out to be with the institute's cluster. Solved, and everything is working as expected. Again, thanks for your help, and I will update my zUMIs :)
Best, HM
Hey HM,
alright, great! I was just about to reply here asking you to check the STAR version, but you managed to solve it already. Hope everything goes smoothly for you now.
Best,
Christoph
Hi Christoph, I am facing the same problem (the same error) that I posted in my first comment. However, this time I am submitting the zUMIs jobs using a loop from the terminal, and I have dgecounts.rds files for some of my jobs. For the other jobs, I am getting only the annotationsSAF.rds files as output, and their bam files are empty. Do you have any suggestions on where to start looking for what happened?
Thanks in advance, HM
Hey,
Since you do get output from some of the jobs, I would assume the problem lies with the files that break. A couple of suggestions:
Best, Christoph
Thanks for your fast reply, I appreciate your help and Merry Christmas.
The zUMIs paths are OK. I used the same code when I analyzed MARSseq data.
I am analyzing mcSCRB-seq reads using the following command for bcl2fastq:

```
bsub -J $projName"bcl" -q new-short -R rusage[mem=8000] -n 16 bcl2fastq -R $inputPath$runName --output-dir $outputPath -p 16 --no-lane-splitting --mask-short-adapter-reads 5 --barcode-mismatches 1 --minimum-trimmed-read-length 14 --sample-sheet ${metaDataPath}"samplesheet_"${projName}".csv"
```

Here are the outputs that I am getting:
here is an output for a job that failed:
`Job <181219_villiStromaZonationLCM_3> was submitted from host
Your job looked like:
Successfully completed.
Resource usage summary:
CPU time : 949.47 sec.
Max Memory : 8000 MB
Average Memory : 257.51 MB
Total Requested Memory : 8000.00 MB
Delta Memory : 0.00 MB
Max Swap : -
Max Processes : 7
Max Threads : 14
Run time : 971 sec.
Turnaround time : 1440 sec.
The output (if any) follows:
Your jobs will run on this machine.
Make sure you have more than 25G RAM and 1 processors available.
Your jobs will be started from filtering.
You provided these parameters:
SLURM workload manager: no
Summary Stats to produce: yes
Start the pipeline from: filtering
A custom mapped BAM: NA
Custom filtered FASTQ: no
Barcode read: /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/N719_c3_S3_R1_001.fastq.gz
cDNA read: /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/N719_c3_S3_R2_001.fastq.gz
Study/sample name: N719_c3
Output directory: /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/zUMI_strandS1_node
Cell/sample barcode range: 1-6
UMI barcode range: 7-16
Retain cell with >=N reads: 100
Genome directory: /home/labs/shalev/NGS/indexes/GRCm38.84_STAR_zumi/
GTF annotation file: /home/labs/shalev/NGS/indexes/GRCm38.84/Mus_musculus.GRCm38.84.gtf
Number of processors: 1
Read length: 66
Strandedness: 1
Cell barcode Phred: 20
UMI barcode Phred: 20
Hamming Distance (UMI): 0
Hamming Distance (CellBC): 1
Plate Barcode Read: NA
Plate Barcode range: NA
Barcodes: /home/labs/shalev/hassanm/NGS/indexes/scrb_barcode_32.txt
zUMIs directory: /home/labs/shalev/hassanm/NGS/zUMI/zUMI006
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
Additional STAR parameters:
STRT-seq data: no
InDrops data: no
Library read for InDrops: NA
Barcode read2(STRT-seq): NA
Barcode read2 range(STRT-seq): 0-0
Bases(G) to trim(STRT-seq): 3
Subsampling reads: 0
zUMIs version 0.0.6c
Raw reads: 17480767 Filtered reads: 14231713
Make sure you have approximately 14062 Mb RAM available
Dec 25 12:00:28 ..... started STAR run
Dec 25 12:00:28 ..... loading genome
/home/labs/shalev/hassanm/NGS/zUMI/zUMI006/zUMIs-noslurm.sh: line 91: 17995 Bus error (core dumped) $starexc --genomeDir $g --runThreadN $t --readFilesCommand zcat --sjdbGTFfile $gtf --outFileNamePrefix $o/$sn. --outSAMtype BAM Unsorted --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --sjdbOverhang $rl --twopassMode Basic --readFilesIn $o/$sn.cdnaread.filtered.fastq.gz $x
Loading required package: optparse
[1] "I am loading useful packages..."
[1] "2018-12-25 12:02:10 IST"
[1] "I am making annotations in SAF... This will take less than 3 minutes..."
[1] "2018-12-25 12:02:20 IST"
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
[1] "I am making count tables...This will take a while!!"
[1] "2018-12-25 12:03:31 IST"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.28.1
//========================== featureCounts setting ===========================\
  Input files : 1 BAM file
    S /home/labs/shalev/hassanm/NGS/181219_NB551 ...
  Dir for temp files : .
  Threads : 1
  Level : meta-feature level
  Paired-end : no
  Strand specific : stranded
  Multimapping reads : primary only
  Multi-overlapping reads : not counted
  Min overlapping bases : 1
\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\
  Load annotation file ./.Rsubread_UserProvidedAnnotation_pid19441 ...
  Features : 225068
  Meta-features : 25257
  Chromosomes/contigs : 38
  Process BAM file /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AH ...
  Single-end reads are included.
  Assign reads to features...
  Total reads : 0
  Successfully assigned reads : 0
  Running time : 0.00 minutes
  Read assignment finished.
\===================== http://subread.sourceforge.net/ ======================//
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.28.1
//========================== featureCounts setting ===========================\
  Input files : 1 BAM file
    S /home/labs/shalev/hassanm/NGS/181219_NB551 ...
  Dir for temp files : .
  Threads : 1
  Level : meta-feature level
  Paired-end : no
  Strand specific : stranded
  Multimapping reads : primary only
  Multi-overlapping reads : not counted
  Min overlapping bases : 1
\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\
  Load annotation file ./.Rsubread_UserProvidedAnnotation_pid19441 ...
  Features : 710016
  Meta-features : 42143
  Chromosomes/contigs : 45
  Process BAM file /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AH ...
  Single-end reads are included.
  Assign reads to features...
  Total reads : 0
  Successfully assigned reads : 0
  Running time : 0.00 minutes
  Read assignment finished.
\===================== http://subread.sourceforge.net/ ======================//
Error in data.table::fread(paste("cut -f4 ", abamfile[2], ".featureCounts", :
  File is empty: /dev/shm/file4bf14f0476f
Calls: makeGEprofile -> <Anonymous>`
Thanks, HM
Hey,
I am pretty sure that you need to request more memory from your job submission system! Loading the genome for STAR would require at least ~25 to 30 GB RAM. That's why your job dies when loading the suffix array (SA) for STAR. (Although I am unsure why it works for some of your jobs then.)
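On LSF, a larger memory request might look like the following sketch (the 32000 MB figure is an illustrative guess; the job name is a placeholder, and whether rusage is per core or per job depends on the cluster configuration):

```shell
# Request ~32 GB so STAR can load the genome index
bsub -J zumis_run -q new-short -n 16 -R "rusage[mem=32000]" <your zUMIs command>
```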
Happy holidays!
EDIT: I also just re-read your bcl2fastq command: for mcSCRB-seq, if you have `--minimum-trimmed-read-length 14`, some of your barcode reads may be missing 2 bases of the UMI! Can you set that to 16 and try?
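Concretely, the bcl2fastq call quoted earlier would change only in the trimming threshold, so the full 16 bp barcode read (6 bp cell BC + 10 bp UMI) is never truncated (a sketch; all variables come from the command above):

```shell
bsub -J $projName"bcl" -q new-short -R rusage[mem=8000] -n 16 \
  bcl2fastq -R $inputPath$runName --output-dir $outputPath -p 16 \
    --no-lane-splitting --mask-short-adapter-reads 5 --barcode-mismatches 1 \
    --minimum-trimmed-read-length 16 \
    --sample-sheet ${metaDataPath}"samplesheet_"${projName}".csv"
```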
Thanks again, I will rerun the analysis for the files that didn't work and update you. Best, HM
Hi Christoph, I did what you suggested, and I do get dgecounts.rds files for most of my libraries. I have two pools of 6 samples each, and for these I am getting the following message: 'TERM_RUNLIMIT: job killed after reaching LSF run time limit.' What would you suggest I do?
Thanks again for your help, HM
Hey,
We are getting closer to the cause here: it seems zUMIs is not breaking but being killed by your load management system.
I am not familiar with the exact job scheduler you are using on your cluster, but a quick Google search tells me that you probably need to request a higher job time limit when submitting your job (`bsub -W`): https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_command_ref/bsub.__w.1.html
The runtime of zUMIs scales linearly with the number of reads in the data, so I would recommend scaling the requested time accordingly!
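For example, the run-time limit could be raised like this (a sketch; 24:00 is an arbitrary value in LSF's [hours:]minutes format and should be scaled with the read count):

```shell
# Allow up to 24 hours of wall-clock time for the job
bsub -W 24:00 -J zumis_run -q new-short <your zUMIs command>
```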
One last, somewhat unrelated suggestion: it seems to me that you are using bcl2fastq to demultiplex (sub-)libraries pooled by an Illumina index and then running zUMIs for each of them. You can instead get fastq files without demultiplexing and use the index read as an additional barcode read in zUMIs. That way you only need to run zUMIs once, which should be much faster! Let me know if you want more details on this.
Best, Christoph
Hey, you are correct, I do have a high number of reads in my samples.
I would be happy if you could give me more details on using the Illumina barcode directly in zUMIs. It will surely be much faster, because zUMIs then has to load the genome-related files only once. Could you please share the relevant bcl2fastq and zUMIs commands?
Thanks, HM
Of course, I'd be happy to!

- Run bcl2fastq with `--create-fastq-for-index-reads` so the Illumina index read is written out as its own fastq file, and keep `--no-lane-splitting`.
- Give that index read fastq to zUMIs as the plate barcode read (`-T Undetermined_I1.fastq.gz`) and give the appropriate base range (`-U 1-8`).

Best, Christoph
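Put together, the single-run workflow might look like this sketch (the flags are those mentioned in this thread; run and output paths are placeholders, and the remaining zUMIs arguments are elided):

```shell
# 1) Convert without sample demultiplexing, writing the index read as a fastq:
bcl2fastq -R <run_dir> --output-dir <out_dir> --no-lane-splitting \
          --create-fastq-for-index-reads

# 2) Run zUMIs once, with the index read as an extra (plate) barcode read:
bash zUMIs-noslurm.sh ... -T <out_dir>/Undetermined_I1.fastq.gz -U 1-8
```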
Much appreciated. I will try it and let you know what I get. Best, HM
Dear Christoph, I just downloaded the new zUMIs to our cluster in the lab. I built the yaml file based on the instructions that you have on GitHub. Unfortunately, I keep getting the following message:
'Job
Your job looked like:
Successfully completed.
Resource usage summary:
CPU time : 0.19 sec.
Max Memory : 2 MB
Average Memory : 2.00 MB
Total Requested Memory : 24000.00 MB
Delta Memory : 23998.00 MB
Max Swap : -
Max Processes : 4
Max Threads : 5
Run time : 7 sec.
Turnaround time : 0 sec.
The output (if any) follows:
tee: '/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output' /zUMIs_runlog.txt: No such file or directory
You provided these parameters:
YAML file: /home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output/yaml.yaml
zUMIs directory: /home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 2
zUMIs version 2.2.2b
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/’: No such file or directory
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/expression’: No such file or directory
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/stats’: No such file or directory
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/.tmpMerge’: No such file or directory'
Any suggestions, please? I checked the paths and they are OK. Just a small reminder: I would like to do the Illumina demultiplexing using zUMIs and not bcl2fastq, and the -U flag is no longer available in the latest version of zUMIs.
Thanks in advance, HM
Hi HM,
It looks like you have special characters like "\r" in your path. Please check the encoding of your script and yaml files.
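To hunt down such hidden carriage returns, something like this sketch can help (the yaml file name and its contents are placeholders):

```shell
# Create an example yaml with a hidden Windows-style line ending
printf 'out_dir: /tmp/run1\r\n' > yaml.yaml

# Visualize non-printing characters: a carriage return shows up as ^M
cat -A yaml.yaml

# Strip trailing carriage returns in place (same effect as dos2unix)
sed -i 's/\r$//' yaml.yaml
cat -A yaml.yaml   # the ^M is gone now
```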
For your second question, you can provide the Illumina index reads as file3 and use the range BC(1-8), or however long your index read is. That way the sample barcode becomes Illumina index + sample BC.
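For illustration, the `sequence_files` section of a zUMIs 2.x yaml with the index read as file3 might look like this hypothetical sketch (paths are placeholders; the base ranges are those used earlier in this thread; check the zUMIs wiki for the exact schema):

```yaml
sequence_files:
  file1:
    name: /path/to/R1.fastq.gz                # barcode read: 6 bp BC + 10 bp UMI
    base_definition:
      - BC(1-6)
      - UMI(7-16)
  file2:
    name: /path/to/R2.fastq.gz                # cDNA read
    base_definition:
      - cDNA(1-66)
  file3:
    name: /path/to/Undetermined_I1.fastq.gz   # Illumina index read
    base_definition:
      - BC(1-8)
```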
I hope this helps.
Good luck! Best, Swati
Hi Swati, sorry for my late reply, I was busy with another project. I tried to understand where this "\r" comes from; as far as I can see, neither my code nor the yaml file contains this hidden character. Do you have any suggestions for how to work around this issue?
Thanks, HM
Hi, I am using the same bash script that worked for me in the past with zUMIs. However, now I am not getting the dgecounts.rds files, although I got a 'Successfully completed' message. When I looked more deeply into the output files, I see two things:
`Raw reads: 27276018 Filtered reads: 12754354
Make sure you have approximately 13778 Mb RAM available
Nov 22 17:19:33 ..... started STAR run
Nov 22 17:19:33 ..... loading genome
/home/labs/shalev/hassanm/NGS/zUMI/zUMI006/zUMIs-noslurm.sh: line 91: 287599 Bus error (core dumped) $starexc --genomeDir $g --runThreadN $t --readFilesCommand zcat --sjdbGTFfile $gtf --outFileNamePrefix $o/$sn. --outSAMtype BAM Unsorted --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --sjdbOverhang $rl --twopassMode Basic --readFilesIn $o/$sn.cdnaread.filtered.fastq.gz $x
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
Loading required package: optparse
[1] "I am loading useful packages..."
[1] "2018-11-22 17:20:49 IST"
[1] "I am making annotations in SAF... This will take less than 3 minutes..."
[1] "2018-11-22 17:21:01 IST"
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
[1] "I am making count tables...This will take a while!!"
[1] "2018-11-22 17:22:01 IST"`
**Error in data.table::fread(paste("cut -f4 ", abamfile[2], ".featureCounts", : File is empty: /dev/shm/file4fe366000b61d**
Calls: makeGEprofile -> <Anonymous>
Execution halted
[1] "I am loading useful packages for plotting..."
[1] "2018-11-22 17:22:06 IST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/labs/shalev/kerenb/NGS/181121_NB501465_0409_AHVLG7BGX7_Acinar_Telocytes_Hassan_output/Mouse_Telocytes/zUMI_strandS1/zUMIs_output/expression/F3_big_p1.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
In the zUMIs_output folder, I see that the Aligned.out.bam, aligned.sorted.bam, ex.featureCounts and in.featureCounts files are 0 bytes. Hope this can give a hint to what happened.
Thanks, HM
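When a pipeline reports success but downstream files are empty, a quick sweep for zero-byte outputs (a sketch; the directory and file names are placeholders mimicking the failed run) shows which step first produced nothing:

```shell
# Build a mock output directory with one empty and one non-empty file
mkdir -p zUMIs_output
: > zUMIs_output/Aligned.out.bam     # empty, as in the failed run above
echo "data" > zUMIs_output/run.log   # non-empty

# List all zero-byte files: the earliest one marks where the run broke
find zUMIs_output -type f -size 0
```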