sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Bus error #92

Closed · hmassalha closed this issue 5 years ago

hmassalha commented 5 years ago

Hi, I am using the same bash code that worked for me in the past with zUMIs. However, I am now not getting the dgecounts.rds files, although I got a 'Successfully completed' message. When I looked more closely at the output files, I saw two things:

  1. The mapping step crashes with a bus error:

```
Raw reads: 27276018
Filtered reads: 12754354

Make sure you have approximately 13778 Mb RAM available
Nov 22 17:19:33 ..... started STAR run
Nov 22 17:19:33 ..... loading genome
/home/labs/shalev/hassanm/NGS/zUMI/zUMI006/zUMIs-noslurm.sh: line 91: 287599 Bus error (core dumped) $starexc --genomeDir $g --runThreadN $t --readFilesCommand zcat --sjdbGTFfile $gtf --outFileNamePrefix $o/$sn. --outSAMtype BAM Unsorted --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --sjdbOverhang $rl --twopassMode Basic --readFilesIn $o/$sn.cdnaread.filtered.fastq.gz $x
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
Loading required package: optparse
[1] "I am loading useful packages..."
[1] "2018-11-22 17:20:49 IST"
[1] "I am making annotations in SAF... This will take less than 3 minutes..."
[1] "2018-11-22 17:21:01 IST"
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
[1] "I am making count tables...This will take a while!!"
[1] "2018-11-22 17:22:01 IST"
```

  2. Downstream, the counting step then fails because the featureCounts output is empty:

```
Error in data.table::fread(paste("cut -f4 ", abamfile[2], ".featureCounts", :
  File is empty: /dev/shm/file4fe366000b61d
Calls: makeGEprofile -> <Anonymous>
Execution halted
[1] "I am loading useful packages for plotting..."
[1] "2018-11-22 17:22:06 IST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/labs/shalev/kerenb/NGS/181121_NB501465_0409_AHVLG7BGX7_Acinar_Telocytes_Hassan_output/Mouse_Telocytes/zUMI_strandS1/zUMIs_output/expression/F3_big_p1.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
```

In the zUMIs_output folder I see that the Aligned.out.bam, aligned.sorted.bam, ex.featureCounts and in.featureCounts files are 0 bytes. I hope this gives a hint to what happened.

Thanks, HM

cziegenhain commented 5 years ago

Hey,

It looks to me like things start to go wrong while STAR is loading the reference genome. Could you make sure that you are giving the correct path and have sufficient memory available for loading it? The other error messages are just downstream consequences of no usable mapped file being present.

Apart from that, I see that you are using a quite old version. I can recommend updating to our newest zUMIs release for many new features and even faster processing :)

Best, Christoph

hmassalha commented 5 years ago

Hi, thanks for your reply. I did check the paths and the files that I am using for this analysis; they are OK. I looked into the Log.out file and found the following:

```
EXITING because of fatal PARAMETERS error: present --sjdbOverhang=65 is not equal to the value at the genome generation step =100
SOLUTION:

Nov 25 06:47:21 ...... FATAL ERROR, exiting
```

I hope this note gives more hints to help solve the problem. Thanks, HM
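For reference, this STAR error means the genome index was generated with --sjdbOverhang 100 while the mapping run requested 65. One way out is to regenerate the index with a matching value; a minimal sketch, with hypothetical FASTA/GTF paths:

```bash
# Rebuild the STAR index so its sjdbOverhang matches the mapping run
# (65 here, per the error above). File paths are placeholders.
STAR --runMode genomeGenerate \
     --genomeDir /path/to/GRCm38_sjdb65 \
     --genomeFastaFiles /path/to/Mus_musculus.GRCm38.dna.primary_assembly.fa \
     --sjdbGTFfile /path/to/Mus_musculus.GRCm38.84.gtf \
     --sjdbOverhang 65 \
     --runThreadN 8
```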

hmassalha commented 5 years ago

Hi, I managed to find the problem and solve it. Since I am using old bash code that previously worked for me, the problem turned out to be with the institute's cluster. Solved, and everything is working as expected. Thanks again for your help; I will also update my zUMIs :)

Best, HM

cziegenhain commented 5 years ago

Hey HM,

Alright, great! I was just about to reply here asking you to check the STAR version, but you managed to solve it already. I hope everything goes smoothly for you now.

Best,

Christoph

hmassalha commented 5 years ago

Hi Christoph, I am facing the same problem (the same error) that I posted in my first comment. However, this time I am submitting the zUMIs jobs in a loop from the terminal, and I get dgecounts.rds files for some of my jobs. For the others, I get only the annotationsSAF.rds files as output, and their BAM files are empty. Do you have any suggestions on where to start looking for what happened?

Thanks in advance, HM

cziegenhain commented 5 years ago

Hey,

Since you do get output from some of the jobs, I would assume the problem lies with the files that break. A couple of suggestions:

Best, Christoph

hmassalha commented 5 years ago

Thanks for your fast reply, I appreciate your help and Merry Christmas.

  1. The zUMIs paths are OK. I used the same code when I analyzed MARS-seq data.

  2. I am analyzing mcSCRB-seq reads, demultiplexed with the following bcl2fastq command:

```
bsub -J $projName"bcl" -q new-short -R rusage[mem=8000] -n 16 \
  bcl2fastq -R $inputPath$runName --output-dir $outputPath -p 16 \
  --no-lane-splitting --mask-short-adapter-reads 5 --barcode-mismatches 1 \
  --minimum-trimmed-read-length 14 \
  --sample-sheet ${metaDataPath}"samplesheet_"${projName}".csv"
```

  3. Here are the outputs that I am getting (a quick check for these empty files is sketched right after this list):

    • aligned.sorted.bam - empty
    • aligned.sorted.bam.ex.featureCounts - empty
    • aligned.sorted.bam.in.featureCounts - empty
    • barcodelist.filtered.sort.sam - not empty
    • Log.out - the last line is 'Loading SA ...', unlike in a good run.
    • Log.progress.out - empty
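A minimal way to list such zero-byte outputs (the directory is the output path from the failing job below; adjust as needed):

```bash
# Print all empty files under the zUMIs output directory of the failing job.
find /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/zUMI_strandS1_node \
  -maxdepth 2 -type f -size 0 -print
```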

Here is the output for a job that failed:

```
Job <181219_villiStromaZonationLCM_3> was submitted from host by user in cluster at Tue Dec 25 11:39:47 2018
Job was executed on host(s) , in queue , as user in cluster at Tue Dec 25 11:40:18 2018
</home/labs/shalev/hassanm> was used as the home directory.
</home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM> was used as the working directory.
Started at Tue Dec 25 11:40:18 2018
Terminated at Tue Dec 25 12:03:47 2018
Results reported at Tue Dec 25 12:03:47 2018

Your job looked like:


LSBATCH: User input

bash /home/labs/shalev/hassanm/NGS/zUMI/zUMI006/zUMIs-master.sh -f /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/N719_c3_S3_R1_001.fastq.gz -r /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/N719_c3_S3_R2_001.fastq.gz -c 1-6 -m 7-16 -l 66 -g /home/labs/shalev/NGS/indexes/GRCm38.84_STAR_zumi/ -b /home/labs/shalev/hassanm/NGS/indexes/scrb_barcode_32.txt -a /home/labs/shalev/NGS/indexes/GRCm38.84/Mus_musculus.GRCm38.84.gtf -n N719_c3 -B 1 -s 1 -o /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/zUMI_strandS1_node -i

Successfully completed.

Resource usage summary:

CPU time :                                   949.47 sec.
Max Memory :                                 8000 MB
Average Memory :                             257.51 MB
Total Requested Memory :                     8000.00 MB
Delta Memory :                               0.00 MB
Max Swap :                                   -
Max Processes :                              7
Max Threads :                                14
Run time :                                   971 sec.
Turnaround time :                            1440 sec.

The output (if any) follows:

Your jobs will run on this machine.

Make sure you have more than 25G RAM and 1 processors available.

Your jobs will be started from filtering.

You provided these parameters:
SLURM workload manager: no
Summary Stats to produce: yes
Start the pipeline from: filtering
A custom mapped BAM: NA
Custom filtered FASTQ: no
Barcode read: /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/N719_c3_S3_R1_001.fastq.gz
cDNA read: /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/N719_c3_S3_R2_001.fastq.gz
Study/sample name: N719_c3
Output directory: /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/zUMI_strandS1_node
Cell/sample barcode range: 1-6
UMI barcode range: 7-16
Retain cell with >=N reads: 100
Genome directory: /home/labs/shalev/NGS/indexes/GRCm38.84_STAR_zumi/
GTF annotation file: /home/labs/shalev/NGS/indexes/GRCm38.84/Mus_musculus.GRCm38.84.gtf
Number of processors: 1
Read length: 66
Strandedness: 1
Cell barcode Phred: 20
UMI barcode Phred: 20

bases below phred in CellBC: 1

bases below phred in UMI: 1

Hamming Distance (UMI): 0
Hamming Distance (CellBC): 1
Plate Barcode Read: NA
Plate Barcode range: NA
Barcodes: /home/labs/shalev/hassanm/NGS/indexes/scrb_barcode_32.txt
zUMIs directory: /home/labs/shalev/hassanm/NGS/zUMI/zUMI006
STAR executable: STAR
samtools executable: samtools
pigz executable: pigz
Rscript executable: Rscript
Additional STAR parameters:
STRT-seq data: no
InDrops data: no
Library read for InDrops: NA
Barcode read2(STRT-seq): NA
Barcode read2 range(STRT-seq): 0-0
Bases(G) to trim(STRT-seq): 3
Subsampling reads: 0

zUMIs version 0.0.6c

Raw reads: 17480767 Filtered reads: 14231713

Make sure you have approximately 14062 Mb RAM available
Dec 25 12:00:28 ..... started STAR run
Dec 25 12:00:28 ..... loading genome
/home/labs/shalev/hassanm/NGS/zUMI/zUMI006/zUMIs-noslurm.sh: line 91: 17995 Bus error (core dumped) $starexc --genomeDir $g --runThreadN $t --readFilesCommand zcat --sjdbGTFfile $gtf --outFileNamePrefix $o/$sn. --outSAMtype BAM Unsorted --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --sjdbOverhang $rl --twopassMode Basic --readFilesIn $o/$sn.cdnaread.filtered.fastq.gz $x
Loading required package: optparse
[1] "I am loading useful packages..."
[1] "2018-12-25 12:02:10 IST"
[1] "I am making annotations in SAF... This will take less than 3 minutes..."
[1] "2018-12-25 12:02:20 IST"
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
[1] "I am making count tables...This will take a while!!"
[1] "2018-12-25 12:03:31 IST"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.28.1
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
              S /home/labs/shalev/hassanm/NGS/181219_NB551 ...
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : no
Strand specific : stranded
Multimapping reads : primary only
Multi-overlapping reads : not counted
Min overlapping bases : 1

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file ./.Rsubread_UserProvidedAnnotation_pid19441 ...
Features : 225068
Meta-features : 25257
Chromosomes/contigs : 38
Process BAM file /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AH ...
Single-end reads are included.
Assign reads to features...
Total reads : 0
Successfully assigned reads : 0
Running time : 0.00 minutes
Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.28.1
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
              S /home/labs/shalev/hassanm/NGS/181219_NB551 ...
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : no
Strand specific : stranded
Multimapping reads : primary only
Multi-overlapping reads : not counted
Min overlapping bases : 1

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file ./.Rsubread_UserProvidedAnnotation_pid19441 ...
Features : 710016
Meta-features : 42143
Chromosomes/contigs : 45
Process BAM file /home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AH ...
Single-end reads are included.
Assign reads to features...
Total reads : 0
Successfully assigned reads : 0
Running time : 0.00 minutes
Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

Error in data.table::fread(paste("cut -f4 ", abamfile[2], ".featureCounts", :
  File is empty: /dev/shm/file4bf14f0476f
Calls: makeGEprofile -> <Anonymous>
Execution halted
[1] "I am loading useful packages for plotting..."
[1] "2018-12-25 12:03:38 IST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/labs/shalev/hassanm/NGS/181219_NB551168_0251_AHYL7HBGX7_181219_villiStromaZonationLCM_output/villiStromaZonationLCM/zUMI_strandS1_node/zUMIs_output/expression/N719_c3.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
```

Thanks, HM

cziegenhain commented 5 years ago

Hey,

I am pretty sure that you need to request more memory from your job submission system! Loading the mouse genome for STAR requires at least ~25 to 30 GB of RAM. That's why your job dies while loading the SA (suffix array) for STAR. (Although I am unsure why it works for some of your jobs, then.)
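For example, on LSF the reservation could look like the sketch below; whether mem is counted in MB depends on the cluster's LSF_UNIT_FOR_LIMITS setting, and the trailing zUMIs arguments are abbreviated:

```bash
# Reserve ~30 GB instead of the 8 GB used before (units assumed to be MB,
# as in the earlier bsub calls on this cluster).
bsub -J N719_c3_zumis -q new-short -n 16 \
  -R "rusage[mem=30000]" -M 30000 \
  bash /home/labs/shalev/hassanm/NGS/zUMI/zUMI006/zUMIs-master.sh ...
  # (same zUMIs arguments as in the job log above)
```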

Happy holidays!

EDIT: I also just re-read your bcl2fastq call: for mcSCRB-seq, --minimum-trimmed-read-length 14 means that some of your barcode reads may be missing 2 bases of the UMI! Can you set that to 16 and try again?
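That would be the same demultiplexing call as above with only the minimum read length raised; a sketch reusing the variables from the earlier command:

```bash
# Keep at least 16 bases so the full 6 bp cell barcode (positions 1-6)
# plus 10 bp UMI (positions 7-16) survives adapter trimming.
bsub -J $projName"bcl" -q new-short -R rusage[mem=8000] -n 16 \
  bcl2fastq -R $inputPath$runName --output-dir $outputPath -p 16 \
  --no-lane-splitting --mask-short-adapter-reads 5 --barcode-mismatches 1 \
  --minimum-trimmed-read-length 16 \
  --sample-sheet ${metaDataPath}"samplesheet_"${projName}".csv"
```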

hmassalha commented 5 years ago

Thanks again, I will rerun the analysis for the files that didn't work and update you. Best, HM

hmassalha commented 5 years ago

Hi Christoph, I did what you suggested and now I get dgecounts.rds files for most of my libraries. I have two pools of 6 samples each, though, and for those I am getting the following message: 'TERM_RUNLIMIT: job killed after reaching LSF run time limit.' What would you suggest I do?

Thanks again for your help, HM

cziegenhain commented 5 years ago

Hey,

We are getting closer to the cause here: it seems like zUMIs is not breaking but is being killed by your load management system. I am not familiar with the exact job scheduler you are using on your cluster, but a quick google tells me that you probably need to request a higher job run time limit when submitting your job (bsub -W): https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_command_ref/bsub.__w.1.html

The runtime of zUMIs scales linearly with the number of reads in the data, so I would recommend scaling the requested run time accordingly (see the sketch below)!
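On LSF that could look like the following; the 24-hour limit is an arbitrary example to scale against your read counts, and the zUMIs arguments are abbreviated:

```bash
# Request a 24 h run time limit (-W takes [hours:]minutes) alongside the
# memory reservation; scale both with the size of the input.
bsub -W 24:00 -R "rusage[mem=30000]" -M 30000 -n 16 \
  bash zUMIs-master.sh ...  # same zUMIs arguments as before
```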

One last, somewhat unrelated suggestion: it seems to me that you are using bcl2fastq to demultiplex (sub-)libraries pooled by an Illumina index and then running zUMIs for each of them. You can instead generate fastq files without demultiplexing and use the index read as an additional barcode read in zUMIs (see the sketch below). That way you only need to run zUMIs once, which should be much faster! Let me know if you want more details on this.
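A sketch of what such a demultiplexing-free bcl2fastq call could look like; --create-fastq-for-index-reads is a real bcl2fastq option, but using a sample sheet without index columns (so all reads stay together) is an assumption about this workflow:

```bash
# Write fastqs without sample demultiplexing and keep the index read (I1)
# as its own fastq so zUMIs can use it as an extra barcode read.
# The samplesheet_noindex.csv file is hypothetical.
bsub -J $projName"bcl" -q new-short -R rusage[mem=8000] -n 16 \
  bcl2fastq -R $inputPath$runName --output-dir $outputPath -p 16 \
  --no-lane-splitting --create-fastq-for-index-reads \
  --minimum-trimmed-read-length 0 --mask-short-adapter-reads 0 \
  --sample-sheet ${metaDataPath}"samplesheet_noindex.csv"
```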

Best, Christoph

hmassalha commented 5 years ago

Hey, you are correct, I do have a high number of reads in my samples.

I would be happy if you could give me more details on using the Illumina index directly in zUMIs. It will surely be much faster, since zUMIs would only have to load the genome-related files once. Could you please share the relevant bcl2fastq and zUMIs commands?

Thanks, HM

cziegenhain commented 5 years ago

Of course, I'd be happy to!

Best, Christoph

hmassalha commented 5 years ago

Much appreciated. I will try it and let you know what I get. Best, HM

hmassalha commented 5 years ago

Dear Christoph, I just downloaded the new zUMIs to our cluster in the lab. I built the yaml file based on the instructions you have on GitHub. Unfortunately, I keep getting the following message:

```
Job was submitted from host by user in cluster at Fri Jan 11 12:38:42 2019
Job was executed on host(s) <12*cn472.wexac.weizmann.ac.il>, in queue , as user in cluster at Fri Jan 11 12:38:37 2019
</home/labs/shalev/hassanm> was used as the home directory.
</home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output> was used as the working directory.
Started at Fri Jan 11 12:38:37 2019
Terminated at Fri Jan 11 12:38:38 2019
Results reported at Fri Jan 11 12:38:38 2019

Your job looked like:


LSBATCH: User input

bash zUMIs-master.sh -y /home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output/yaml.yaml

Successfully completed.

Resource usage summary:

CPU time :                                   0.19 sec.
Max Memory :                                 2 MB
Average Memory :                             2.00 MB
Total Requested Memory :                     24000.00 MB
Delta Memory :                               23998.00 MB
Max Swap :                                   -
Max Processes :                              4
Max Threads :                                5
Run time :                                   7 sec.
Turnaround time :                            0 sec.

The output (if any) follows:

tee: '/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output' /zUMIs_runlog.txt: No such file or directory

You provided these parameters:
YAML file: /home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output/yaml.yaml
zUMIs directory: /home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output
STAR executable: STAR
samtools executable: samtools
pigz executable: pigz
Rscript executable: Rscript
RAM limit: 2
zUMIs version 2.2.2b

mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/’: No such file or directory
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/expression’: No such file or directory
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/stats’: No such file or directory
mkdir: cannot create directory ‘'/home/labs/shalev/hassanm/NGS/190109_NB501465_0443_AHJ2WYBGX9_human_zonation_zUMI181029_output'\r/zUMIs_output/.tmpMerge’: No such file or directory
```

Any suggestions, please? I checked the paths and they are OK. Just a small reminder: I would like to do the Illumina demultiplexing with zUMIs and not bcl2fastq, and the -U flag is not used anymore in the latest version of zUMIs.

Thanks in advance, HM

sdparekh commented 5 years ago

Hi HM,

It looks like you have special characters like "\r" in your path. Please check the encoding of your script and yaml files.
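A quick way to check for and strip such carriage returns (a sketch; dos2unix would do the same job if installed):

```bash
# Show lines of the YAML that end in a carriage return:
grep -n $'\r' yaml.yaml
# Strip trailing carriage returns in place (GNU sed):
sed -i 's/\r$//' yaml.yaml
```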

For your second question: you can provide the Illumina index reads as file3 and use the range BC(1-8), or however long your index read is. That way the sample barcode becomes the Illumina index + sample BC.
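A sketch of the corresponding sequence_files section of a zUMIs 2.x YAML; the field names follow the zUMIs documentation, but the file paths, the 8 bp index length, and the barcode/UMI ranges (taken from the earlier runs) should be checked against your own setup:

```yaml
sequence_files:
  file1:                        # cell barcode + UMI read
    name: /path/to/R1.fastq.gz
    base_definition:
      - BC(1-6)
      - UMI(7-16)
  file2:                        # cDNA read
    name: /path/to/R2.fastq.gz
    base_definition:
      - cDNA(1-66)
  file3:                        # Illumina index read, used as extra barcode
    name: /path/to/I1.fastq.gz
    base_definition:
      - BC(1-8)
```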

I hope this helps.

Good luck! Best, Swati

hmassalha commented 5 years ago

Hi Swati, sorry for my late reply; I was busy with another project. I tried to understand where this "\r" comes from, but neither my command line nor the yaml file contains this hidden character '\r' as far as I can see. Do you have any suggestions on how I can work around this issue?

Thanks, HM