sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Error toward end #263

Closed by Vgupta211 3 years ago

Vgupta211 commented 3 years ago

Hi There,

I'm pretty new to zUMIs, but I was able to get it to work nicely with a 10X dataset. I was hoping to use it for Smart-seq3 and tried it out with 8 cells that I sequenced at 1.6 million reads. I was running this on AWS using Ubuntu. I got pretty much to the end, but there appears to be something wrong with samtools (output below).

At first I thought there was a memory issue with samtools, but that doesn't seem to be the case, as I upped the memory to 64 GB total, which should be more than enough. BAM files are generated, but zUMIs does not seem to be able to do anything with them for some reason.

Any help would be greatly appreciated.

Best,

Vikas

Sat Jun 5 19:07:45 UTC 2021
[1] "294309 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 15 daughter barcodes that can be binned into 6 parent barcodes."
[1] "Binned barcodes correspond to 6752 reads."
Mapping...
[1] "2021-06-05 19:07:52 UTC"
Jun 05 19:07:52 ..... started STAR run
Jun 05 19:07:52 ..... loading genome
Jun 05 19:07:52 ..... started STAR run
Jun 05 19:07:52 ..... loading genome
Jun 05 19:11:10 ..... processing annotations GTF
Jun 05 19:11:10 ..... processing annotations GTF
Jun 05 19:11:25 ..... inserting junctions into the genome indices
Jun 05 19:11:25 ..... inserting junctions into the genome indices
Jun 05 19:14:51 ..... started mapping
Jun 05 19:14:52 ..... started mapping
Jun 05 19:15:07 ..... finished mapping
Jun 05 19:15:08 ..... finished successfully
Jun 05 19:15:08 ..... finished mapping
Jun 05 19:15:10 ..... finished successfully
Sat Jun 5 19:15:10 UTC 2021
Counting...
[1] "2021-06-05 19:15:19 UTC"
[1] "2.7e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/ubuntu/Test/Test.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
P Test.filtered.tagged.Aligned.out.bam
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 8
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid186513 ...
Features : 291534
Meta-features : 55401
Chromosomes/contigs : 22
Process BAM file Test.filtered.tagged.Aligned.out.bam...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 1267071
Successfully assigned alignments : 885450 (69.9%)
Running time : 0.04 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
P Test.filtered.tagged.Aligned.out.bam.ex.fe ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 8
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid186513 ...
Features : 219557
Meta-features : 28631
Chromosomes/contigs : 21
Process BAM file Test.filtered.tagged.Aligned.out.bam.ex.featureCounts ...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 1267071
Successfully assigned alignments : 201689 (15.9%)
Running time : 0.04 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2021-06-05 19:17:00 UTC"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
[1] "2021-06-05 19:17:03 UTC"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 6 barcodes in this chunk..."
Error in rbindlist(rsamtools_reads, fill = TRUE, use.names = TRUE) :
  Item 1 of input is not a data.frame, data.table or list
Calls: reads2genes_new -> rbindlist
In addition: Warning message:
In mclapply(1:nrow(idxstats), function(x) { :
  all scheduled cores encountered errors in user code
Execution halted
Sat Jun 5 19:17:04 UTC 2021
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/ubuntu/Test/zUMIs_output/expression/Test.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sat Jun 5 19:17:06 UTC 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-06-05 19:17:06 UTC"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/ubuntu/Test/zUMIs_output/expression/Test.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sat Jun 5 19:17:11 UTC 2021

yaml.pdf

sdparekh commented 3 years ago

Hi,

That is a bit odd. featureCounts seems to have run through. Do you see a file named "Test.filtered.Aligned.GeneTagged.sorted.bam"? If so, can you send a couple of lines from that file using samtools view?

Meanwhile, you may also try to re-run zUMIs from the "Counting" stage by setting which_Stage: Counting in the YAML file.

Best, Swati
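
As a rough sketch of the restart suggested above (not part of the original reply), the which_Stage entry could be switched with a small shell helper. The function name set_zumis_stage and the file name zUMIs.yaml are hypothetical, and the sketch assumes which_Stage appears at the start of a line in the YAML file.

```shell
# Hypothetical helper: set which_Stage in a zUMIs YAML config.
#   $1 = stage name (e.g. Counting), $2 = path to the YAML file
# Assumes the key "which_Stage:" starts at column 1 of its line.
set_zumis_stage() {
  stage="$1"
  yaml="$2"
  # Rewrite the existing which_Stage line in place; sed keeps the
  # original file as a backup next to it with a .bak suffix.
  sed -i.bak "s/^which_Stage:.*/which_Stage: ${stage}/" "$yaml"
}

# Example call (file name is an assumption):
#   set_zumis_stage Counting zUMIs.yaml
```

After the edit, zUMIs would be started again with the same YAML file so that it resumes at the chosen stage.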

Vgupta211 commented 3 years ago

Hi Swati,

Odd indeed. This is the samtools view output from Test.filtered.Aligned.GeneTagged.sorted.bam. I did realize that while featureCounts ran, it did not generate an ex.featureCounts.bam file. I've had some stability issues with AWS; do you think you might be able to run the files (~150 MB) to make sure that there is nothing wrong with the setup? I'm mostly interested in seeing how many UMIs there are per cell.

Many Thanks,

Vikas

A00814:428:H23NVDMXY:1:2308:14036:29810 163 chr1 3205776 255 50M 3206179 431 CTATCAGAGGGAAGTTTTTCTTGGAAAGAGCCAGTCTTGACATGAAGCTT FFFFFFF:FFFFFFFF:FFF,FFFF:FFFFFF,F,FFFFFFFFF:FFFFF NH:i:1 HI:i:1 AS:i:76 nM:i:0 BX:Z:TTCTTGGC BC:Z:TTCTTGGC UB:Z:CAATGATG QB:Z:FFFFFFFF QU:Z:FFFFFFFF ES:Z:Assigned1 EN:i:1 GE:Z:ENSMUSG00000051951.5 IS:Z:Unassigned_NoFeatures
A00814:428:H23NVDMXY:1:2308:14036:29810 83 chr1 3206179 255 28M 3205776 -431 TAAGTCACATGGTAGGAGGCTGCCTTTC FFFFFFFFFFFFFFFFFFFFFFFFF,FF NH:i:1 HI:i:1 AS:i:76 nM:i:0 BX:Z:TTCTTGGC BC:Z:TTCTTGGC UB:Z:CAATGATG QB:Z:FFFFFFFF QU:Z:FFFFFFFF ES:Z:Assigned1 EN:i:1 GE:Z:ENSMUSG00000051951.5 IS:Z:Unassigned_NoFeatures
A00814:428:H23NVDMXY:1:2265:32560:19648 99 chr1 3329467 255 28M 3377782 48356 AAAAAAAAAAAAAAAAAAAAAAAAAATC FFFFFFFFFFFFFFFFFFFFFFFFFF:, NH:i:1 HI:i:1 AS:i:59 nM:i:3 BX:Z:TTCTGACC BC:Z:TTCTGACC UB:Z:GCCAAAAA QB:Z:FFFFFFFF QU:Z:FFFFFFFF ES:Z:Assigned1 EN:i:1 GE:Z:ENSMUSG00000104017.1 IS:Z:Assigned1 IN:i:1 GI:Z:ENSMUSG00000051951.5
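
Since the stated goal is a per-cell UMI count, a rough tally can be taken directly from records like these. This is a minimal sketch, not zUMIs' own counting: it counts distinct UB values per BC value, ignores gene assignment, and skips records whose UB tag is empty or missing. The function name umis_per_barcode is hypothetical, and the sketch assumes the BC and UB tags appear at field 12 or later, as they do in the records above.

```shell
# Hypothetical helper: count distinct UMI tags (UB:Z:) per cell
# barcode tag (BC:Z:) from SAM text records read on stdin.
# Output: one "barcode<TAB>count" line per barcode, sorted by barcode.
umis_per_barcode() {
  awk '
    {
      bc = ""; ub = ""
      # Scan the optional-tag columns for the barcode and UMI tags.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^BC:Z:/) bc = substr($i, 6)
        if ($i ~ /^UB:Z:/) ub = substr($i, 6)
      }
      # Count each (barcode, UMI) pair once, skipping empty tags.
      if (bc != "" && ub != "" && !seen[bc SUBSEP ub]++) n[bc]++
    }
    END { for (bc in n) print bc "\t" n[bc] }
  ' | sort
}

# Example call, using the BAM file named in this thread:
#   samtools view Test.filtered.Aligned.GeneTagged.sorted.bam | umis_per_barcode
```

Both mates of a read pair carry the same BC and UB tags, so counting distinct pairs rather than records avoids double counting them.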

cziegenhain commented 3 years ago

Sure, just upload the data and I can give it a quick run-through to see if it runs OK.

Vgupta211 commented 3 years ago

Here are the files https://github.com/Vgupta211/Data

Thanks so much for your help!

Vgupta211 commented 3 years ago

I was able to get it to work! I had R 4.1 installed, and when I rolled it back to 3.6 it worked like a charm.
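
One quick way to confirm which R is on the PATH before re-running is to inspect the first line printed by R --version. The helper below is a hypothetical sketch that reports whether that line names an R 3.x release, the series that worked in this thread; it makes no claim about which versions zUMIs officially supports.

```shell
# Hypothetical helper: classify an R version banner line.
#   $1 = first line of "R --version", e.g. "R version 3.6.0 (2019-04-26) ..."
# Prints "ok" for an R 3.x release and "mismatch" for anything else.
check_r_major() {
  # Extract the major version number that follows "R version ".
  major=$(printf '%s\n' "$1" | sed -n 's/^R version \([0-9][0-9]*\)\..*/\1/p')
  if [ "$major" = "3" ]; then
    echo ok
  else
    echo mismatch
  fi
}

# Example call:
#   check_r_major "$(R --version | head -n 1)"
```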

sseiler commented 3 years ago

Hello, I am currently troubleshooting a very similar issue. I get through Filtering and Mapping, but fail in Counting. I'm attempting to use zUMIs with the Smart-seq3 protocol on a bulk RNA library (YAML attached: Exp4_Gradient_zUMIs.run.zip).

Any recommendations here would be appreciated. Here is my Counting output:

Counting...
[1] "2021-06-12 18:29:52 PDT"
[1] "4.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/public/groups/hausslerlab/people/sseiler/Projects/Exp4_Gradient/output/Exp4_Gradient_zUMIs.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
P Exp4_Gradient_zUMIs.filtered.tagged.Aligne ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 20
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid42673 ...
Features : 350116
Meta-features : 60651
Chromosomes/contigs : 25
Process BAM file Exp4_Gradient_zUMIs.filtered.tagged.Aligned.out.bam...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 61814175
Successfully assigned alignments : 4548341 (7.4%)
Running time : 2.05 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
P Exp4_Gradient_zUMIs.filtered.tagged.Aligne ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 20
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid42673 ...
Features : 240696
Meta-features : 28364
Chromosomes/contigs : 24
Process BAM file Exp4_Gradient_zUMIs.filtered.tagged.Aligned.out.bam.e ...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 61814175
Successfully assigned alignments : 9454047 (15.3%)
Running time : 2.36 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2021-06-12 18:40:04 PDT"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[1] "2021-06-12 18:53:13 PDT"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 39 barcodes in this chunk..."
Error in rbindlist(rsamtools_reads, fill = TRUE, use.names = TRUE) :
  Item 1 of input is not a data.frame, data.table or list
Calls: reads2genes_new -> rbindlist
In addition: Warning message:
In mclapply(1:nrow(idxstats), function(x) { :
  all scheduled cores encountered errors in user code
Execution halted
Sat Jun 12 18:53:20 PDT 2021
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/public/groups/hausslerlab/people/sseiler/Projects/Exp4_Gradient/output/zUMIs_output/expression/Exp4_Gradient_zUMIs.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sat Jun 12 18:53:27 PDT 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-06-12 18:53:28 PDT"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/public/groups/hausslerlab/people/sseiler/Projects/Exp4_Gradient/output/zUMIs_output/expression/Exp4_Gradient_zUMIs.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sat Jun 12 18:53:52 PDT 2021

sdparekh commented 3 years ago

Thank you, Vgupta, for the update. If the R version is the issue, we will test and adapt the dependencies in the next release. Good to know your issue is resolved.

@sseiler: Please check if the R version is an issue at your end as well.

sseiler commented 3 years ago

Yes @sdparekh. I was also running R-4.1.0, which triggers the error. As recommended, I rolled back to R-3.6.0 and Counting succeeded. An easy way to roll back is to rely on the built-in conda defaults. After Filtering and Mapping, run:

zUMIs/zUMIs.sh -c -y zUMIs.yaml

to complete the Counting.