Closed by kbattenb 2 years ago
Hey Kai,
Thanks for having such a nice and complete error description, much appreciated!
My initial gut feeling on this issue: SRA read IDs have been problematic before. Could you try running fastq-dump this way: https://github.com/sdparekh/zUMIs/wiki/Reprocessing-of-public-data
Best, Christoph
Hi Christoph,
Thank you for your suggestion. Apparently it's not the first time SRA has caused issues and I should have looked into that.
I will re-download these files as per your suggestion and let you know if the situation improves.
Thank you again.
All the best,
Kai Battenberg
Hi Christoph,
I tried your suggestion and I believe it got further in the process, but it still did not complete.
Input data:
The options of the command used to download the data were changed
From this:
fastq-dump --split-files --gzip --accession SRR9621775
To this:
fastq-dump --split-files --origfmt --defline-qual '+' --gzip --accession SRR9621775
As a result (as suggested), each read in the FASTQ file was changed
From this:
@SRR9621775.1 D00224L:270:CCU0CANXX:6:2211:1220:2085 length=76
ATGCTACTGCAAATTCTAGAATTGTGAGTAGAAGTAAAATAATAAATGTAATGGTAGCTGTTGGTGGGCTAATATT
+SRR9621775.1 D00224L:270:CCU0CANXX:6:2211:1220:2085 length=76
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
To this:
@D00224L:270:CCU0CANXX:6:2211:1220:2085
ATGCTACTGCAAATTCTAGAATTGTGAGTAGAAGTAAAATAATAAATGTAATGGTAGCTGTTGGTGGGCTAATATT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
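For anyone who already has FASTQ files with the default fastq-dump deflines and cannot easily re-download them, the same header change can probably be applied in place. This is only a sketch, not what was done in this thread (here the files were re-downloaded with the new options). The function name `strip_sra_deflines` is made up for illustration, and it assumes strict 4-line FASTQ records with deflines of the form shown above.

```shell
# Sketch: turn default fastq-dump deflines into the original instrument read names.
# Assumes 4-line FASTQ records and headers like "@<run>.<n> <read name> length=<len>".
strip_sra_deflines() {
  awk '
    NR % 4 == 1 { print "@" $2; next }   # header line: keep only the instrument read name
    NR % 4 == 3 { print "+"; next }      # quality defline: reduce to a bare "+"
    { print }                            # sequence and quality lines pass through unchanged
  '
}

# Hypothetical usage (file names are placeholders):
# gzip -dc reads_1.fastq.gz | strip_sra_deflines | gzip > reads_1.renamed.fastq.gz
```

The line position within each record (rather than the leading character) decides what gets rewritten, because a quality string can itself start with "@" or "+".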
# /home/zUMIs/zUMIs.sh -c -y /home/workbench/config_files/config_for_SmartSeq2.yaml
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters:
YAML file: /home/workbench/config_files/config_for_SmartSeq2.yaml
zUMIs directory: /home/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: null
zUMIs version 2.9.7
Fri Apr 15 08:08:56 JST 2022 WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a. Filtering... Fri Apr 15 11:29:58 JST 2022 [1] "3752 barcodes detected." [1] "5699177 reads were assigned to barcodes that do not correspond to intact cells." Mapping... [1] "2022-04-15 11:30:32 JST" STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 2 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2aj.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ak.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.4. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 15 11:30:51 ..... started STAR run Apr 15 11:30:51 ..... 
loading genome STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 2 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2aa.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ab.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ac.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.1. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 15 11:30:51 ..... started STAR run Apr 15 11:30:51 ..... loading genome STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 2 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ad.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ae.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2af.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.2. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 15 11:30:51 ..... started STAR run Apr 15 11:30:51 ..... 
loading genome STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 2 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ag.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ah.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ai.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.3. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 15 11:30:51 ..... started STAR run Apr 15 11:30:51 ..... loading genome Apr 15 12:16:48 ..... processing annotations GTF Apr 15 12:16:48 ..... processing annotations GTF Apr 15 12:29:14 ..... inserting junctions into the genome indices Apr 15 12:29:14 ..... inserting junctions into the genome indices Apr 15 13:21:33 ..... started 1st pass mapping Apr 15 13:21:33 ..... started 1st pass mapping Apr 16 02:38:03 ..... finished 1st pass mapping Apr 16 02:38:09 ..... inserting junctions into the genome indices Apr 16 09:38:51 ..... started mapping Apr 16 12:52:00 ..... finished 1st pass mapping Apr 16 12:56:46 ..... inserting junctions into the genome indices Apr 16 16:05:16 ..... started mapping Apr 17 01:38:28 ..... finished mapping Apr 17 01:38:32 ..... finished successfully Apr 17 07:48:19 ..... finished mapping Apr 17 07:48:23 ..... finished successfully [W::bam_hdr_read] bgzf_check_EOF: Invalid argument [E::bam_hdr_read] Invalid BAM binary header [bam_cat] ERROR: couldn't read header for '/home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.2.Aligned.out.bam'. 
[W::bam_hdr_read] bgzf_check_EOF: Invalid argument
[E::bam_hdr_read] Invalid BAM binary header
[bam_cat] ERROR: couldn't read header for '/home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.2.Aligned.toTranscriptome.out.bam'.
Sun Apr 17 07:52:39 JST 2022
Counting...
[1] "2022-04-17 07:53:02 JST"
$project
[1] "MouseSmartSeq2"
$sequence_files $sequence_files$file1 $sequence_files$file1$name [1] "/home/workbench/fastq/SmartSeq2_S1_L001_R1_001.fastq.gz"
$sequence_files$file1$base_definition [1] "cDNA(1-76)"
$sequence_files$file2 $sequence_files$file2$name [1] "/home/workbench/fastq/SmartSeq2_S1_L001_R2_001.fastq.gz"
$sequence_files$file2$base_definition [1] "cDNA(1-76)"
$sequence_files$file3 $sequence_files$file3$name [1] "/home/workbench/fastq/SmartSeq2_S1_L001_I1_001.fastq.gz"
$sequence_files$file3$base_definition [1] "BC(1-8)"
$sequence_files$file4 $sequence_files$file4$name [1] "/home/workbench/fastq/SmartSeq2_S1_L001_I2_001.fastq.gz"
$sequence_files$file4$base_definition [1] "BC(1-8)"
$reference $reference$STAR_index [1] "/home/workbench/Reference/Mouse"
$reference$GTF_file [1] "/home/workbench/Reference/Mus_musculus.GRCm39.105.gtf"
$reference$exon_extension [1] FALSE
$reference$extension_length [1] 0
$reference$scaffold_length_min [1] 0
$out_dir [1] "/home/workbench/OUTPUT"
$num_threads [1] 10
$mem_limit [1] 100
$filter_cutoffs $filter_cutoffs$BC_filter $filter_cutoffs$BC_filter$num_bases [1] 1
$filter_cutoffs$BC_filter$phred [1] 20
$filter_cutoffs$UMI_filter $filter_cutoffs$UMI_filter$num_bases [1] 1
$filter_cutoffs$UMI_filter$phred [1] 20
$barcodes $barcodes$barcode_num NULL
$barcodes$automatic [1] FALSE
$barcodes$BarcodeBinning [1] 0
$barcodes$nReadsperCell [1] 1
$barcodes$demultiplex [1] FALSE
$counting_opts $counting_opts$introns [1] TRUE
$counting_opts$downsampling [1] "0"
$counting_opts$strand [1] 0
$counting_opts$Ham_Dist [1] 0
$counting_opts$velocyto [1] FALSE
$counting_opts$primaryHit [1] TRUE
$counting_opts$twoPass [1] TRUE
$counting_opts$write_ham [1] FALSE
$counting_opts$multi_overlap [1] FALSE
$counting_opts$intronProb [1] FALSE
$make_stats [1] TRUE
$which_Stage [1] "Filtering"
$read_layout [1] "PE"
$zUMIs_directory [1] "/home/zUMIs"
$samtools_exec [1] "samtools"
$pigz_exec [1] "pigz"
$STAR_exec [1] "STAR"
$Rscript_exec [1] "Rscript"
[1] "4.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/workbench/OUTPUT/MouseSmartSeq2.final_annot.gtf"
[1] "Annotation loaded!"
Warning message:
`as_quosure()` requires an explicit environment as of rlang 0.3.0.
Please supply `env`.
This warning is displayed once per session.
[1] "Assigning reads to features (ex)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
P MouseSmartSeq2.filtered.tagged.Aligned.out ...
Annotation : R data.frame
Assignment details :
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 10
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required
\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid2584 ...
Features : 291510
Meta-features : 55414
Chromosomes/contigs : 39
Process BAM file MouseSmartSeq2.filtered.tagged.Aligned.out.bam...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 74607609
Successfully assigned alignments : 17715334 (23.7%)
Running time : 3.44 minutes
\===================== http://subread.sourceforge.net/ ======================//
[1] "Assigning reads to features (in)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
P MouseSmartSeq2.filtered.tagged.Aligned.out ...
Annotation : R data.frame
Assignment details :
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 10
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required
\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid2584 ...
Features : 220154
Meta-features : 28763
Chromosomes/contigs : 32
Process BAM file MouseSmartSeq2.filtered.tagged.Aligned.out.bam.ex.fea ...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 74607609
Successfully assigned alignments : 1688384 (2.3%)
Running time : 3.37 minutes
\===================== http://subread.sourceforge.net/ ======================//
[1] "2022-04-17 08:01:55 JST"
[1] "Coordinate sorting final bam file..."
samtools sort: couldn't allocate memory for bam_mem
[E::hts_open_format] Failed to open file /home/workbench/OUTPUT/MouseSmartSeq2.filtered.Aligned.GeneTagged.sorted.bam
samtools index: failed to open "/home/workbench/OUTPUT/MouseSmartSeq2.filtered.Aligned.GeneTagged.sorted.bam": No such file or directory
[1] "2022-04-17 08:01:57 JST"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 3752 barcodes in this chunk..."
[1] "/home/workbench/OUTPUT/MouseSmartSeq2.filtered.Aligned.GeneTagged.sorted.bam"
Error in value[3L] :
failed to open BamFile: file(s) do not exist:
'/home/workbench/OUTPUT/MouseSmartSeq2.filtered.Aligned.GeneTagged.sorted.bam'
Calls: reads2genes_new ... tryCatch -> tryCatchList -> tryCatchOne ->
This still results in an empty "expression" folder.
When I looked up the error message "[W::bam_hdr_read] bgzf_check_EOF: Invalid argument [E::bam_hdr_read] Invalid BAM binary header", I found a thread suggesting that it may be caused by running out of memory (https://github.com/alexdobin/STAR/issues/997), but the output did not indicate a segmentation fault.
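One way to see which of the intermediate BAM files were cut short is to test whether each one ends with the 28-byte BGZF end-of-file marker defined in the SAM/BAM specification. `samtools quickcheck` is the dedicated command for this; the sketch below does a similar check with standard tools only. The function name `has_bgzf_eof` is made up for illustration, and a missing marker only suggests truncation, it does not say why the write stopped.

```shell
# Sketch: succeed if the file ends with the 28-byte BGZF end-of-file marker.
# A BAM file without it was probably truncated while being written.
has_bgzf_eof() {
  # Expected marker bytes, written as hex in groups of four bytes.
  expected="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"
  # Hex-dump the last 28 bytes of the file and strip all whitespace.
  actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d '[:space:]')
  [ "$actual" = "$expected" ]
}

# Hypothetical usage on the per-chunk STAR outputs named in the log:
# for f in /home/workbench/OUTPUT/zUMIs_output/.tmpMap/*.bam; do
#   has_bgzf_eof "$f" || echo "possibly truncated: $f"
# done
```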
Should I set a fixed value for "mem_limit" instead of the current "null"? Please let me know what I can try.
Thank you.
Kai Battenberg
Hi Christoph,
Great news! Apparently the issue was not with zUMIs but with how a Windows computer shares its memory with a Docker container. I repeated the same process on a CentOS computer and there was no issue whatsoever.
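In case it helps others on Windows: as far as I understand, Docker there runs Linux containers inside a virtual machine with its own memory cap, so the container can see far less RAM than the host has. A quick way to check is to look at MemTotal from inside the container. This is a minimal sketch; the function name `mem_total_gib` is made up, and the optional path argument exists only so the function can be tried on a saved copy of /proc/meminfo.

```shell
# Sketch: print the total memory visible to the current environment, in GiB.
# Reads a /proc/meminfo-style file, where MemTotal is reported in kB.
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical usage, run inside the container before starting zUMIs:
# mem_total_gib
```

If the number is well below what STAR needs to load the genome index, raising the memory limit of the Docker VM or running on a native Linux host (as I did) are the options I would try first.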
So the only problem I was having was with the headers!
Thank you very much for your help!
Kai Battenberg
Hi zUMIs folks,
I am trying to troubleshoot a situation that I have run into and would like some help. I apologize in advance for the long entry.
Input conditions (here is what I have at hand)
zUMIs version:
zUMIs command to execute:
/home/zUMIs/zUMIs.sh -c -y /home/workbench/config_files/config_for_SmartSeq2.yaml
Input data:
Reference:
zUMIs configuration file:
Output (Here is what I get for standard output)
# /home/zUMIs/zUMIs.sh -c -y /home/workbench/config_files/config_for_SmartSeq2.yaml
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters:
YAML file: /home/workbench/config_files/config_for_SmartSeq2.yaml
zUMIs directory: /home/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: null
zUMIs version 2.9.7
Sat Apr 9 22:26:51 JST 2022 WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a. Filtering... Sun Apr 10 02:14:14 JST 2022 [1] "3752 barcodes detected." [1] "5699177 reads were assigned to barcodes that do not correspond to intact cells." Mapping... [1] "2022-04-10 02:14:35 JST" STAR --readFilesCommand samtools view -@ 1 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 1 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ae.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2af.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.3. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 10 02:14:40 ..... started STAR run Apr 10 02:14:41 ..... 
loading genome STAR --readFilesCommand samtools view -@ 1 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 1 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2aa.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ab.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.1. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 10 02:14:40 ..... started STAR run Apr 10 02:14:41 ..... loading genome STAR --readFilesCommand samtools view -@ 1 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 1 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ac.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ad.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.2. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 10 02:14:40 ..... started STAR run Apr 10 02:14:41 ..... 
loading genome STAR --readFilesCommand samtools view -@ 1 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/workbench/Reference/Mouse --sjdbGTFfile /home/workbench/Reference/Mus_musculus.GRCm39.105.gtf --runThreadN 1 --readFilesType SAM PE --genomeSAindexNbases 11 --limitOutSJcollapsed 5000000 --twopassMode Basic --readFilesIn /home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ag.filtered.tagged.bam,/home/workbench/OUTPUT/zUMIs_output/.tmpMerge//MouseSmartSeq2.MouseSmartSeq2ah.filtered.tagged.bam --outFileNamePrefix /home/workbench/OUTPUT/zUMIs_output/.tmpMap//tmp.MouseSmartSeq2.4. STAR version: 2.7.9a compiled: 2021-07-01T11:54:56+09:00 a524ed1d99de:/home/STAR-2.7.9a/source Apr 10 02:14:40 ..... started STAR run Apr 10 02:14:42 ..... loading genome Apr 10 02:25:21 ..... processing annotations GTF Apr 10 02:28:09 ..... inserting junctions into the genome indices Apr 10 02:39:54 ..... started 1st pass mapping
ReadAlignChunk_processChunks.cpp:55:processChunks EXITING because of FATAL ERROR in input BAM file: the consecutive lines in paired-end BAM have different read IDs: SRR9621775.283074845 vs
SOLUTION: fix BAM file formatting. Paired-end reads should be always consecutive lines, with exactly 2 lines per paired-end read
Apr 10 02:39:55 ...... FATAL ERROR, exiting
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM
Sun Apr 10 02:43:50 JST 2022
Counting...
[1] "2022-04-10 02:44:01 JST"
$project
[1] "MouseSmartSeq2"
$sequence_files $sequence_files$file1 $sequence_files$file1$name [1] "/home/workbench/Raw_data/SmartSeq2_S1_L001_R1_001.fastq.gz"
$sequence_files$file1$base_definition [1] "cDNA(1-76)"
$sequence_files$file2 $sequence_files$file2$name [1] "/home/workbench/Raw_data/SmartSeq2_S1_L001_R2_001.fastq.gz"
$sequence_files$file2$base_definition [1] "cDNA(1-76)"
$sequence_files$file3 $sequence_files$file3$name [1] "/home/workbench/Raw_data/SmartSeq2_S1_L001_I1_001.fastq.gz"
$sequence_files$file3$base_definition [1] "BC(1-8)"
$sequence_files$file4 $sequence_files$file4$name [1] "/home/workbench/Raw_data/SmartSeq2_S1_L001_I2_001.fastq.gz"
$sequence_files$file4$base_definition [1] "BC(1-8)"
$reference $reference$STAR_index [1] "/home/workbench/Reference/Mouse"
$reference$GTF_file [1] "/home/workbench/Reference/Mus_musculus.GRCm39.105.gtf"
$reference$exon_extension [1] FALSE
$reference$extension_length [1] 0
$reference$scaffold_length_min [1] 0
$out_dir [1] "/home/workbench/OUTPUT"
$num_threads [1] 7
$mem_limit [1] 100
$filter_cutoffs $filter_cutoffs$BC_filter $filter_cutoffs$BC_filter$num_bases [1] 1
$filter_cutoffs$BC_filter$phred [1] 20
$filter_cutoffs$UMI_filter $filter_cutoffs$UMI_filter$num_bases [1] 1
$filter_cutoffs$UMI_filter$phred [1] 20
$barcodes $barcodes$barcode_num NULL
$barcodes$automatic [1] FALSE
$barcodes$BarcodeBinning [1] 0
$barcodes$nReadsperCell [1] 1
$barcodes$demultiplex [1] FALSE
$counting_opts $counting_opts$introns [1] TRUE
$counting_opts$downsampling [1] "0"
$counting_opts$strand [1] 0
$counting_opts$Ham_Dist [1] 0
$counting_opts$velocyto [1] FALSE
$counting_opts$primaryHit [1] TRUE
$counting_opts$twoPass [1] TRUE
$counting_opts$write_ham [1] FALSE
$counting_opts$multi_overlap [1] FALSE
$counting_opts$intronProb [1] FALSE
$make_stats [1] TRUE
$which_Stage [1] "Filtering"
$read_layout [1] "PE"
$zUMIs_directory [1] "/home/zUMIs"
$samtools_exec [1] "samtools"
$pigz_exec [1] "pigz"
$STAR_exec [1] "STAR"
$Rscript_exec [1] "Rscript"
[1] "4.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/workbench/OUTPUT/MouseSmartSeq2.final_annot.gtf"
[E::hts_open_format] Failed to open file /home/workbench/OUTPUT/MouseSmartSeq2.filtered.tagged.Aligned.out.bam
samtools view: failed to open "/home/workbench/OUTPUT/MouseSmartSeq2.filtered.tagged.Aligned.out.bam" for reading: No such file or directory
[E::hts_open_format] Failed to open file /home/workbench/OUTPUT/MouseSmartSeq2.filtered.tagged.Aligned.out.bam
samtools view: failed to open "/home/workbench/OUTPUT/MouseSmartSeq2.filtered.tagged.Aligned.out.bam" for reading: No such file or directory
Error in gsub("SN:", "", chr) : object 'chr' not found
Calls: .makeSAF ... .chromLengthFilter -> [ -> [.data.table -> eval -> eval -> gsub
In addition: Warning message:
In data.table::fread(bread, col.names = c("chr", "len"), header = F) :
File '/tmp/RtmpfjPrwR/filed177cb9fbbe' has size 0. Returning a NULL data.table.
Execution halted
Sun Apr 10 02:44:17 JST 2022
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/home/workbench/OUTPUT/zUMIs_output/expression/MouseSmartSeq2.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sun Apr 10 02:44:19 JST 2022
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2022-04-10 02:44:19 JST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/home/workbench/OUTPUT/zUMIs_output/expression/MouseSmartSeq2.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Sun Apr 10 02:44:23 JST 2022
Bug description
Apparently, a specific read (SRR9621775.283074845) is reported as unpaired, and this leaves the "expression" folder in the output entirely empty. I have tried this three times and the result is reproducible. I checked the input FASTQ files to see whether something is wrong with them, but this does not appear to be the case:
# zcat SmartSeq2_S1_L001_R1_001.fastq.gz | wc -l
1981700232
(The same for the other 3 files)
# zcat SmartSeq2_S1_L001_R1_001.fastq.gz | head -n 1132299380 | tail -n 4
@SRR9621775.283074845 D00224L:270:CCU0CANXX:7:1305:13112:64065 length=76
TGCTAAGATTTTGCGTAGCTGGGTTTGGTTTAATCCACCTCAACTGCCTGCTATGATGGATAAGATTGAGAGAGTG
+SRR9621775.283074845 D00224L:270:CCU0CANXX:7:1305:13112:64065 length=76
0<BBBBF0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFF
(Matching for the other 3 files)
# zcat SmartSeq2_S1_L001_R1_001.fastq.gz | tail -n 4
@SRR9621775.495425058 D00224L:270:CCU0CANXX:8:2316:21278:101425 length=76
GTGGTATCAACGCAGAGTACGGGAAGCAGTGGTATCAACGCAGAGTACGGGAAGCAGTGGTATCAACGCAGAGTAC
+SRR9621775.495425058 D00224L:270:CCU0CANXX:8:2316:21278:101425 length=76
<<BB<<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FBFF0BFFBFBFFFFFB0
(Matching for the other 3 files)
I was not sure if the issue was with zUMIs or with the specific file in SRA, but I could not identify any obvious problems. Any suggestion would be very much appreciated.
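The spot checks above only look at the line count and two records. A fuller check would compare the read IDs of two files over their whole length, which can be sketched as below. The function names `fastq_read_ids` and `fastq_ids_match` are made up for illustration. The sketch assumes strict 4-line records and IDs that are identical in both files (as in the records shown), and it compares CRC checksums of the ID streams rather than the IDs themselves, so it reports whether the files agree but not where they diverge.

```shell
# Sketch: check that two gzipped FASTQ files list the same read IDs in the same order.

# Print the first whitespace-delimited field of every header line (line 1 of each record).
fastq_read_ids() {
  gzip -dc "$1" | awk 'NR % 4 == 1 { print $1 }'
}

# Succeed if both files yield the same ID stream, judged by cksum (CRC and byte count).
# Streaming through cksum avoids writing multi-gigabyte ID lists to disk.
fastq_ids_match() {
  [ "$(fastq_read_ids "$1" | cksum)" = "$(fastq_read_ids "$2" | cksum)" ]
}

# Hypothetical usage with the files from this report:
# fastq_ids_match SmartSeq2_S1_L001_R1_001.fastq.gz SmartSeq2_S1_L001_R2_001.fastq.gz \
#   && echo "read IDs agree" || echo "read IDs differ"
```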
All the best,
Kai Battenberg