aliibarry opened this issue 1 year ago (status: Open)
Hi,
That is indeed odd. Can you share the exact YAML file you use? Do you get an unmapped.bam file in your outputs? If yes, how does it look (e.g. the first few lines of samtools view)?
The warning about the STAR version should be OK: STAR doesn't always write the precise version number into its index files.
Best, Christoph
Hiya,
YAML is:
project: trial
sequence_files:
  file1:
    name: /home/amb/patchseq/Undetermined_S0_R1_001.fastq.gz
    base_definition:
      - cDNA(23-50)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/amb/patchseq/Undetermined_S0_R2_001.fastq.gz
    base_definition:
      - cDNA(1-50)
  file3:
    name: /home/amb/patchseq/Undetermined_S0_I1_001.fastq.gz
    base_definition:
      - BC(1-8)
  file4:
    name: /home/amb/patchseq/Undetermined_S0_I2_001.fastq.gz
    base_definition:
      - BC(1-8)
reference:
  STAR_index: /home/amb/hg_genome_STAR2.7.3a #made without overhang info
  #pigz_exec: /home/amb/miniconda3/bin/pigz
  #STAR_exec: /home/amb/STAR-2.7.11a/source/STAR
  #samtools_exec: /home/amb/samtools-1.18/samtools
  Rscript_exec: /usr/bin/R
  GTF_file: /home/amb/gencode.v44.primary_assembly.annotation.gtf
  additional_STAR_params: '--limitSjdbInsertNsj 2000000 --clip3pAdapterSeq CTGTCTCTTATACACATCT'
  additional_files:
out_dir: /home/amb/patchseq/out
num_threads: 1
mem_limit: 31
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 3
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file:
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: yes
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 1
  write_ham: yes
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering
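For readers unfamiliar with the base_definition entries in the YAML above, here is a minimal Python sketch of how 1-based, inclusive ranges such as UMI(12-19) and cDNA(23-50) map onto a read. The helper names (parse_base_definition, slice_read) are hypothetical, and treating find_pattern as an exact prefix match is an assumption of this sketch; zUMIs' own matching and its handling of reads without the pattern may differ.

```python
import re


def parse_base_definition(entry):
    """Parse a string such as 'UMI(12-19)' into (kind, start, end).

    Positions are 1-based and inclusive, as written in the YAML.
    """
    m = re.fullmatch(r"(cDNA|UMI|BC)\((\d+)-(\d+)\)", entry)
    if m is None:
        raise ValueError(f"unrecognised base_definition entry: {entry!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def slice_read(seq, definitions, find_pattern=None):
    """Return {kind: subsequence} for one read.

    Returns None when find_pattern is given but the read does not start
    with it (exact prefix matching is an assumption of this sketch).
    """
    if find_pattern is not None and not seq.startswith(find_pattern):
        return None
    parts = {}
    for entry in definitions:
        kind, start, end = parse_base_definition(entry)
        # Convert 1-based inclusive positions to a Python slice.
        parts[kind] = seq[start - 1:end]
    return parts


# Synthetic 50 bp read: 11 bp tag, 8 bp UMI, 3 bp spacer, 28 bp cDNA.
example_read = "ATTGCGCAATG" + "ACGTACGT" + "GGG" + "T" * 28
print(slice_read(example_read, ["cDNA(23-50)", "UMI(12-19)"], "ATTGCGCAATG"))
```

The example read is synthetic and only illustrates the coordinate convention; it is not taken from the data in this thread.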
There is an unmapped.bam, but it seems incomplete? For out_dir/trial.filtered.tagged.unmapped.bam, this is the head:
VH01324:51:AAF5FKVM5:1:1101:18231:1000 77 * 0 0 * * 0 0 GCTTTGTATAAACCAGTGATTTTACTACAAAAAACACTGTCCTTGAAAGA CCCCCCCCCCC;;CCC;C;CCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC BX:Z:ATCTCAGGTACTCCTT BC:Z:ATCTCAGGTACTCCTT UB:Z: QB:Z:CC;CC;CCCCCCCCCC QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18231:1000 141 * 0 0 * * 0 0 CTTCTTAAGTGGAATATTCTAATAAGCTACCTTTTGTAAGTGCCATGTTT CCCCCCCCCCCC-CC-CCCCCCC;CCCCCCCCCC-CCCCCCC-CCCCCCC BX:Z:ATCTCAGGTACTCCTT BC:Z:ATCTCAGGTACTCCTT UB:Z: QB:Z:CC;CC;CCCCCCCCCC QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18307:1000 77 * 0 0 * * 0 0 CCCAGAGAGTGGGTCAGCTGGAAGCCCTGGAGACAGTCACAGCTCTCTGA CCC-C;C-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC; BX:Z:CGAGGCTGCGGAGAGA BC:Z:CGAGGCTGCGGAGAGA UB:Z: QB:Z:CC-CCCCCCCC-CC;C QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18307:1000 141 * 0 0 * * 0 0 GCCTGGCACCATGGACTCTGTCAGGTCTGGACCCTTCGGCCAGATCTTCA ;CCCCCC;CCCCCCCC;CCCCC;CCCCCCCCCC;CCCCCCCCC;-C;;CC BX:Z:CGAGGCTGCGGAGAGA BC:Z:CGAGGCTGCGGAGAGA UB:Z: QB:Z:CC-CCCCCCCC-CC;C QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18345:1000 77 * 0 0 * * 0 0 TCCCTGGAGCGGCAGCTCAGCGACATCGAGGAGCGCCACAACCACGACCT CCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC BX:Z:CGTCCTAGCTCCTTAC BC:Z:CGTACTAGCTCCTTAC UB:Z: QB:Z:CCC-CC;CCCCCCCCC QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18345:1000 141 * 0 0 * * 0 0 GTATACAGTGGCCCAGTGATGCTTCCTGCAAATGTGCTAAATCTAGTCTC ;CCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC BX:Z:CGTCCTAGCTCCTTAC BC:Z:CGTACTAGCTCCTTAC UB:Z: QB:Z:CCC-CC;CCCCCCCCC QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18383:1000 77 * 0 0 * * 0 0 AAAGAAGATATTGCAATGTGGGAAGTAAATGAAGCCTTTAGTCTGGTTGT CC;CCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC BX:Z:CTCTCTACAGGCTTAG BC:Z:CTCTCTACAGGCTTAG UB:Z: QB:Z:CCCCCCCCCCCCC-;C QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18383:1000 141 * 0 0 * * 0 0 GCATGAGTCAAATGACCAACAATCCTGGCTCCAGACATCCCAATTGGATG C-CCC-CCCC;CCCCCCC-CCCCCCCCCCCCCCC;C;CCCCCCCCCCCCC BX:Z:CTCTCTACAGGCTTAG BC:Z:CTCTCTACAGGCTTAG UB:Z: QB:Z:CCCCCCCCCCCCC-;C QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18459:1000 77 * 0 0 * * 0 0 GATATAGTTTGAGTATTTGTCCTCTTCAAATCTCATGTTGAAATGTTATC CCC;CCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC BX:Z:GCTCATGAATTAGACG BC:Z:GCTCATGAATTAGACG UB:Z: QB:Z:CCCCCCCCCCCCCCCC QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18459:1000 141 * 0 0 * * 0 0 TTTTAAAACCAGCTCTCACATGAGCTAATGGAATAAGAACTCACTCATTA CCCCCC;CCCCCCCCCCCCCCC;CCCCCCCCC-C;CC-CCCCCCCCCCCC BX:Z:GCTCATGAATTAGACG BC:Z:GCTCATGAATTAGACG UB:Z: QB:Z:CCCCCCCCCCCCCCCC QU:Z:
Hey,
OK, the unmapped BAM actually looks quite good, and the paired-end flags (77 and 141) are clearly set correctly on the reads, which is what STAR complained about.
Anyway, my gut feeling is that the commented-out lines in the "reference" section may disturb things in the YAML! Please remove them completely and check again:
#pigz_exec: /home/amb/miniconda3/bin/pigz
#STAR_exec: /home/amb/STAR-2.7.11a/source/STAR
#samtools_exec: /home/amb/samtools-1.18/samtools
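To make the point about the paired-end flags concrete, the following Python sketch decodes the FLAG column (77 and 141) seen in the unmapped BAM records. The bit values follow the SAM specification; the function name decode_flag and the short labels are illustrative choices, not part of samtools or zUMIs.

```python
# Bit values of the SAM FLAG field, per the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for flag in (77, 141):
    print(flag, decode_flag(flag))
```

Both values have the paired, unmapped and mate_unmapped bits set; 77 additionally marks the first read of the pair and 141 the second, which is what an unaligned paired-end BAM should contain.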
Thanks for the quick reply. I removed all comments from the YAML, but am getting the same issues. The unmapped BAM output is still generated, but the run fails during the mapping stage.
I did a completely fresh run as well; this is the error when starting from the Mapping stage with bash zUMIs/zUMIs.sh -c -y patchseq/patchseq.yaml:
Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz'
Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR'
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters:
YAML file: patchseq/patchseq.yaml
zUMIs directory: /home/amb/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 31
zUMIs version 2.9.7e
Tue Oct 31 02:20:58 PM CET 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Mapping...
[1] "2023-10-31 14:20:58 CET"
EXITING because of FATAL INPUT ERROR: --readFilesType SAM requires specifying SE or PE reads
SOLUTION: specify --readFilesType SAM SE for single-end reads or --readFilesType SAM PE for paired-end reads
Oct 31 14:20:59 ...... FATAL ERROR, exiting
Tue Oct 31 02:20:59 PM CET 2023
Counting...
[1] "2023-10-31 14:21:02 CET"
[1] "46500000 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/amb/patchseq/out2/trial.final_annot.gtf"
Error in gsub("SN:", "", chr) : object 'chr' not found
Calls: .makeSAF ... .chromLengthFilter -> [ -> [.data.table -> eval -> eval -> gsub
In addition: Warning message:
In data.table::fread(bread, col.names = c("chr", "len"), header = F) :
File '/tmp/RtmpdYJWcf/file69191ccf16f2' has size 0. Returning a NULL data.table.
Execution halted
Possibly relevant: at one point during one trial I saw the error "Fastq files are not in the same order", but I haven't managed to replicate it. I think it was because I was overwriting the output directory?
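One way to rule out a genuine ordering problem in the input is to compare read IDs across the FASTQ files directly. The Python sketch below is a standalone check and not what zUMIs runs internally; the function names and the limit parameter are hypothetical. It assumes four-line FASTQ records and Illumina-style headers where the read ID is the first whitespace-delimited token (older "/1" and "/2" suffixes are not handled).

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each record: the first token of the header line, without '@'."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of a four-line FASTQ record
                yield line[1:].split()[0]


def first_order_mismatch(paths, limit=100000):
    """Compare read IDs across files record by record.

    Returns (record_index, ids) for the first record whose IDs differ,
    or None if the first `limit` records agree. A file that ends early
    shows up as a None entry in `ids`.
    """
    iterators = [read_ids(p) for p in paths]
    for n, ids in enumerate(zip_longest(*iterators)):
        if n >= limit:
            break
        if len(set(ids)) != 1:
            return n, ids
    return None
```

Calling first_order_mismatch with the four paths listed under sequence_files (R1, R2, I1, I2) returns None when the first records line up, and otherwise points to the first record where they diverge.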
Just updating for anyone else seeing the same issues - I never resolved this and instead switched to a kallisto-bustools pipeline, which now has a smart-seq3 option. See biostars post.
Another option that worked for me was umi_tools > samtools > umi_tools dedup > featureCounts.
Trying to analyse some Smart-seq3 data and can't manage to get past the mapping step. Any suggestions would be much appreciated. I've remade my index multiple times (STAR --version is giving 2.7.3a, even though it's being flagged below as 2.7.1a?), and have also tried using my own dependencies and STAR 2.7.11a, as well as a fresh zUMIs pull (working with 2.9.7e).
bash zUMIs/zUMIs.sh -c -y patch-seq/patchseq.yaml
Currently using the YAML provided with the Smart-seq3 example (https://github.com/sandberg-lab/Smart-seq3/blob/master/allele_level_expression/mouse_cross.yaml) with num_threads: and mem_limit: adjusted, as well as no barcode_file:
Output is as follows
I've tried re-running this from the mapping step using which_Stage: Mapping in the YAML and get a slightly different error with an eventual Execution halted.
As an aside: I'm trying to get this working on an HPC in parallel, but am still working through permission issues with the support team. Any tips there would also be appreciated; error below.