Closed MartaBenegas closed 1 year ago
Hi,
Within zUMIs, you can choose to either count only uniquely aligned reads or also include the primary hits of multimapping reads if they fall within gene boundaries (set primaryHit: yes under counting_opts in the YAML).
Best, Christoph
Is it possible to get the STAR log file? How many reads are mapping/unmapped? Why were reads unmapped? How many multi-mapped?
Started job on | Apr 23 23:17:02
Started mapping on | Apr 23 23:17:04
Finished on | Apr 23 23:26:52
Mapping speed, Million of reads per hour | 115.68
Number of input reads | 18894432
Average input read length | 298
UNIQUE READS:
Uniquely mapped reads number | 17704240
Uniquely mapped reads % | 93.70%
Average mapped length | 297.39
Number of splices: Total | 3119841
Number of splices: Annotated (sjdb) | 2663436
Number of splices: GT/AG | 3080422
Number of splices: GC/AG | 14219
Number of splices: AT/AC | 248
Number of splices: Non-canonical | 24952
Mismatch rate per base, % | 0.49%
Deletion rate per base | 0.02%
Deletion average length | 2.70
Insertion rate per base | 0.02%
Insertion average length | 2.30
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 806405
% of reads mapped to multiple loci | 4.27%
Number of reads mapped to too many loci | 1146
% of reads mapped to too many loci | 0.01%
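To answer the "how many unmapped" question from a report like the one above, the counts can be pulled out with a few lines of Python. This is only a sketch: it assumes every statistic sits on a `name | value` line, as in the excerpt, and the function names `parse_star_final_log` and `mapping_summary` are made up for illustration (they are not part of STAR or zUMIs). Whatever is not unique, multi-mapped, or mapped to too many loci is reported as "other", which here means the unmapped remainder.

```python
# Sketch: summarise a STAR Log.final.out report.
# Assumption: each statistic is written as "name | value" on one line.

EXCERPT = """\
Number of input reads | 18894432
UNIQUE READS:
Uniquely mapped reads number | 17704240
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 806405
Number of reads mapped to too many loci | 1146
"""


def parse_star_final_log(text):
    """Return {field name: raw string value} for every 'name | value' line."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no '|'
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def mapping_summary(stats):
    """Split the input reads into unique / multi / too-many / other counts."""
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    too_many = int(stats["Number of reads mapped to too many loci"])
    other = total - unique - multi - too_many  # unmapped remainder
    return {
        "total": total,
        "unique": unique,
        "multi": multi,
        "too_many": too_many,
        "other": other,
        "unique_pct": round(100 * unique / total, 2),
        "other_pct": round(100 * other / total, 2),
    }


summary = mapping_summary(parse_star_final_log(EXCERPT))
print(summary)
```

For the numbers posted above, this leaves 382641 reads (about 2%) that are neither uniquely mapped nor multi-mapped.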
Hi,
The STAR final log will always be part of your zUMIs output, in the same folder as the bam files.
In the zUMIs_output/stats folder, zUMIs also produces more detailed mapping statistics.
On that matter, I've run zUMIs with one pair of fastq files:
project: dog01
sequence_files:
  file1:
    name: /data/input/G556-LBA-Dog01_S1_R1_001.fastq.gz
    base_definition:
      - BC(1-16)
      - UMI(17-26)
  file2:
    name: /data/input/G556-LBA-Dog01_S1_R2_001.fastq.gz
    base_definition: cDNA(1-59)
reference:
  STAR_index: /data/input/starIdx
  GTF_file: /data/input/Canis_lupus_familiaris.ROS_Cfam_1.0.109.chr.gtf
  additional_STAR_params: ''
  additional_files: ~
out_dir: /data/output
num_threads: 32
mem_limit: 0
filter_cutoffs:
  BC_filter:
    num_bases: 2
    phred: 20
  UMI_filter:
    num_bases: 1
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: /data/input/737K-august-2016.txt
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 1
  Ham_Dist: 0
  velocyto: no
  primaryHit: no
  twoPass: yes
make_stats: yes
which_Stage: Filtering
Rscript_exec: Rscript
STAR_exec: /usr/bin/STAR-2.7.10b/source/STAR
pigz_exec: pigz
samtools_exec: samtools
But there are 4 STAR reports in the dog01.filtered.tagged.Log.final.out file. Why is that? Please find the logs attached.
dog01.filtered.tagged.Log.final.out.txt
dog01_unique.log.txt
Because you have not specified a memory limit, zUMIs was able to run STAR in 4 parallel instances to speed up the processing.
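With `mem_limit: 0` the final log therefore holds one report per STAR instance, so overall numbers have to be summed across the reports. A minimal Python sketch of that follows. It assumes the reports are simply concatenated in the file and that each one begins with a `Started job on` line, as in the log posted in this thread. The helper names `split_reports` and `sum_field` are hypothetical, and the two-report text is synthetic, made up purely to show the mechanics.

```python
# Sketch: combine several STAR reports concatenated in one Log.final.out.
# Assumption: every report starts with a "Started job on | ..." line.

# Synthetic two-report example (numbers invented for illustration only).
SYNTHETIC_LOG = """\
Started job on | Apr 23 23:17:02
Number of input reads | 100
UNIQUE READS:
Uniquely mapped reads number | 90
Started job on | Apr 23 23:17:03
Number of input reads | 300
UNIQUE READS:
Uniquely mapped reads number | 240
"""


def split_reports(text):
    """Return one {field: value} dict per report found in the text."""
    reports, current = [], None
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip section headers without a '|'
        key, value = (part.strip() for part in line.split("|", 1))
        if key == "Started job on":
            current = {}  # a new report begins here
            reports.append(current)
        if current is not None:
            current[key] = value
    return reports


def sum_field(reports, field):
    """Add up an integer field over all reports."""
    return sum(int(report[field]) for report in reports)


reports = split_reports(SYNTHETIC_LOG)
total_reads = sum_field(reports, "Number of input reads")
total_unique = sum_field(reports, "Uniquely mapped reads number")
unique_pct = 100 * total_unique / total_reads
print(len(reports), total_reads, total_unique, unique_pct)
```

Percentages should be recomputed from the summed counts, as done here, rather than averaged across the reports, because the instances do not necessarily process the same number of reads.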
Hi zUMIs team!
May I ask how zUMIs handles multi-mapping reads? Does it keep only unique reads for counting or distribute them somehow among the genes?
You can have a look at the section "Multi-Gene reads" to know what I'm referring to: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md
Thanks in advance!