Closed madzafv closed 6 years ago
sorry for the late reply.
htseq-count
I have a similar issue. I have 75 bp paired-end reads. I align them with STAR, and my command for the fastq.gz files reads:
STAR --runThreadN 10 --genomeDir $genome_dir --readFilesCommand zcat --readFilesIn /projects/proj_mouse/raw_data/LM_312/LM_312_R1.fastq.gz /projects/proj_mouse/raw_data/LM_312/LM_312_R2.fastq.gz --outSAMmapqUnique 60 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /projects/proj_mouse/starout_lm/LM312.
followed by htseq-count:
htseq-count -f bam -q -m intersection-nonempty -s reverse -t exon -i gene_id /projects/b1042/proj_mouse/starout_lm/LM312.Aligned.sortedByCoord.out.bam $GTF > /projects/b1042/ClareLab/Manish/proj_mouse/counts/hts_lm/LM312.htseq.counts
Once I submit the job, the script log file gets populated with this report:
PBS: End PBS Prologue Sun Jan 28 11:48:20 CST 2018 1517161700
Warning: Read K00379:60:HM3VGBBXX:2:1124:30959:29712 claims to have an aligned mate which could not be found in an adjacent line.
Warning: 62161780 reads with missing mate encountered.
Warning: Read K00379:60:HM3VGBBXX:1:2124:3518:14203 claims to have an aligned mate which could not be found in an adjacent line.
However, it continues without aborting. What am I doing wrong here? Thanks!
I have a similar issue with this command:
htseq-count -s no -f bam s-3010T_H3TN7DMXX_L1_sorted.bam Homo_sapiens.GRCh38.91.gtf > s-3010T_H3TN7DMXX_L1_htseq.count
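By default, htseq-count expects paired-end SAM/BAM files to be sorted by read name, because then reads from the same fragment (and hence with the same read name) appear in adjacent lines. A coordinate-sorted file, such as the output of STAR with --outSAMtype BAM SortedByCoordinate, does not satisfy this, so htseq-count warns that it cannot find the mate in an adjacent line.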
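For anyone landing here with a coordinate-sorted BAM, the two usual ways around this can be sketched roughly as below. This is only a sketch, not a tested pipeline: the function names and the SAMTOOLS / HTSEQ_COUNT variables are made up for illustration, the .namesorted.bam suffix is arbitrary, and the -s reverse -t exon -i gene_id options are simply carried over from the command posted above (check the strandedness for your own library). It relies on `samtools sort -n` and on the `-r name` / `-r pos` option of htseq-count.

```shell
#!/bin/sh
# Hypothetical helper script; command names can be overridden via the
# environment (these variable names are assumptions, not part of either tool).
SAMTOOLS="${SAMTOOLS:-samtools}"
HTSEQ_COUNT="${HTSEQ_COUNT:-htseq-count}"

# Option 1: re-sort the BAM by read name so that mates end up on adjacent
# lines, then count with the default name order made explicit (-r name).
# Usage: count_name_sorted in.bam annotation.gtf counts.txt
count_name_sorted() {
    bam_in=$1
    gtf=$2
    counts_out=$3
    name_sorted="${bam_in%.bam}.namesorted.bam"
    "$SAMTOOLS" sort -n -o "$name_sorted" "$bam_in" &&
        "$HTSEQ_COUNT" -f bam -r name -s reverse -t exon -i gene_id \
            "$name_sorted" "$gtf" > "$counts_out"
}

# Option 2: keep the coordinate-sorted BAM and tell htseq-count about it
# with -r pos. Mates that lie far apart are buffered until their partner
# turns up, so this can need considerably more memory.
# Usage: count_pos_sorted in.bam annotation.gtf counts.txt
count_pos_sorted() {
    "$HTSEQ_COUNT" -f bam -r pos -s reverse -t exon -i gene_id \
        "$1" "$2" > "$3"
}
```

With the files from the post above, that would be something like `count_name_sorted LM312.Aligned.sortedByCoord.out.bam "$GTF" LM312.htseq.counts`. Re-sorting by name costs extra disk space and time up front; -r pos avoids the extra file at the price of memory.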
Discussion of this issue has moved to #37.
Example:
The program continues to run even after spitting out these warnings. Does it skip the affected reads and continue counting? I'm also confused about what these other warnings mean:
thanks