simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0

htseq-count Read claims to have an aligned mate which could not be found in an adjacent line. #42

Closed madzafv closed 6 years ago

madzafv commented 7 years ago

Example:

Read SN860:669:C8F8HACXX:7:1101:2485:2172.firstrun.1 claims to have an aligned mate which could not be found in an adjacent line.

The program continues to run even after spitting out these warnings. Does it skip the affected reads and continue counting? I'm also confused about what these other warnings mean:

Warning: 23877510 reads with missing mate encountered. 30111080 SAM alignment pairs processed.

thanks

iosonofabio commented 6 years ago

sorry for the late reply.

mranjan1 commented 6 years ago

I have a similar issue. I have 75bp PE reads. I use STAR and my command for the fastq.gz files reads:

STAR --runThreadN 10 --genomeDir $genome_dir --readFilesCommand zcat --readFilesIn /projects/proj_mouse/raw_data/LM_312/LM_312_R1.fastq.gz /projects/proj_mouse/raw_data/LM_312/LM_312_R2.fastq.gz --outSAMmapqUnique 60 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /projects/proj_mouse/starout_lm/LM312.

followed by htseq-count:

htseq-count -f bam -q -m intersection-nonempty -s reverse -t exon -i gene_id /projects/b1042/proj_mouse/starout_lm/LM312.Aligned.sortedByCoord.out.bam $GTF > /projects/b1042/ClareLab/Manish/proj_mouse/counts/hts_lm/LM312.htseq.counts

Once I submit the job, the script log file fills up with messages like these:

PBS: End PBS Prologue Sun Jan 28 11:48:20 CST 2018 1517161700
Warning: Read K00379:60:HM3VGBBXX:2:1124:30959:29712 claims to have an aligned mate which could not be found in an adjacent line.
Warning: 62161780 reads with missing mate encountered.
Warning: Read K00379:60:HM3VGBBXX:1:2124:3518:14203 claims to have an aligned mate which could not be found in an adjacent line.

However, it continues without aborting. What am I doing wrong here? Thanks!

Lee211 commented 6 years ago

I have a similar issue:

htseq-count -s no -f bam s-3010T_H3TN7DMXX_L1_sorted.bam Homo_sapiens.GRCh38.91.gtf > s-3010T_H3TN7DMXX_L1_htseq.count

simon-anders commented 6 years ago

By default, htseq-count expects paired-end SAM files to be sorted by read name, because reads from the same fragment (and hence with the same read name) then appear in adjacent lines.
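To illustrate why the sort order matters, here is a minimal Python sketch. It is not HTSeq's actual code: the function `count_adjacent_pairs` and the read names are hypothetical and only mimic the idea of looking for a mate on the adjacent line. With name-sorted input every read finds its mate; with the same reads in coordinate order (as STAR writes them with `SortedByCoordinate`), none do.

```python
def count_adjacent_pairs(read_names):
    """Pair up reads only when both mates sit on adjacent lines.

    `read_names` lists the read names in file order. Returns a tuple
    (pairs_found, reads_with_missing_mate).
    """
    pairs = 0
    missing = 0
    i = 0
    while i < len(read_names):
        # A mate counts as "found" only if the next line has the same name.
        if i + 1 < len(read_names) and read_names[i] == read_names[i + 1]:
            pairs += 1
            i += 2
        else:
            missing += 1
            i += 1
    return pairs, missing


# Hypothetical read names: three fragments, two mates each.
name_sorted = ["readA", "readA", "readB", "readB", "readC", "readC"]
# The same reads ordered by alignment position: mates are no longer adjacent.
coord_sorted = ["readA", "readB", "readA", "readC", "readB", "readC"]

print(count_adjacent_pairs(name_sorted))   # (3, 0)
print(count_adjacent_pairs(coord_sorted))  # (0, 6)
```

In practice the usual remedies are to re-sort the BAM file by read name (for example with `samtools sort -n`) or to tell htseq-count that the input is sorted by position with `-r pos`.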

iosonofabio commented 6 years ago

Discussion of this issue has moved to #37.