zhou-lab / biscuit

BISulfite-seq CUI Toolkit

one read has different CIGAR and read length #42

Open alexyfyf opened 3 years ago

alexyfyf commented 3 years ago

Hi team, I'm following your documentation at https://huishenlab.github.io/biscuit/ to analyse RRBS data.

The command I'm using is:

biscuit align -t 12 -M -R "@RG\tID:1\tSM:${BASE}" "$REF" "$FILE" | \
        samblaster -M | \
        samtools sort -o "${BASE}_mdups_sorted.bam" -O BAM -

which is adapted from your docs. However, samblaster threw an error about the input sort order:

samblaster: Loaded 66 header sequence entries.
samblaster: Can't find first and/or second of pair in sam block of length 1 for id: PC140529:356:C3EHVACXX:7:1101:1272:63028
samblaster:    At location: *:0
samblaster:    Are you sure the input is sorted by read ids?samblaster: Exiting early, the following stats are for processing preceeding the error
samblaster: Marked           8 of        378 (2.116%) total read ids as duplicates using 1556k memory in 0.001S CPU seconds and 2M4S(124S) wall time.
samblaster: Premature exit (return code 1).
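The last lines of the samblaster message ask whether the input is grouped by read id. One way to test that on a saved copy of the alignment output is a small awk pass, sketched below. `check_grouped_by_qname` is a hypothetical helper, not part of biscuit or samblaster; it only checks grouping and does not reproduce samblaster's own pairing logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of biscuit or samblaster): reads SAM on stdin and
# exits 1 at the first read id that reappears after a different id intervened,
# i.e. input that is NOT grouped by read id as samblaster expects.
# Note: it keeps every read id seen in memory, so it is meant for spot checks.
check_grouped_by_qname() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    $1 != prev {                       # start of a new read-id block
      if ($1 in seen) {
        printf "not grouped: %s reappears at line %d\n", $1, NR
        bad = 1
        exit
      }
      seen[$1] = 1
      prev = $1
    }
    END { exit bad }'
}
```

For example, after redirecting the `biscuit align` output to a file (the name `aln.sam` is arbitrary), `check_grouped_by_qname < aln.sam` prints nothing and returns 0 when every read id forms one contiguous block.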

I ran the pipeline step by step and found that the SAM output of biscuit align contains one record whose CIGAR string does not match the read length.
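To locate records like this in a large SAM file, the query length implied by the CIGAR can be compared with the length of the SEQ field. Per the SAM specification, only the M, I, S, = and X operations consume query bases (D, N, H and P do not). A minimal sketch in awk follows; `check_cigar_lengths` is a hypothetical helper name, and records with no CIGAR or no stored sequence (`*`) are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of biscuit): reads SAM on stdin and prints every
# record whose CIGAR query length differs from the length of its SEQ field.
# Only M, I, S, = and X consume query bases; D, N, H and P do not.
check_cigar_lengths() {
  awk -F'\t' '
    /^@/ { next }                          # skip SAM header lines
    $6 == "*" || $10 == "*" { next }       # no CIGAR or no stored sequence
    {
      cigar = $6
      qlen = 0
      # Consume the CIGAR one <length><op> element at a time.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MIS=X]/) qlen += n
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (qlen != length($10))
        printf "%s\tCIGAR=%s\tCIGAR_len=%d\tSEQ_len=%d\n", $1, $6, qlen, length($10)
    }'
}
```

Run against the saved alignment output (e.g. `check_cigar_lengths < aln.sam`, file name arbitrary), it should print only the inconsistent record(s), with the read id first so the read can be looked up in the FASTQ.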

The problematic read is:

@PC140529:356:C3EHVACXX:7:1312:19812:54284 1:N:0:ACTTGA
TGGGTGGAAGTGGGGGGGTGGGTTTAGATTGTTAGTGAGAGGAAGAGGTTT
+
DDCDDDDBDDDDDDCCDDDDDEDDDDBB:DB:0DDJJJHFFHFDFDDFBBB

When I extracted this read and aligned it on its own with biscuit align, the alignment was correct. But when the read is aligned as part of the full FASTQ file, the alignment goes wrong. Could you please help me fix this?
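Pulling a single read out of a FASTQ file for this kind of isolated test can be done by matching only record headers, so quality lines that happen to start with `@` are never mistaken for headers. The sketch below assumes plain 4-line FASTQ records with no line wrapping (gzipped input would need to be decompressed first, e.g. with `zcat`); `extract_read` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the FASTQ record whose read id (header text after
# "@", up to the first whitespace) equals $1. Assumes unwrapped 4-line records.
extract_read() {
  awk -v id="$1" '
    NR % 4 == 1 {                 # header line of each 4-line record
      name = substr($1, 2)        # $1 is "@<id>"; drop the leading "@"
      keep = (name == id)
    }
    keep { print }'
}
```

For the read above, `extract_read PC140529:356:C3EHVACXX:7:1312:19812:54284 < reads.fastq > single.fastq` (file names arbitrary) would write just that 4-line record.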

Thank you!