So I've noticed that when using bam files as input to pairtools parse, it matters if it is straight from the aligner, coordinate sorted, or read name sorted. This is my workflow -
1) Is this expected behavior? If so, perhaps there should be a warning.
2) I believe normally the output of aligners is read grouped, does this mean I can expect the same results from the regular bam output and the read sorted bam?
So I've noticed that when using bam files as input to pairtools parse, it matters if it is straight from the aligner, coordinate sorted, or read name sorted. This is my workflow -
If I use the coordinate sorted bam, i.e. the output of
samtools sort
, then I get the following matrix -[[1 0 0 ... 0 0 0] [0 1 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 1 0] [0 0 0 ... 0 0 3]]
If I use the read name sorted bam, i.e. the output of
samtools sort -n
, then I get the following matrix -[[233 233 6 ... 2 0 0] [233 423 241 ... 1 0 1] [ 6 241 392 ... 2 2 1] ... [ 2 1 2 ... 642 282 19] [ 0 0 2 ... 282 448 217] [ 0 1 1 ... 19 217 406]]
I have the following questions -
1) Is this expected behavior? If so, perhaps there should be a warning.
2) I believe normally the output of aligners is read grouped, does this mean I can expect the same results from the regular bam output and the read sorted bam?