open2c / pairtools

Extract 3D contacts (.pairs) from sequencing alignments
MIT License

parse - 90% of corrupt pair type #180

Closed nservant closed 1 year ago

nservant commented 1 year ago

Hi, I'm testing pairtools with a toy dataset.
Around 85% of the reads align with:

bwa mem -5SP -T0 -t 2 $INDEX SRR4292758_00_R1.fastq.gz SRR4292758_00_R2.fastq.gz

When running pairtools parse:

pairtools parse \
    -c W303_SGD_2015_JRIU00000000.fsa.sizes \
    --nproc-in 2 --nproc-out 2 \
    --add-columns mapq --drop-sam --drop-seq --walks-policy 5unique --output-stats SRR4292758_parse.stats \
    --assembly W303_SGD_2015_JRIU00000000.fsa \
    --output-stats SRR4292758.pairsam.stat \
    -o SRR4292758.pairsam.gz \
    SRR4292758.bam

I get almost no mapped pairs; nearly all reads are reported with pair type 'XX' (corrupt):

>>head SRR4292758.pairsam.stat
total   928419
total_unmapped  900601
total_single_sided_mapped   25595
total_mapped    2223
total_dups  0
total_nodups    2223
cis 2223
trans   0
pair_types/XX   838730

I'm using a Saccharomyces cerevisiae genome with non-standard chromosome names (such as gi|696449480|gb|JRIU01000016.1|), but I'm not sure that's the reason.

I'm attaching the files here.

Many thanks!

Phlya commented 1 year ago

Any chance you sorted the BAM file before parsing? That came up recently.

nservant commented 1 year ago

Indeed! I thought I had already checked this, but I guess I missed something. That's fixed! Thanks.
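
For anyone hitting the same symptom, here is a minimal sketch of how the sort order could be checked before parsing. It assumes that the BAM header carries an `@HD` line with an `SO:` tag (not every aligner or tool writes one) and that samtools is installed. The helper name `sam_sort_order` and the output file name `SRR4292758.namesorted.bam` are made up for illustration. The idea is that pairtools parse needs the alignments of each read pair to sit next to each other, as in the aligner's own output or in a name-sorted file; a coordinate-sorted BAM scatters them.

```shell
# Print the SO (sort order) tag from a SAM header read on stdin.
# Prints "unknown" when there is no @HD line or no SO tag.
sam_sort_order() {
    awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Usage (assumes samtools is available; BAM name taken from the thread):
#   samtools view -H SRR4292758.bam | sam_sort_order
#
# If this prints "coordinate", one option is to regroup reads by name
# before parsing (the output file name is hypothetical):
#   samtools sort -n -o SRR4292758.namesorted.bam SRR4292758.bam
```

The header tag only reflects what the last tool recorded, so "unknown" or "unsorted" does not prove the file is in aligner order. The most direct fix is probably to feed pairtools parse the unsorted output of bwa mem.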