Closed gringer closed 12 years ago
Thanks for reporting this. In 79c6c969ffb I have added a dependency check for these two ABySS programs. I am going to leave abyss-fixmate in the pipeline, since I already depend on DistanceEst and setting the template length is critical for DistanceEst.
It might be worth noting somewhere that sga (or more specifically sga-bam2de.pl) depends on two programs from ABySS: abyss-fixmate and DistanceEst. On my Debian system I was able to install abyss, but needed to modify this file to point at '/usr/lib/abyss/abyss-fixmate' and '/usr/lib/abyss/DistanceEst' for the functions to work (these programs are not in the default path). No errors were produced to indicate that these programs were missing, so it took me a while to work out why my scaffolding wasn't joining any contigs.
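A minimal sketch of the kind of check that would have surfaced the missing programs early. The helper name `find_abyss_prog` is hypothetical (it is not part of sga or ABySS), and the fallback directory is simply the Debian location mentioned above:

```shell
#!/bin/sh
# Hypothetical helper, not part of sga or ABySS.
# Print the full path of a program, looking on PATH first and then in an
# extra directory (default: the Debian location reported in this thread).
# Returns non-zero and prints an error if the program cannot be found.
find_abyss_prog() {
    prog=$1
    extra_dir=${2:-/usr/lib/abyss}
    if command -v "$prog" >/dev/null 2>&1; then
        command -v "$prog"
    elif [ -x "$extra_dir/$prog" ]; then
        printf '%s\n' "$extra_dir/$prog"
    else
        printf 'error: %s not found on PATH or in %s\n' "$prog" "$extra_dir" >&2
        return 1
    fi
}

# Intended use before running the scaffolding step:
#   fixmate=$(find_abyss_prog abyss-fixmate) || exit 1
#   distest=$(find_abyss_prog DistanceEst)   || exit 1
```

Failing loudly here avoids the silent "no contigs joined" outcome described above.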
FWIW, it might be possible to remove the dependence on abyss-fixmate without too much additional work. Generating the histogram of read-pair distances (e.g. pe.hist) can be done fairly quickly with a combination of samtools and awk, using the 'view' command to keep only the first read of each pair when the pair is properly mapped (bowtie2 seems to define this as correct orientation with not too much distance between reads):
samtools view -f 0x42 mappedreads.bam | awk '{print sqrt($9*$9)}' | sort -n | uniq -c | sort -k 2,2n | awk '{print $2"\t"$1}' > pe.hist
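The same histogram can also be sketched as a single awk pass over SAM text, which makes the logic easier to test without a BAM file. The function name `tlen_hist` is hypothetical; it assumes tab-separated SAM records on standard input with the template length in field 9:

```shell
#!/bin/sh
# Hypothetical helper: read SAM records on stdin and write
# "<absolute template length><TAB><count>", sorted by length.
tlen_hist() {
    awk -F '\t' '
        { d = ($9 < 0) ? -$9 : $9; count[d]++ }      # absolute value of field 9
        END { for (d in count) print d "\t" count[d] }
    ' | sort -k1,1n
}

# Intended use (assumes samtools and the BAM file named in this thread):
#   samtools view -f 0x42 mappedreads.bam | tlen_hist > pe.hist
```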
Creating the contig distance file (e.g. pe.de) will probably require more than simple command-line pipes. I can generate a sorted BAM file containing only pairs whose reads map to different contigs:
(samtools view -H mappedreads.bam; samtools view -F 0x02 mappedreads.bam | cut -f 1-11 | awk -v 'OFS=\t' '{if($7 != "="){$1="";$10="";$11="*";print $0}}') | samtools view -Sb - | samtools sort - pe.diffcontigs.sorted
However, altering the template length (field 9) would (I expect) require knowledge of the most likely read-pair distance.
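One possible estimate of that distance is the mode of the histogram. This is an assumption, not something sga or ABySS is known to do. The sketch below uses a hypothetical helper `hist_mode` and expects input in the pe.hist layout used above (distance, tab, count). How field 9 should then be adjusted is left open:

```shell
#!/bin/sh
# Hypothetical helper: print the distance with the highest count from a
# "<distance><TAB><count>" histogram (file argument or stdin).
# On ties, the first line seen wins (the smallest distance if input is sorted).
hist_mode() {
    awk -F '\t' '
        $2 > best { best = $2; mode = $1 }
        END { if (NR > 0) print mode }
    ' "$@"
}

# Intended use:
#   hist_mode pe.hist
```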