pmelsted / pizzly

Fast fusion detection using kallisto
BSD 2-Clause "Simplified" License

pizzly reports more fusions than expected #17

Closed: roryk closed this issue 7 years ago

roryk commented 7 years ago

Heya, with:

pizzly version: 0.37.3
SeqAn version: 2.2.0
kallisto 0.43.1
wget ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
wget ftp://ftp.ensembl.org/pub/release-89/gtf/homo_sapiens/Homo_sapiens.GRCh38.89.gtf.gz
kallisto index -i Homo_sapiens.GRCh38.idx -k 31 Homo_sapiens.GRCh38.cdna.all.fa.gz
kallisto quant -i Homo_sapiens.GRCh38.idx --fusion -o SRR1659964 fastq/SRR1659964_1.fastq.gz fastq/SRR1659964_2.fastq.gz

Using the script, I estimated the fragment length as 283. Then I ran:

pizzly -k 31 --gtf Homo_sapiens.GRCh38.89.gtf.gz --cache pizzly-cache.txt --align-score 2 --insert-size 283 --fasta Homo_sapiens.GRCh38.cdna.all.fa.gz --output SRR1659964/pizzly SRR1659964/fusion.txt

This reports far more fusions than are in the preprint:

grep '>' SRR1659964/pizzly.fusions.fasta | wc -l
4301

Am I doing something wrong? Is fragment size the same as insert length, or is insert length more like the TopHat-style inner distance rather than the fragment size, in which case I would need to subtract 2 * the read length? Thanks!
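For reference, the two definitions in the question differ only by the length of the two reads. The sketch below shows that arithmetic. It does not settle which definition pizzly's --insert-size expects, since the thread does not say. The function name inner_distance and the read length of 100 are assumptions made for illustration; only the fragment length of 283 comes from this thread.

```shell
# Convert a fragment length into a TopHat-style inner distance.
# A fragment spans read 1, the unsequenced gap, and read 2, so:
#   inner distance = fragment length - 2 * read length
# inner_distance is a hypothetical helper, not part of pizzly or kallisto.
inner_distance() {
  fragment_length=$1
  read_length=$2
  echo $(( fragment_length - 2 * read_length ))
}

# 283 is the fragment length estimated above.
# 100 is an assumed read length; the thread does not state the real one.
inner_distance 283 100
```

With these assumed values the inner distance would be 83, so the choice of definition changes the value passed to --insert-size considerably.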

roryk commented 7 years ago

I answered this myself and updated the documentation to describe how to collapse the output down to gene-level calls. That pull request also renames the two scripts so they can be installed as binaries, which will make it easy to wrap pizzly into an automated pipeline. Thanks! Sorry for the few noise lines from end-of-line whitespace changes; my editor trims trailing whitespace automatically.