pmelsted / pizzly

Fast fusion detection using kallisto
BSD 2-Clause "Simplified" License

Insert size estimate by Kallisto #5

Closed uros-sipetic closed 7 years ago

uros-sipetic commented 7 years ago

Hi, is the insert size that kallisto estimates available in any of the kallisto outputs, and if so, what's the fastest way to pipe that information into pizzly?

pmelsted commented 7 years ago

kallisto prints out the mean insert length on stderr when it runs. The estimated fragment length distribution is in the abundance.h5 file and I am working on a script that will pull out this value so that it can be automatically set for pizzly.

See this script for a start: https://github.com/pachterlab/kallisto/issues/134#issuecomment-288727862

uros-sipetic commented 7 years ago

Thank you, this is great!

pmelsted commented 7 years ago

Here is a version that pulls out the 95th percentile of the fragment length distribution.

Run it as: python get_fragment_length.py kallisto_output/abundance.h5

import h5py
import numpy as np
import sys

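fn = sys.argv[1]
# Open read-only; aux/fld holds the estimated fragment length distribution.
with h5py.File(fn, 'r') as f:
    x = np.asarray(f['aux']['fld'], dtype='float64')
# Normalized cumulative distribution over fragment lengths.
y = np.cumsum(x) / np.sum(x)
# First index where the cumulative fraction exceeds 0.95 (the 95th percentile).
cutoff = np.argmax(y > .95)
print(cutoff)

kmhernan commented 7 years ago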

@pmelsted, in the command line args it says the "max" value, but do you suggest using the 95th percentile instead, as shown in this script? It is very different from the max value, so I wanted to get your input.
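
To illustrate how far apart the two values can be, here is a minimal sketch that compares the 95th-percentile cutoff computed as in the script above with the largest length that has any nonzero count, which is only one possible reading of "max". The helper name fld_summary and the count array are made up for illustration; the data is synthetic, not real kallisto output, and the sketch does not settle which value pizzly expects.

```python
import numpy as np

def fld_summary(counts, q=0.95):
    """Return (mean, q-quantile cutoff, largest length with a nonzero count).

    `counts` is assumed to be indexed by fragment length, as in the
    script above. This helper is hypothetical, not part of kallisto or pizzly.
    """
    counts = np.asarray(counts, dtype="float64")
    total = counts.sum()
    if total <= 0:
        raise ValueError("distribution has no mass")
    lengths = np.arange(counts.size)
    mean = float((lengths * counts).sum() / total)
    cdf = np.cumsum(counts) / total
    # Same rule as the script: first index where the CDF exceeds q.
    cutoff = int(np.argmax(cdf > q))
    max_len = int(np.nonzero(counts)[0].max())
    return mean, cutoff, max_len

# Synthetic distribution: most fragments between 150 and 250,
# plus a single outlier at length 900.
counts = np.zeros(1000)
counts[150:251] = 10
counts[900] = 1

mean, cutoff, max_len = fld_summary(counts)
print(mean, cutoff, max_len)
```

With a long, sparse tail like this one, a single outlier fragment sets the maximum, while the percentile cutoff stays close to the bulk of the distribution.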