Closed: uros-sipetic closed this 7 years ago.
kallisto prints the mean insert length to stderr when it runs. The estimated fragment length distribution is stored in abundance.h5, and I am working on a script that will extract this value so that it can be set automatically for pizzly.
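For reference, a mean can also be recovered from the stored distribution itself. This is a minimal sketch, assuming (as the script later in this thread does) that the histogram read from `f['aux']['fld']` is indexed by fragment length, so entry `i` holds the weight for length `i`. The function name `mean_fragment_length` is hypothetical, and the result may differ slightly from the value kallisto prints on stderr if kallisto computes or rounds it differently.

```python
import numpy as np


def mean_fragment_length(fld):
    """Weighted mean of a fragment length histogram.

    Assumes fld[i] is the count (or weight) of fragments of length i.
    """
    counts = np.asarray(fld, dtype="float64")
    lengths = np.arange(len(counts))  # bin index == fragment length (assumption)
    total = counts.sum()
    if total == 0:
        raise ValueError("empty fragment length distribution")
    return float(np.dot(lengths, counts) / total)


if __name__ == "__main__":
    # Toy histogram: one fragment of length 1, two of length 2, one of length 3.
    print(mean_fragment_length([0, 1, 2, 1]))  # prints 2.0
```

With a real run, pass the array loaded from `abundance.h5` (for example `f['aux']['fld']` opened via h5py) instead of the toy list.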
See this script as a starting point: https://github.com/pachterlab/kallisto/issues/134#issuecomment-288727862
Thank you, this is great!
Here is a version that extracts the 95th percentile of the fragment length distribution.
Run it as `python get_fragment_length.py kallisto_output/abundance.h5`:
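```python
import sys

import h5py
import numpy as np

fn = sys.argv[1]
# Open read-only (older h5py versions defaulted to append mode).
with h5py.File(fn, "r") as f:
    # kallisto's fragment length histogram, indexed by fragment length.
    x = np.asarray(f["aux"]["fld"], dtype="float64")
# Cumulative fraction of fragments at or below each length.
y = np.cumsum(x) / np.sum(x)
# First length at which the cumulative fraction exceeds 95%.
cutoff = np.argmax(y > 0.95)
print(cutoff)
```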
@pmelsted The command-line arguments describe this as the "max" value, but do you suggest using the 95th percentile, as in this script, instead? The two values are very different, so I wanted to get your input.
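To see why the two numbers can diverge so much, here is a small sketch comparing them on the same histogram. It assumes the histogram is indexed by fragment length and takes "max" to mean the largest length with a nonzero count; both function names are hypothetical, and `fld_percentile` mirrors the cumulative-sum logic of the script above. It does not settle which value pizzly should receive.

```python
import numpy as np


def fld_max(fld):
    """Largest fragment length with a nonzero count (bin index == length, assumed)."""
    counts = np.asarray(fld, dtype="float64")
    nonzero = np.flatnonzero(counts > 0)
    if nonzero.size == 0:
        raise ValueError("empty fragment length distribution")
    return int(nonzero[-1])


def fld_percentile(fld, q=0.95):
    """First length at which the cumulative fraction exceeds q (0 < q < 1)."""
    counts = np.asarray(fld, dtype="float64")
    cdf = np.cumsum(counts) / np.sum(counts)
    return int(np.argmax(cdf > q))


if __name__ == "__main__":
    # Toy data: 99 fragments of length 200 plus one stray fragment of length 1000.
    fld = np.zeros(1001)
    fld[200] = 99
    fld[1000] = 1
    print(fld_max(fld), fld_percentile(fld))  # prints 1000 200
```

A single outlier fragment sets the maximum, while the percentile reflects where the bulk of the distribution ends, which is why the two can differ by hundreds of bases.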
Hi, is the insert size that kallisto estimates available in any of kallisto's outputs? If so, what's the fastest way to pass that information to pizzly?