Closed leiendeckerlu closed 7 years ago
For what it's worth, I have implemented a simple downstream script in R here that flattens the JSON, annotates gene locations, and calculates the distance between partners (where they are on the same chr). I don't explicitly filter the output, but you can of course sort the final tab-delimited output by the splitcount or paircount columns as you see fit.
Very cool, will definitely check that out! Thanks Matt!
There are two approaches to filtering. One is to set a read count minimum. Pizzly does this by requiring at least two read pairs supporting a fusion junction or at least one split read. You can also run the flattening script in the latest version to get a TSV table that is easier to filter on.
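As a rough illustration of the read-count approach, here is a minimal Python sketch that filters a flattened TSV. It assumes the table has `paircount` and `splitcount` columns (the names mentioned in this thread); the function name, the other column names, and the example rows are made up for illustration, and the default thresholds simply mirror the rule described above.

```python
import csv
import io


def filter_by_read_support(tsv_text, min_pairs=2, min_splits=1):
    """Keep rows supported by >= min_pairs read pairs OR >= min_splits split reads.

    Assumes tab-delimited text with 'paircount' and 'splitcount' columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        pairs = int(row["paircount"])
        splits = int(row["splitcount"])
        if pairs >= min_pairs or splits >= min_splits:
            kept.append(row)
    return kept


# Made-up example table; gene names and counts are placeholders.
example_tsv = (
    "geneA\tgeneB\tpaircount\tsplitcount\n"
    "GENE1\tGENE2\t3\t0\n"
    "GENE3\tGENE4\t1\t0\n"
    "GENE5\tGENE6\t0\t2\n"
)

for row in filter_by_read_support(example_tsv):
    print(row["geneA"], row["geneB"], row["paircount"], row["splitcount"])
```

Raising `min_pairs` or `min_splits` gives a stricter filter if the default still lets through too many false positives.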
The other approach is to run kallisto to quantify the fusion transcripts and select those which have decent TPM support. An example pipeline is in the Snakefile in the test directory.
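The TPM-based selection could look roughly like the sketch below. It reads a kallisto `abundance.tsv` (which has `target_id` and `tpm` columns) and keeps fusion transcripts above a cutoff. The function name, the way fusion IDs are supplied, the 1.0 TPM cutoff, and the example values are all assumptions for illustration, not what the Snakefile does.

```python
import csv
import io


def select_fusions_by_tpm(abundance_text, fusion_ids, min_tpm=1.0):
    """Return {target_id: tpm} for fusion transcripts with tpm >= min_tpm.

    abundance_text: contents of a kallisto abundance.tsv file.
    fusion_ids: set of fusion transcript IDs (e.g. collected from the
                fusion FASTA headers); how you obtain them is up to you.
    """
    reader = csv.DictReader(io.StringIO(abundance_text), delimiter="\t")
    selected = {}
    for row in reader:
        tpm = float(row["tpm"])
        if row["target_id"] in fusion_ids and tpm >= min_tpm:
            selected[row["target_id"]] = tpm
    return selected


# Made-up abundance table; IDs and numbers are placeholders.
example_abundance = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "fusion_1\t1200\t1001\t40\t5.2\n"
    "fusion_2\t900\t701\t1\t0.1\n"
    "normal_tx\t2000\t1801\t500\t100.0\n"
)

print(select_fusions_by_tpm(example_abundance, {"fusion_1", "fusion_2"}))
```

What counts as "decent" TPM support depends on the sample and sequencing depth, so treat the cutoff as a parameter to tune.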
Hi there,
referring to https://github.com/pmelsted/pizzly/issues/2, I was wondering if you could explain some of the filtering criteria you are using to create Sample.json from Sample.unfiltered.json?
Also, is it currently possible to customize these filtering criteria?
Out of curiosity, do you mind sharing your downstream analysis pipeline for the pizzly output file? Since pizzly identifies a lot of false positives, I definitely have to do some additional filtering (most probably on paircounts?). I'm currently looking into using R for this, but I'm not sure if that's the most elegant way. Thanks!