Closed bioinfo-dirty-jobs closed 6 years ago
Sorry for the late reply. The cache behaviour is a bit weird and I need to explain it better.
If the file exists, pizzly reads gene information from it; if it doesn't exist, pizzly parses the GTF and saves the result to this file (parsing the GTF takes much longer, so we try to do it only the first time). So --cache is both an input and an output parameter.
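The "input and output" behaviour described above is the usual load-or-build cache pattern. Here is a minimal Python sketch of that pattern, not pizzly's actual code: the function names, the JSON cache format, the file name index.cache.txt and the made-up gene record are all assumptions for illustration.

```python
import json
import os
import tempfile


def load_or_build(cache_path, build):
    """Return cached data if cache_path exists, else build it and save it."""
    if os.path.exists(cache_path):
        # The cache acts as an input: reuse the earlier result
        with open(cache_path) as f:
            return json.load(f)
    # The cache acts as an output: do the slow work once, then store it
    data = build()
    with open(cache_path, "w") as f:
        json.dump(data, f)
    return data


def slow_parse():
    # Stand-in for the expensive GTF parse; the gene record is made up
    slow_parse.calls += 1
    return {"GENE1": {"chrom": "chr1", "start": 100, "end": 200}}


slow_parse.calls = 0

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.cache.txt")  # hypothetical file name
    first = load_or_build(path, slow_parse)   # parses and writes the cache
    second = load_or_build(path, slow_parse)  # reads the cache, no parse
```

One consequence of this pattern is that a stale cache file is silently reused, so if the GTF changes it is probably safest to delete the cache file and let it be rebuilt.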
Also, --align-score should be set to something greater than 0, perhaps 2 or 3, because this is the number of mismatches pizzly will tolerate when doing full read alignment. --insert-size should be an upper bound on the insert size; for regular RNA-seq reads I would recommend 250 or so.
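To keep these settings in one place, the three options discussed above could be assembled with a small helper. This is only a sketch: pizzly_options is a hypothetical helper, index.cache.txt is a made-up file name, and the other arguments a real pizzly run needs are deliberately left out.

```python
def pizzly_options(cache_path, align_score=2, insert_size=250):
    """Build the three option/value pairs discussed above as an argument list."""
    if align_score <= 0:
        raise ValueError("align_score should be greater than 0 (e.g. 2 or 3)")
    if insert_size <= 0:
        raise ValueError("insert_size should be a positive upper bound")
    return [
        "--cache", str(cache_path),
        "--align-score", str(align_score),
        "--insert-size", str(insert_size),
    ]


# Defaults follow the values suggested in this thread
opts = pizzly_options("index.cache.txt")
print(" ".join(opts))
```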
OK, now it seems to work, but there are many fusions... how can I filter them?
We are working on incorporating the filtering directly into pizzly rather than doing it with scripts. The key indicator of a good fusion is the amount of read support, e.g. paircount or splitcount.
Thanks so much. Could you please show me how to parse the data?
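Here is an example script:

import json, sys

# Load the pizzly JSON output given as the first command-line argument
with open(sys.argv[1]) as f:
    J = json.load(f)

genes = J['genes']
todel = []
for i, gp in enumerate(genes):
    numpairs = gp['paircount']
    numsplit = gp['splitcount']
    # require 2 paired reads or at least one split and one paired or two split
    # NOTE: as written, this condition reduces to numpairs < 2, so only
    # fusions with at least 2 paired reads are kept
    if numpairs < 2 or (numsplit + numpairs) < 2:
        todel.append(i)
# delete from the end so that earlier indices stay valid
for i in todel[::-1]:
    del genes[i]  # remove it from the json object
print(json.dumps(J, indent=2))
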
Save it to a file, simple_filter.py, and run it with:
python simple_filter.py output.unfiltered.json > output.filtered.json
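The comment in the script describes a rule that amounts to "total support of at least 2", while the condition itself only keeps fusions with at least 2 paired reads. A sketch that exposes both thresholds as parameters might look like the following; filter_fusions and its parameter names are hypothetical, and the example records contain only the two count fields used in this thread.

```python
def filter_fusions(data, min_pairs=0, min_total=2):
    """Return a copy of data keeping only fusions with enough read support."""
    kept = [
        g for g in data["genes"]
        if g["paircount"] >= min_pairs
        and g["paircount"] + g["splitcount"] >= min_total
    ]
    result = dict(data)   # shallow copy, the input is left untouched
    result["genes"] = kept
    return result


# Made-up records with only the two count fields
example = {"genes": [
    {"paircount": 2, "splitcount": 0},
    {"paircount": 1, "splitcount": 1},
    {"paircount": 0, "splitcount": 2},
    {"paircount": 1, "splitcount": 0},
]}

loose = filter_fusions(example)                # total support >= 2
strict = filter_fusions(example, min_pairs=2)  # same effect as the script's condition
print(len(loose["genes"]), len(strict["genes"]))  # prints "3 1"
```

Building a new list avoids deleting from a list while tracking indices, and the thresholds can be raised if too many candidates remain.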
Thanks. I tried to use json from Python, but it gave me some errors:
/maurizio/Desktop/kallisto_fusion_annotate.py", line 4, in
Filtering is now implemented in the latest version.
I tried the new version. Could you help me convert the results to TSV? How do you suggest we use the .json file to choose real fusions?
There is a Python script in the scripts folder of the latest release that converts the JSON output to a flat gene-based table.
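For readers who want to see the idea without the bundled script, here is a rough sketch of flattening the JSON into a tab-separated table. It is not the script from the release: the keys 'genes', 'paircount' and 'splitcount' appear earlier in this thread, while the geneA/geneB name fields, the column names and the example record are assumptions.

```python
import csv
import io


def genes_to_tsv(data, out):
    """Write one tab-separated row per gene pair to the file-like object out."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["geneA", "geneB", "paircount", "splitcount"])
    for g in data["genes"]:
        writer.writerow([
            # geneA/geneB layout is assumed; missing fields become empty cells
            g.get("geneA", {}).get("name", ""),
            g.get("geneB", {}).get("name", ""),
            g["paircount"],
            g["splitcount"],
        ])


# Made-up record for illustration
example = {"genes": [
    {"geneA": {"name": "GENE_A"}, "geneB": {"name": "GENE_B"},
     "paircount": 3, "splitcount": 5},
]}
buf = io.StringIO()
genes_to_tsv(example, buf)
print(buf.getvalue(), end="")
```

To write a file instead of an in-memory buffer, pass an object opened with open(path, "w", newline="") as out.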
I have a cell line where I know there is a fusion, but no fusions are found.
Do you see an error in my command?