pmelsted / pizzly

Fast fusion detection using kallisto
BSD 2-Clause "Simplified" License

No fusion found #4

Closed bioinfo-dirty-jobs closed 6 years ago

bioinfo-dirty-jobs commented 7 years ago

I have a cell line in which I know there is a fusion, but no fusions are found.

Do you see an error in my command?

maurizio@Tardis:~/A_kallisto$ /opt/pizzly -k 31 --gtf ~/Homo_sapiens.GRCh38.79.gtf  --cache ~/Homo_sapiens.GRCh38   --align-score 0 --insert-size 125 --fasta ~/Homo_sapiens.GRCh38.rel79.cdna.all.fa  --output A_fusion fusion.txt 
Opening cached file ... loaded 0 genes and 0 transcripts
Read a total of 173259 transcripts
Number of kept records 0 out of 746328
pmelsted commented 7 years ago

Sorry for the late reply. The cache behaviour is a bit weird and I need to explain it better.

If the file exists, pizzly reads the gene information from it; if it doesn't exist, pizzly parses the GTF and saves the result to this file (parsing the GTF takes much longer, so we try to do it only the first time). So --cache is both an input and an output parameter.
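The load-or-build behaviour described above can be sketched generically as follows. This is not pizzly's code: the function name `load_or_build`, the JSON cache format, and the `build` callback are all hypothetical and used only for illustration. The sketch shows why an existing file that holds no gene data would be trusted as-is, which may be what produced "loaded 0 genes and 0 transcripts" in the log above.

```python
import json
import os


def load_or_build(cache_path, build):
    """Return cached data if cache_path exists, else build it and save it.

    Hypothetical illustration of a cache that is both input and output.
    """
    if os.path.exists(cache_path):
        # Cache hit: whatever is in the file is used, even if it is stale
        # or contains no records.
        with open(cache_path) as f:
            return json.load(f)
    # Cache miss: do the slow parse once and store the result for next time.
    data = build()
    with open(cache_path, "w") as f:
        json.dump(data, f)
    return data
```

With a design like this, deleting the cache file (or pointing the option at a path that does not exist yet) forces the slow parse to run again.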

Also --align-score should be set to something greater than 0, perhaps 2 or 3 because this is the number of mismatches pizzly will tolerate when doing full read alignment.

--insert-size should be an upper bound on the insert size; for regular RNA-seq reads I would recommend 250 or so.
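Putting the advice above together, here is a small sketch that assembles the command line with the suggested values. It uses only the flags that appear in the original command; the helper name `pizzly_command` and the cache file name `index.cache.txt` are assumptions made for illustration, and the paths are the ones from the original post.

```python
def pizzly_command(fusion_file, gtf, cache, fasta, output,
                   k=31, align_score=2, insert_size=250, exe="pizzly"):
    """Build the argument list for a pizzly run (hypothetical helper)."""
    return [
        exe,
        "-k", str(k),
        "--gtf", gtf,
        "--cache", cache,                    # written on first run, read afterwards
        "--align-score", str(align_score),   # mismatches tolerated, should be > 0
        "--insert-size", str(insert_size),   # upper bound on the insert size
        "--fasta", fasta,
        "--output", output,
        fusion_file,
    ]


if __name__ == "__main__":
    cmd = pizzly_command(
        fusion_file="fusion.txt",
        gtf="Homo_sapiens.GRCh38.79.gtf",
        cache="index.cache.txt",             # assumed name, must not already exist
        fasta="Homo_sapiens.GRCh38.rel79.cdna.all.fa",
        output="A_fusion",
    )
    print(" ".join(cmd))
```

The script only prints the command so that it can be checked before running it in a shell.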

bioinfo-dirty-jobs commented 7 years ago

OK, it seems to work now, but there are many fusions. How can I filter them?

pmelsted commented 7 years ago

We are working on incorporating the filtering directly into pizzly rather than doing it with scripts. The key indicator of a good fusion is the amount of read support, e.g. paircounts or splitcounts.
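As a rough illustration of using read support as the indicator, the sketch below orders candidate gene pairs by their combined paired and split read counts. It assumes each record is a dict with integer `paircount` and `splitcount` fields; the function name `rank_by_support` and the example records are made up.

```python
def rank_by_support(genes):
    """Sort gene-pair records by total read support, strongest first."""
    return sorted(
        genes,
        key=lambda gp: gp["paircount"] + gp["splitcount"],
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up records, only to show the ordering.
    example = [
        {"paircount": 0, "splitcount": 1},
        {"paircount": 5, "splitcount": 3},
        {"paircount": 2, "splitcount": 0},
    ]
    for gp in rank_by_support(example):
        print(gp["paircount"], gp["splitcount"])
```

Ranking rather than filtering keeps weakly supported candidates visible, which can help when choosing a cutoff.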

bioinfo-dirty-jobs commented 7 years ago

Thanks so much. Could you please show me how to parse the data?

pmelsted commented 7 years ago

Here is an example script

import json, sys

with open(sys.argv[1]) as f:
    J = json.load(f)

genes = J['genes']
n = len(genes)
todel = []
for i, gp in enumerate(genes):
    numpairs = gp['paircount']
    numsplit = gp['splitcount']

    # require at least two supporting reads in total:
    # two paired, one split and one paired, or two split
    if (numsplit + numpairs) < 2:
        todel.append(i)

for i in todel[::-1]:
    del genes[i] # remove it from the json object

print(json.dumps(J,indent=2))

save it to a file simple_filter.py and run it with

python simple_filter.py output.unfiltered.json > output.filtered.json
bioinfo-dirty-jobs commented 7 years ago

Thanks. I tried to use json from Python, but it gave me some errors:

/maurizio/Desktop/kallisto_fusion_annotate.py", line 4, in <module>
    J = json.load(f)
  File "/usr/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 14 column 10 (char 518)
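An error of this kind means the parser gave up at a specific position in the input file (here line 14, column 10), so looking at that line is usually the quickest check. The sketch below is one way to print the offending line. It assumes Python 3.5 or later, where `json.JSONDecodeError` carries `lineno` and `colno`; the helper name `describe_json_error` is hypothetical.

```python
import json


def describe_json_error(text):
    """Return None if text is valid JSON, else a message showing the bad line."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # lineno and colno are 1-based positions reported by the decoder.
        bad_line = lines[err.lineno - 1] if 0 < err.lineno <= len(lines) else ""
        pointer = " " * (err.colno - 1) + "^"
        return "%s (line %d, column %d)\n%s\n%s" % (
            err.msg, err.lineno, err.colno, bad_line, pointer)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        print(describe_json_error(f.read()) or "valid JSON")
```

Run on the file that failed to load, it shows the line where parsing stopped with a caret under the reported column.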

pmelsted commented 7 years ago

Filtering is now implemented in the latest version.

bioinfo-dirty-jobs commented 7 years ago

I tried the new version. Could you help me convert the results to TSV? How do you suggest using the .json file to choose real fusions?

pmelsted commented 6 years ago

There is a Python script in the scripts folder of the latest release that converts the JSON output to a flat, gene-based table.
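For readers who want to see the general idea, here is a minimal sketch of flattening the JSON into a tab-separated table. It is not the script shipped in the release. The `genes` list and the `paircount` and `splitcount` fields come from the filter script earlier in this thread; the `geneA` and `geneB` entries with a `name` key are an assumption about the file layout and may need adjusting.

```python
import csv
import json
import sys


def write_gene_table(fusions, out):
    """Write one tab-separated row per gene pair in fusions['genes']."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["geneA", "geneB", "paircount", "splitcount"])
    for gp in fusions.get("genes", []):
        # 'geneA'/'geneB' with a 'name' entry are assumed keys, not confirmed here.
        gene_a = gp.get("geneA", {}).get("name", "")
        gene_b = gp.get("geneB", {}).get("name", "")
        writer.writerow([gene_a, gene_b,
                         gp.get("paircount", 0), gp.get("splitcount", 0)])


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        write_gene_table(json.load(f), sys.stdout)
```

Saved under a name such as flatten_sketch.py (hypothetical), it could be run as `python flatten_sketch.py output.filtered.json > output.tsv`, after which the table can be sorted by the count columns in a spreadsheet or with standard tools.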