mckennalab / FlashFry

FlashFry: The rapid CRISPR target site characterization tool

Inconsistent Offtarget Reporting #14

Closed tjs000 closed 4 years ago

tjs000 commented 4 years ago

Hi,

I'm running FlashFry to locate off-target sites of a given guide RNA. I find that the sites reported are inconsistent as I change the maximum number of mismatches. I've put the guide RNA and commands below so the issue can be reproduced (or so you can tell me if I'm making some mistake in how I'm running the software!).

Cpf1 Guide RNA with 5' PAM

cat cpfguide.fa

>cpfguide
TTTCCCACGGCATCAAGTGCCCCG

Database Indexing Command

java -Xmx4g -jar FlashFry-assembly-1.9.9.1.jar index --tmpLocation /tmp --database hg38_cpf1_database --reference ~/hg38.fa --enzyme cpf1

Command for searching up to 2 mismatches

java -Xmx4g -jar FlashFry-assembly-1.9.9.1.jar discover --database hg38_cpf1_database --fasta cpfguide.fa --output cpfguide2.out --positionOutput --maxMismatch=2 --maximumOffTargets=40000; cat cpfguide2.out

Output

contig  start   stop    target  context overflow        orientation     otCount offTargets
cpfguide        0       24      TTTCCCACGGCATCAAGTGCCCCG        NONE    OK      FWD     0

Command for searching up to 3 mismatches

java -Xmx4g -jar FlashFry-assembly-1.9.9.1.jar discover --database hg38_cpf1_database --fasta cpfguide.fa --output cpfguide3.out --positionOutput --maxMismatch=3 --maximumOffTargets=40000; cat cpfguide3.out

Output

contig  start   stop    target  context overflow        orientation     otCount offTargets
cpfguide        0       24      TTTCCCACGGCATCAAGTGCCCCG        NONE    OK      FWD     2       TTTTGCACGGCATCAAGTAACCCG_2_3<chr10:47553816^R|chr10:46786533^F>
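
For anyone checking the hit above by hand, here is a minimal Python sketch of the mismatch count. It assumes the last number in an entry such as `_2_3` is the mismatch count and that the 4-base 5' PAM (TTTN for Cpf1) is excluded from it; both are inferences from the output in this thread, not from FlashFry's documentation. `count_mismatches` and `PAM_LENGTH` are hypothetical names used only for illustration.

```python
PAM_LENGTH = 4  # assumption: 5' TTTN PAM is not counted toward mismatches


def count_mismatches(guide, hit, pam_length=PAM_LENGTH):
    """Count differing bases between guide and hit, skipping the 5' PAM."""
    if len(guide) != len(hit):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(guide[pam_length:], hit[pam_length:]))


guide = "TTTCCCACGGCATCAAGTGCCCCG"
hit = "TTTTGCACGGCATCAAGTAACCCG"
print(count_mismatches(guide, hit))  # prints 3, matching the _2_3 entry
```

Under the same assumptions, the count for this hit agrees with the reported value only when the PAM bases are skipped; counting all 24 positions would give 4.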

Command for searching up to 4 mismatches

java -Xmx4g -jar FlashFry-assembly-1.9.9.1.jar discover --database hg38_cpf1_database --fasta cpfguide.fa --output cpfguide4.out --positionOutput --maxMismatch=4 --maximumOffTargets=40000; cat cpfguide4.out

Output

contig  start   stop    target  context overflow        orientation     otCount offTargets
cpfguide        0       24      TTTCCCACGGCATCAAGTGCCCCG        NONE    OK      FWD     7       TTTCCCACGGCATCAAGTGCCCCG_1_0<chr17:80790186^R>,TTTGCCACGGCATCAACTGCCCAG_1_2<chr2:136115625^F>,TTTGCCACGGCATCAAGGCCCCGC_1_4<chr2:115641298^F>,TTTGCCACGGCTTCATCTGCCCCC_1_4<chr17:17768968^R>,TTTCCCACTGCTTCAACTGCCCCT_1_4<chr10:102146516^R>,TTTTGCACGGCATCAAGTAACCCG_2_3<chr10:47553816^R|chr10:46786533^F>

The output when allowing 4 mismatches shows a locus with a perfect match, as well as one with two mismatches. Neither of these is reported with --maxMismatch=3 or --maxMismatch=2, though, and I would have expected both to appear.
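
The comparison described above can be automated with a short Python sketch. It assumes each offTargets entry has the layout `SEQUENCE_occurrences_mismatches<location|location>`, which is inferred from the output in this thread rather than taken from FlashFry's documentation. `parse_off_targets` and `missing_at_threshold` are hypothetical helpers, not part of FlashFry.

```python
import re

# Assumed entry layout: SEQUENCE_occurrences_mismatches<loc|loc|...>
ENTRY = re.compile(r"([ACGTN]+)_(\d+)_(\d+)(?:<([^>]*)>)?")


def parse_off_targets(field):
    """Return (sequence, occurrences, mismatches, locations) per entry."""
    return [
        (seq, int(count), int(mm), locs.split("|") if locs else [])
        for seq, count, mm, locs in ENTRY.findall(field)
    ]


def missing_at_threshold(permissive, strict, threshold):
    """Hits within `threshold` mismatches in the permissive run
    that are absent from the stricter run."""
    seen = {seq for seq, _, _, _ in parse_off_targets(strict)}
    return [
        seq
        for seq, _, mm, _ in parse_off_targets(permissive)
        if mm <= threshold and seq not in seen
    ]


# offTargets columns copied from the --maxMismatch=4 and =3 runs
run4 = ",".join([
    "TTTCCCACGGCATCAAGTGCCCCG_1_0<chr17:80790186^R>",
    "TTTGCCACGGCATCAACTGCCCAG_1_2<chr2:136115625^F>",
    "TTTGCCACGGCATCAAGGCCCCGC_1_4<chr2:115641298^F>",
    "TTTGCCACGGCTTCATCTGCCCCC_1_4<chr17:17768968^R>",
    "TTTCCCACTGCTTCAACTGCCCCT_1_4<chr10:102146516^R>",
    "TTTTGCACGGCATCAAGTAACCCG_2_3<chr10:47553816^R|chr10:46786533^F>",
])
run3 = "TTTTGCACGGCATCAAGTAACCCG_2_3<chr10:47553816^R|chr10:46786533^F>"

for seq in missing_at_threshold(run4, run3, threshold=3):
    print(seq)
# prints:
# TTTCCCACGGCATCAAGTGCCCCG
# TTTGCCACGGCATCAACTGCCCAG
```

An empty result would mean the stricter run is a consistent subset of the permissive one; here the perfect match and the two-mismatch site are both flagged as missing.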

Thank you for the tool! I've been using it quite a bit for SpCas9 and never noticed anything amiss. This is my first time trying with a Cpf1 guide though.

Best, Tim

aaronmck commented 4 years ago

Hi Tim,

That is super weird! Thanks for the info, let me run it and see if I can reproduce this. Everything looks fine with the way you're running FlashFry. I'll look into this now. Thanks!

-Aaron

aaronmck commented 4 years ago

Hi Tim,

Thanks again for reporting this. There is an issue with the first indexed block with Cpf1 (it's kind of in the depths of our database, but we have both indexed and linear blocks). I'll fix the underlying issue, but I've disabled this optimization for Cpf1 for now. Cas9 results were unaffected. This does mean you'll have to regenerate your index files with the latest (1.10) version. Thanks again for the bug report, this was an important one!

-Aaron

tjs000 commented 4 years ago

Hi Aaron,

Great! Thank you for the speedy fix. I downloaded v1.10, re-indexed, and tested it with two of my Cpf1 guide RNAs. The results are as expected now.

Best, Tim

aaronmck commented 4 years ago

Hi Tim,

Great! I'll close this for now, thanks again for the report.

-Aaron