snugel / cas-offinder

An ultrafast and versatile algorithm that searches for potential off-target sites of CRISPR/Cas-derived RNA-guided endonucleases.

When will v3 be officially out? #63

Open hukai916 opened 1 month ago

hukai916 commented 1 month ago

Hi developers,

Thanks for creating Cas-OFFinder!

I am very interested in incorporating it into our analytical pipeline. However, judging from the issue history, there seem to be several unresolved issues in Cas-OFFinder v2. Could you kindly let me know whether Cas-OFFinder is still actively maintained and whether a v3 release is planned? We would like to perform bulge analysis: should we wait for v3 or use the wrapper made for v2.4? Thanks!

Best,

pjb7687 commented 1 month ago

Hi,

Thanks a lot for your interest in Cas-OFFinder. I am no longer working on this repo, and all development effort for the next version of Cas-OFFinder has been moved to https://github.com/pnucolab.

As you may have noticed, I recently started a new independent research group. Soon afterwards I realized that I cannot really focus on the actual development anymore because of the administrative burden and teaching. So my students are now working on it. They are still on the learning curve, which means progress is slow, but it keeps moving.

I once thought of hiring a scientific programmer or a postdoc to speed things up, but we recently failed to secure funding (e.g. CZI EOSS) for that...

But as I said, although it is slow, we are working on it, and we can hopefully release it by the end of this year. If you are still interested, keep an eye on our new group page. Thanks!

Best, Jeongbin

hukai916 commented 1 month ago

Hi Jeongbin,

Congrats on your new role! BTW, is there any possibility of, or interest in, migrating Cas-OFFinder to the R world?

Best, Kai

pjb7687 commented 1 month ago

Cas-OFFinder is a standalone program, so you can call it from any language, including R.
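
As an illustration of calling the standalone binary from another language, here is a minimal Python sketch. It assumes the executable is on PATH under the name `cas-offinder` and takes the three positional arguments described in the Cas-OFFinder 2 README (input file, device type such as `C` for CPU or `G` for GPU, output file). The helper names `build_command` and `run_cas_offinder` are hypothetical. The same pattern should carry over to R via `system2()`.

```python
import subprocess
from pathlib import Path


def build_command(executable: str, input_file: str, device: str, output_file: str) -> list[str]:
    """Assemble the argument list: <executable> <input> <device> <output>."""
    return [executable, input_file, device, output_file]


def run_cas_offinder(input_file: str, output_file: str,
                     device: str = "C", executable: str = "cas-offinder") -> list[str]:
    """Run the external program and return its output file as a list of lines.

    Assumption: the binary writes its results to `output_file` and exits
    with a non-zero status on failure.
    """
    cmd = build_command(executable, input_file, device, output_file)
    # check=True raises CalledProcessError if the program exits with an error.
    subprocess.run(cmd, check=True)
    return Path(output_file).read_text().splitlines()
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when paths contain spaces.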

hukai916 commented 1 month ago

Hi Jeongbin,

Do you absolutely NOT recommend using v3 for now? I am asking because I ran some benchmarking tests with v2.4, v2.4.1, and v3b, with and without bulge analysis. The results are very inconsistent, and some are confusing. My summary is below. My test input files look like this:

# input_withoutbulge.txt
/Users/kaihu/GitHub/CasOFFinder/ref
NNNNNNNNNNNNNNNNNNNNNRG
GTGTCCTCCACACCAGAATCAGG 3
TGTCCTCCACACCAGAATCAGGG 3
CCAGAGCAGGATCCACAAACTGG 3

# input_withbulge.txt
/Users/kaihu/GitHub/CasOFFinder/ref
NNNNNNNNNNNNNNNNNNNNNRG 3 3
GTGTCCTCCACACCAGAATCAGG 3
TGTCCTCCACACCAGAATCAGGG 3
CCAGAGCAGGATCCACAAACTGG 3

v2.4

Using input_withoutbulge.txt:

GTGTCCTCCACACCAGAATCAGG chrX    48649585        GTGTCCTCCACACCAGAATCAGG +       0
TGTCCTCCACACCAGAATCAGGG chrX    48649586        TGTCCTCCACACCAGAATCAGGG +       0
CCAGAGCAGGATCCACAAACTGG chrX    48649563        CCAGAGCAGGATCCACAAACTGG -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270553       nnnnnnn                 +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270554       nnnnnn                  +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270555       nnnnn                   +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270556       nnnn                    +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270557       nnn                     +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270558       nn                      +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270559       n                       +       0
CCAGAGCAGGATCCACAAACTGG chrX    155270553                       nnnnnnn -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270554                        nnnnnn -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270555                         nnnnn -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270556                          nnnn -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270557                           nnn -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270558                            nn -       0
CCAGAGCAGGATCCACAAACTGG chrX    155270559                             n -       0
GTGTCCTCCACACCAGAATCAGG chr21   48129894                              n -       3
TGTCCTCCACACCAGAATCAGGG chr21   48119359        caggtcagACctggGcgggcGGG +       2
TGTCCTCCACACCAGAATCAGGG chr21   48129881        nnnnnnnnnnnnnn          +       3
TGTCCTCCACACCAGAATCAGGG chr21   48129886        nnnnnnnnn               +       2

What are the "nnnn" entries in the output? And why do the mismatch numbers not match the number of lowercase letters (e.g. the third-to-last line)?

Using input_withbulge.txt:

GTGTCCTCCACACCAGAATCAGG chrX    48649585        GTGTCCTCCACACCAGAATCAGGGGTT     +       0
TGTCCTCCACACCAGAATCAGGG chrX    48649586        TGTCCTCCACACCAGAATCAGGGGTTT     +       0
CCAGAGCAGGATCCACAAACTGG chrX    48649559        CCAGAGCAGGATCCACAAACTGGGGGA     -       0

With bulge in the input.txt and using the Python wrapper (cas-offinder-bulge):

#Bulge type     crRNA   DNA     Chromosome      Position        Direction       Mismatches      Bulge Size
X       NNNNNNNNNNNNNNNNNNNNNRG GTGTCCTCCACACCAGAATCAGG chrX    48649585        +       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG GGTGTCCTCCACACCAGAATCAGG        chrX    48649584        +       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG tAgtcCaTtCcatgtcatcatctG        chrX    107374146       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tTaGTCCattcCAtgtcAtcatct        chrX    107374147       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG cTTagtCcattCcatgtcATCAtc        chrX    107374148       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG cCTtagtcCattcCatGtcatcat        chrX    107374149       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG cCctTagTCCAttCCAtgtcatca        chrX    107374150       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG aCccTtagtCcattCcatgTCAtc        chrX    107374151       -       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG aAcccttagtcCAttccAtgtcat        chrX    107374152       -       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG cAaccCtTagtCcattccATgtca        chrX    107374153       -       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG cCaacCCTtagtcCattccatgtc        chrX    107374154       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG aCcaaCCcttAgtCCAttccatGt        chrX    107374155       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tAccaaCcCttagtCcattcCAtG        chrX    107374156       -       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG aTaccaacCCttAgtccAtTCcat        chrX    107374157       -       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG aATacCaaCCcttagtccATtcca        chrX    107374158       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG GAataCCaaCcCttagtccattcc        chrX    107374159       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tGaaTaCcaacCcttAGtccattc        chrX    107374160       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG aTgaatacCaACcCttagtcCAtt        chrX    107374161       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tATGaataCCAacCCttAgTCcat        chrX    107374162       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG GTatgaaTaCcaACCcttAgtcca        chrX    107374163       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tGTaTgaatacCAaCccttagtcc        chrX    107374164       -       1       0

Why do the crRNAs in the results contain so many Ns? Also, the mismatch counts do not agree with the lowercase letters.

v2.4.1

Using input_withoutbulge.txt:

GTGTCCTCCACACCAGAATCAGG chrX    48649585        GTGTCCTCCACACCAGAATCAGG +       0
GTGTCCTCCACACCAGAATCAGG chrX    107374158       tgGaatggactAaggGttggtat +       3
GTGTCCTCCACACCAGAATCAGG chrX    107374159       GgaatggaCtaAgggttggtAtt +       3
GTGTCCTCCACACCAGAATCAGG chrX    107374160       GaaTggaCtAagggttggTattc +       1
GTGTCCTCCACACCAGAATCAGG chrX    107374161       aatggactaAgggttGgtattca +       3
GTGTCCTCCACACCAGAATCAGG chrX    107374162       aTGgaCTaagggttgGtATtcat +       3
GTGTCCTCCACACCAGAATCAGG chrX    107374163       tgGaCtaagggttggtAtTCAta +       3
GTGTCCTCCACACCAGAATCAGG chrX    107374166       actaagggttggtattcATacat +       3
TGTCCTCCACACCAGAATCAGGG chrX    48649586        TGTCCTCCACACCAGAATCAGGG +       0
TGTCCTCCACACCAGAATCAGGG chrX    107374158       TGgaaTggACtaagGgtTggtat +       1
TGTCCTCCACACCAGAATCAGGG chrX    107374159       gGaatggactAaggGttggtatt +       3
TGTCCTCCACACCAGAATCAGGG chrX    107374160       gaatggaCtaAgggttggtAttc +       3
TGTCCTCCACACCAGAATCAGGG chrX    107374161       aaTggaCtAagggttggTattca +       2
TGTCCTCCACACCAGAATCAGGG chrX    107374163       TGgaCTaagggttgGtATtcata +       3
TGTCCTCCACACCAGAATCAGGG chrX    107374164       gGaCtaagggttggtAtTCAtac +       3
TGTCCTCCACACCAGAATCAGGG chrX    107374167       ctaagggttggtattcATacatG +       3
CCAGAGCAGGATCCACAAACTGG chrX    48649563        CCAGAGCAGGATCCACAAACTGG -       0

Again, many mismatch numbers do not agree with the lowercase letter counts.
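
The discrepancy can be checked mechanically. The sketch below assumes the six whitespace-separated columns seen in the v2 output above (query, chromosome, position, DNA, strand, mismatches) and, as this post does, that lowercase letters in the DNA column mark mismatched bases. The names `V2Hit`, `parse_v2_line`, `lowercase_count`, and `is_consistent` are hypothetical helpers, not part of Cas-OFFinder.

```python
from typing import NamedTuple


class V2Hit(NamedTuple):
    query: str
    chrom: str
    position: int
    dna: str
    strand: str
    reported: int  # value of the mismatch column


def parse_v2_line(line: str) -> V2Hit:
    """Split one output line into its six whitespace-separated fields."""
    query, chrom, pos, dna, strand, mismatches = line.split()
    return V2Hit(query, chrom, int(pos), dna, strand, int(mismatches))


def lowercase_count(dna: str) -> int:
    """Count lowercase letters, assumed here to mark mismatched bases."""
    return sum(1 for base in dna if base.islower())


def is_consistent(hit: V2Hit) -> bool:
    """True when the reported mismatch count equals the lowercase count."""
    return lowercase_count(hit.dna) == hit.reported
```

Running `is_consistent` over every line of an output file gives a quick list of the hits whose mismatch column disagrees with the sequence shown.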

Using input_withbulge.txt:

Total 1 device(s) found.
Loading input file...
Critical error! The length of target sequences should match with the length of pattern sequence.

It fails to run without the wrapper.

With bulge in the input.txt and using the Python wrapper (cas-offinder-bulge):

#Bulge type     crRNA   DNA     Chromosome      Position        Direction       Mismatches      Bulge Size
X       NNNNNNNNNNNNNNNNNNNNNRG GTGTCCTCCACACCAGAATCAGG chrX    48649585        +       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG GTtTtCTtttCcCagtgtggAaG chrX    107373527       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG caGaagaCtAacttcaAAgggGG chrX    107373479       -       2       0
X       NNNNNNNNNNNNNNNNNNNNNRG GGTGTCCTCCACACCAGAATCAGG        chrX    48649584        +       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG aAatTagaaatgtatctttaaAaG        chrX    107373715       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG tTTaaaCcCatattaAtAAattaG        chrX    107373732       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG tTTcTttTCCcagtgtGgAagtGG        chrX    107373524       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG GTTtTCaTtgttttCttttcCcaG        chrX    107373535       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG cTcagaagaCtaACttcAAaggGG        chrX    107373480       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG aTctcagaagACtaacttcaaAGG        chrX    107373482       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG aTatgCtaaacatgatctcagAaG        chrX    107373496       -       2       0
X       NNNNNNNNNNNNNNNNNNNNNRG aAaaTatgCtAaACatGAtctcaG        chrX    107373499       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG GgTGTCCTCCACACCAGAATCAGG        chrX    48649584        +       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG GgtGTCCTCCACACCAGAATCAGG        chrX    48649584        +       2       0
X       NNNNNNNNNNNNNNNNNNNNNRG tgtTTtCTtttCcCagtgtggAaG        chrX    107373527       -       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG tcaGaagaCtAacttcaAAgggGG        chrX    107373479       -       2       0
X       NNNNNNNNNNNNNNNNNNNNNRG GgtgTCCTCCACACCAGAATCAGG        chrX    48649584        +       3       0
X       NNNNNNNNNNNNNNNNNNNNNRG aaaTTagaaatgtatctttaaAaG        chrX    107373715       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tTtaAaCcCatattaAtAAattaG        chrX    107373732       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG tTtcTttTCCcagtgtGgAagtGG        chrX    107373524       -       0       0
X       NNNNNNNNNNNNNNNNNNNNNRG tgtTTtCTtttCcCagtgtggAaG        chrX    107373527       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG catTGttTtCttttCccAgTgtGG        chrX    107373530       -       1       0
X       NNNNNNNNNNNNNNNNNNNNNRG GTtTTCaTtgttttCttttcCcaG        chrX    107373535       -       1       0

Again, there are many Ns in the crRNA column, and the mismatch counts disagree with the lowercase letter counts.

v3

Using input_withoutbulge.txt:

#Id     Bulge Type      crRNA   DNA     Chromosome      Location        Direction       Mismatches      Bulge Size
2       X       CCAGAGCAGGATCCACAAACTGG CCAGAGCAGGATCCACAAACTGG chrX    48649563        -       0       0
0       X       GTGTCCTCCACACCAGAATCAGG GTGTCCTCCACACCAGAATCAGG chrX    48649585        +       0       0
1       X       TGTCCTCCACACCAGAATCAGGG TGTCCTCCACACCAGAATCAGGG chrX    48649586        +       0       0

Using input_withbulge.txt:

#Id     Bulge Type      crRNA   DNA     Chromosome      Location        Direction       Mismatches      Bulge Size
2       DNA     CCAGAGCAGGATCCACAAAC---TGG      CCAGAGCAGGATCCACAAACTGGgGG      chrX    48649560        -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      CCAGAGCAGGATCCACAAACTGGGGG      chrX    48649560        -       0       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      CttagtCcattcCatgtcAtcatCTG      chrX    107374146       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      CCttAGtccatTCCAtgtcaTcaTCT      chrX    107374147       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      CCcttagtccATtCcatgtCatcATC      chrX    107374148       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      aCccttagtccattcCAtgtcatCAT      chrX    107374149       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      aacccttAGtccattCcAtgTcaTCA      chrX    107374150       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      CaAcccttaGtcCattccAtgtcATC      chrX    107374151       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      CCAaccCttagTCCAttccaTGtCAT      chrX    107374152       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      aCcaAcCcttAgtCcattcCatGTCA      chrX    107374153       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      taccAaCccttagtcCAttCcatGTC      chrX    107374154       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      atAccaaccctTagtCcAttccaTGT      chrX    107374155       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      aataccaAcccTtagtccAtTccATG      chrX    107374156       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      gaAtAcCAacccttAgtccaTtcCAT      chrX    107374157       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      tgAataCcaaccCttagtcCattCCA      chrX    107374158       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      atgaAtaccaAcCCttAgtCcatTCC      chrX    107374159       -       1       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      tatGAatAccAaCCcttAgtccaTTC      chrX    107374160       -       0       3
2       DNA     CCAGAGCAGGATCCACAAACTGG---      gtAtgaataccaaCcCttAgTccATT      chrX    107374161       -       1       3

Again, the mismatch counts do not agree with the number of lowercase letters.

To sum up:

  1. Without bulge: although v3 generates the fewest results, they seem the most reliable, whereas v2.4 and v2.4.1 both generate more results, many of which are confusing (e.g. they either contain "nnnn" in the DNA column or the mismatch column does not match the lowercase letter count).
  2. With bulge: both v2.4 and v2.4.1 need the wrapper to function, and the results contain many "NNNN" in the crRNA column. The v3 results seem the most reasonable. However, for all three versions, the mismatch counts do not agree with the number of lowercase letters.

Therefore, to perform bulge analysis, it seems that I should use v3 and probably further filter the output by calculating the "actual" mismatches. However, given your note "WARNING: Cas-OFFinder 3 is not production ready yet, it is known that the result can be different from that of Cas-OFFinder 2. For production use please use the latest Cas-OFFinder 2 instead.", I am hesitant to use v3, even though it is the only version that seems to perform bulge analysis correctly. Can you advise? Thank you very much!
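
The "actual" mismatch filter mentioned above could look roughly like the following sketch. It assumes the nine-column v3 layout shown earlier (crRNA in the third field, DNA in the fourth), that the two sequences are printed already aligned in the same orientation with `-` marking bulge positions, and that every aligned position, PAM included, should be compared using IUPAC codes. The function names `recount_mismatches` and `filter_v3_hits` are hypothetical, and this is not how Cas-OFFinder itself counts mismatches.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def recount_mismatches(crrna: str, dna: str) -> int:
    """Count aligned positions where the DNA base is not allowed by the crRNA code."""
    if len(crrna) != len(dna):
        raise ValueError("aligned crRNA and DNA must have the same length")
    mismatches = 0
    for query_base, target_base in zip(crrna.upper(), dna.upper()):
        if query_base == "-" or target_base == "-":
            continue  # bulge position: nothing to compare
        if target_base not in IUPAC.get(query_base, ""):
            mismatches += 1
    return mismatches


def filter_v3_hits(lines: list[str], max_mismatches: int) -> list[str]:
    """Keep only the hits whose recounted mismatches are within the limit."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        if recount_mismatches(fields[2], fields[3]) <= max_mismatches:
            kept.append(line)
    return kept
```

Because the recount ignores the case of the DNA column and the reported mismatch column entirely, it can be used to cross-check either of them.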