Open hukai916 opened 1 month ago
Hi,
Thanks a lot for your interest in Cas-OFFinder. I am no longer working on this repo, and all development effort for the next version of Cas-OFFinder has been moved to https://github.com/pnucolab.
As you may have noticed, I recently started an independent research group. Soon afterwards I realized that I cannot really focus on the actual development anymore because of the administrative burden and teaching. So now my students are working on it. They are still on the learning curve, which means progress is slow, but it keeps moving forward.
I once thought about hiring a scientific programmer or a postdoc to speed things up, but we recently failed to secure funding (e.g. CZI EOSS) for that...
But as I said, although it is slow, we are working on it, and hopefully we can release it by the end of this year. If you are still interested, keep an eye on our new group page. Thanks!
Best, Jeongbin
Hi Jeongbin,
Congrats on your new role! BTW, is there any possibility of, or interest in, migrating Cas-OFFinder to the R world?
Best, Kai
Cas-OFFinder is a standalone program, so you can call it from any language, including R.
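As a rough illustration of calling the standalone binary from another language, here is a minimal Python sketch (in R, `system2()` would play the same role). It assumes the v2-style command line `cas-offinder <input> <C|G|A> <output>` and that the executable is on the PATH; `build_command` and `run_cas_offinder` are hypothetical helper names, not part of Cas-OFFinder.

```python
import subprocess
from pathlib import Path


def build_command(input_file, output_file, device="C", executable="cas-offinder"):
    """Return the argument list for one Cas-OFFinder run.

    Assumption: v2-style usage, i.e. cas-offinder <input> <C|G|A> <output>,
    where C, G and A select CPU, GPU or accelerator devices.
    """
    if device[:1] not in {"C", "G", "A"}:
        raise ValueError("device must start with C, G or A")
    return [executable, str(Path(input_file)), device, str(Path(output_file))]


def run_cas_offinder(input_file, output_file, device="C", executable="cas-offinder"):
    """Run Cas-OFFinder; raises CalledProcessError if it exits non-zero."""
    cmd = build_command(input_file, output_file, device, executable)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Calling `run_cas_offinder("input.txt", "output.txt", "C")` would then produce the result file, which can be read back with any tabular reader.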
Hi Jeongbin,
Do you absolutely NOT recommend using v3 for now? I am asking because I ran some benchmarking tests with v2.4, v2.4.1, and v3b, with and without bulge analysis. The results are very inconsistent, and some are confusing. Below is my summary. My test input files are as follows:
# input_withoutbulge.txt
/Users/kaihu/GitHub/CasOFFinder/ref
NNNNNNNNNNNNNNNNNNNNNRG
GTGTCCTCCACACCAGAATCAGG 3
TGTCCTCCACACCAGAATCAGGG 3
CCAGAGCAGGATCCACAAACTGG 3
# input_withbulge.txt
/Users/kaihu/GitHub/CasOFFinder/ref
NNNNNNNNNNNNNNNNNNNNNRG 3 3
GTGTCCTCCACACCAGAATCAGG 3
TGTCCTCCACACCAGAATCAGGG 3
CCAGAGCAGGATCCACAAACTGG 3
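For reproducing these tests, the two input layouts can be generated programmatically. This is only a sketch based on the files shown above: `build_input` is a hypothetical helper, and reading the two numbers on the pattern line as DNA bulge size followed by RNA bulge size is an assumption (both are 3 here, so the order cannot be confirmed from this example).

```python
def build_input(genome_dir, pattern, targets, dna_bulge=None, rna_bulge=None):
    """Return Cas-OFFinder input text laid out like the files above.

    targets: iterable of (sequence, max_mismatches) pairs.
    dna_bulge / rna_bulge: optional bulge sizes appended to the pattern line
    (assumed order: DNA bulge size, then RNA bulge size).
    """
    pattern_line = pattern
    if dna_bulge is not None and rna_bulge is not None:
        pattern_line = f"{pattern} {dna_bulge} {rna_bulge}"
    lines = [genome_dir, pattern_line]
    for seq, max_mismatches in targets:
        # The tool rejects targets whose length differs from the pattern.
        if len(seq) != len(pattern):
            raise ValueError(f"{seq!r} does not match the pattern length")
        lines.append(f"{seq} {max_mismatches}")
    return "\n".join(lines) + "\n"
```

For example, `build_input(ref_dir, pattern, [("GTGTCCTCCACACCAGAATCAGG", 3)], 3, 3)` gives the bulge variant, and omitting the last two arguments gives the plain one.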
GTGTCCTCCACACCAGAATCAGG chrX 48649585 GTGTCCTCCACACCAGAATCAGG + 0
TGTCCTCCACACCAGAATCAGGG chrX 48649586 TGTCCTCCACACCAGAATCAGGG + 0
CCAGAGCAGGATCCACAAACTGG chrX 48649563 CCAGAGCAGGATCCACAAACTGG - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270553 nnnnnnn + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270554 nnnnnn + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270555 nnnnn + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270556 nnnn + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270557 nnn + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270558 nn + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270559 n + 0
CCAGAGCAGGATCCACAAACTGG chrX 155270553 nnnnnnn - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270554 nnnnnn - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270555 nnnnn - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270556 nnnn - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270557 nnn - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270558 nn - 0
CCAGAGCAGGATCCACAAACTGG chrX 155270559 n - 0
GTGTCCTCCACACCAGAATCAGG chr21 48129894 n - 3
TGTCCTCCACACCAGAATCAGGG chr21 48119359 caggtcagACctggGcgggcGGG + 2
TGTCCTCCACACCAGAATCAGGG chr21 48129881 nnnnnnnnnnnnnn + 3
TGTCCTCCACACCAGAATCAGGG chr21 48129886 nnnnnnnnn + 2
What are the "nnnn" entries in the output? And why do the mismatch numbers not match the number of lowercase letters (e.g. the third-to-last line)?
GTGTCCTCCACACCAGAATCAGG chrX 48649585 GTGTCCTCCACACCAGAATCAGGGGTT + 0
TGTCCTCCACACCAGAATCAGGG chrX 48649586 TGTCCTCCACACCAGAATCAGGGGTTT + 0
CCAGAGCAGGATCCACAAACTGG chrX 48649559 CCAGAGCAGGATCCACAAACTGGGGGA - 0
cas-offinder-bulge
)#Bulge type crRNA DNA Chromosome Position Direction Mismatches Bulge Size
X NNNNNNNNNNNNNNNNNNNNNRG GTGTCCTCCACACCAGAATCAGG chrX 48649585 + 0 0
X NNNNNNNNNNNNNNNNNNNNNRG GGTGTCCTCCACACCAGAATCAGG chrX 48649584 + 0 0
X NNNNNNNNNNNNNNNNNNNNNRG tAgtcCaTtCcatgtcatcatctG chrX 107374146 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tTaGTCCattcCAtgtcAtcatct chrX 107374147 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG cTTagtCcattCcatgtcATCAtc chrX 107374148 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG cCTtagtcCattcCatGtcatcat chrX 107374149 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG cCctTagTCCAttCCAtgtcatca chrX 107374150 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG aCccTtagtCcattCcatgTCAtc chrX 107374151 - 0 0
X NNNNNNNNNNNNNNNNNNNNNRG aAcccttagtcCAttccAtgtcat chrX 107374152 - 0 0
X NNNNNNNNNNNNNNNNNNNNNRG cAaccCtTagtCcattccATgtca chrX 107374153 - 0 0
X NNNNNNNNNNNNNNNNNNNNNRG cCaacCCTtagtcCattccatgtc chrX 107374154 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG aCcaaCCcttAgtCCAttccatGt chrX 107374155 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tAccaaCcCttagtCcattcCAtG chrX 107374156 - 0 0
X NNNNNNNNNNNNNNNNNNNNNRG aTaccaacCCttAgtccAtTCcat chrX 107374157 - 0 0
X NNNNNNNNNNNNNNNNNNNNNRG aATacCaaCCcttagtccATtcca chrX 107374158 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG GAataCCaaCcCttagtccattcc chrX 107374159 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tGaaTaCcaacCcttAGtccattc chrX 107374160 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG aTgaatacCaACcCttagtcCAtt chrX 107374161 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tATGaataCCAacCCttAgTCcat chrX 107374162 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG GTatgaaTaCcaACCcttAgtcca chrX 107374163 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tGTaTgaatacCAaCccttagtcc chrX 107374164 - 1 0
Why do so many of the resulting crRNAs consist of Ns? Also, the mismatch counts do not agree with the number of lowercase letters.
GTGTCCTCCACACCAGAATCAGG chrX 48649585 GTGTCCTCCACACCAGAATCAGG + 0
GTGTCCTCCACACCAGAATCAGG chrX 107374158 tgGaatggactAaggGttggtat + 3
GTGTCCTCCACACCAGAATCAGG chrX 107374159 GgaatggaCtaAgggttggtAtt + 3
GTGTCCTCCACACCAGAATCAGG chrX 107374160 GaaTggaCtAagggttggTattc + 1
GTGTCCTCCACACCAGAATCAGG chrX 107374161 aatggactaAgggttGgtattca + 3
GTGTCCTCCACACCAGAATCAGG chrX 107374162 aTGgaCTaagggttgGtATtcat + 3
GTGTCCTCCACACCAGAATCAGG chrX 107374163 tgGaCtaagggttggtAtTCAta + 3
GTGTCCTCCACACCAGAATCAGG chrX 107374166 actaagggttggtattcATacat + 3
TGTCCTCCACACCAGAATCAGGG chrX 48649586 TGTCCTCCACACCAGAATCAGGG + 0
TGTCCTCCACACCAGAATCAGGG chrX 107374158 TGgaaTggACtaagGgtTggtat + 1
TGTCCTCCACACCAGAATCAGGG chrX 107374159 gGaatggactAaggGttggtatt + 3
TGTCCTCCACACCAGAATCAGGG chrX 107374160 gaatggaCtaAgggttggtAttc + 3
TGTCCTCCACACCAGAATCAGGG chrX 107374161 aaTggaCtAagggttggTattca + 2
TGTCCTCCACACCAGAATCAGGG chrX 107374163 TGgaCTaagggttgGtATtcata + 3
TGTCCTCCACACCAGAATCAGGG chrX 107374164 gGaCtaagggttggtAtTCAtac + 3
TGTCCTCCACACCAGAATCAGGG chrX 107374167 ctaagggttggtattcATacatG + 3
CCAGAGCAGGATCCACAAACTGG chrX 48649563 CCAGAGCAGGATCCACAAACTGG - 0
Again, many mismatch numbers do not agree with the lowercase letter counts.
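The comparison described here can be automated. The sketch below assumes the six-column layout seen in the output above (query, chromosome, position, DNA, strand, mismatches) and, as the question does, that lowercase letters in the DNA column mark mismatched bases; `check_row` is a hypothetical helper.

```python
def check_row(line):
    """Compare the reported mismatch count with the lowercase count.

    Expects one whitespace-separated output row with six fields:
    query, chromosome, position, DNA, strand, mismatches.
    """
    query, chrom, position, dna, strand, reported = line.split()
    lowercase = sum(1 for base in dna if base.islower())
    return {
        "query": query,
        "chrom": chrom,
        "position": int(position),
        "reported": int(reported),
        "lowercase": lowercase,
        "consistent": lowercase == int(reported),
    }


rows = [
    "GTGTCCTCCACACCAGAATCAGG chrX 48649585 GTGTCCTCCACACCAGAATCAGG + 0",
    "GTGTCCTCCACACCAGAATCAGG chrX 107374158 tgGaatggactAaggGttggtat + 3",
]
# Keep only the rows where the two numbers disagree.
suspicious = [r for r in map(check_row, rows) if not r["consistent"]]
```

Running this over a whole result file gives a quick list of the rows worth inspecting by hand.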
Total 1 device(s) found.
Loading input file...
Critical error! The length of target sequences should match with the length of pattern sequence.
Failed to run without using the wrapper.
cas-offinder-bulge
)#Bulge type crRNA DNA Chromosome Position Direction Mismatches Bulge Size
X NNNNNNNNNNNNNNNNNNNNNRG GTGTCCTCCACACCAGAATCAGG chrX 48649585 + 0 0
X NNNNNNNNNNNNNNNNNNNNNRG GTtTtCTtttCcCagtgtggAaG chrX 107373527 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG caGaagaCtAacttcaAAgggGG chrX 107373479 - 2 0
X NNNNNNNNNNNNNNNNNNNNNRG GGTGTCCTCCACACCAGAATCAGG chrX 48649584 + 0 0
X NNNNNNNNNNNNNNNNNNNNNRG aAatTagaaatgtatctttaaAaG chrX 107373715 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG tTTaaaCcCatattaAtAAattaG chrX 107373732 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG tTTcTttTCCcagtgtGgAagtGG chrX 107373524 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG GTTtTCaTtgttttCttttcCcaG chrX 107373535 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG cTcagaagaCtaACttcAAaggGG chrX 107373480 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG aTctcagaagACtaacttcaaAGG chrX 107373482 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG aTatgCtaaacatgatctcagAaG chrX 107373496 - 2 0
X NNNNNNNNNNNNNNNNNNNNNRG aAaaTatgCtAaACatGAtctcaG chrX 107373499 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG GgTGTCCTCCACACCAGAATCAGG chrX 48649584 + 1 0
X NNNNNNNNNNNNNNNNNNNNNRG GgtGTCCTCCACACCAGAATCAGG chrX 48649584 + 2 0
X NNNNNNNNNNNNNNNNNNNNNRG tgtTTtCTtttCcCagtgtggAaG chrX 107373527 - 3 0
X NNNNNNNNNNNNNNNNNNNNNRG tcaGaagaCtAacttcaAAgggGG chrX 107373479 - 2 0
X NNNNNNNNNNNNNNNNNNNNNRG GgtgTCCTCCACACCAGAATCAGG chrX 48649584 + 3 0
X NNNNNNNNNNNNNNNNNNNNNRG aaaTTagaaatgtatctttaaAaG chrX 107373715 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tTtaAaCcCatattaAtAAattaG chrX 107373732 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG tTtcTttTCCcagtgtGgAagtGG chrX 107373524 - 0 0
X NNNNNNNNNNNNNNNNNNNNNRG tgtTTtCTtttCcCagtgtggAaG chrX 107373527 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG catTGttTtCttttCccAgTgtGG chrX 107373530 - 1 0
X NNNNNNNNNNNNNNNNNNNNNRG GTtTTCaTtgttttCttttcCcaG chrX 107373535 - 1 0
Again, the crRNA column is mostly Ns, and the mismatch counts disagree with the lowercase letter counts.
#Id Bulge Type crRNA DNA Chromosome Location Direction Mismatches Bulge Size
2 X CCAGAGCAGGATCCACAAACTGG CCAGAGCAGGATCCACAAACTGG chrX 48649563 - 0 0
0 X GTGTCCTCCACACCAGAATCAGG GTGTCCTCCACACCAGAATCAGG chrX 48649585 + 0 0
1 X TGTCCTCCACACCAGAATCAGGG TGTCCTCCACACCAGAATCAGGG chrX 48649586 + 0 0
#Id Bulge Type crRNA DNA Chromosome Location Direction Mismatches Bulge Size
2 DNA CCAGAGCAGGATCCACAAAC---TGG CCAGAGCAGGATCCACAAACTGGgGG chrX 48649560 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- CCAGAGCAGGATCCACAAACTGGGGG chrX 48649560 - 0 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- CttagtCcattcCatgtcAtcatCTG chrX 107374146 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- CCttAGtccatTCCAtgtcaTcaTCT chrX 107374147 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- CCcttagtccATtCcatgtCatcATC chrX 107374148 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- aCccttagtccattcCAtgtcatCAT chrX 107374149 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- aacccttAGtccattCcAtgTcaTCA chrX 107374150 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- CaAcccttaGtcCattccAtgtcATC chrX 107374151 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- CCAaccCttagTCCAttccaTGtCAT chrX 107374152 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- aCcaAcCcttAgtCcattcCatGTCA chrX 107374153 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- taccAaCccttagtcCAttCcatGTC chrX 107374154 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- atAccaaccctTagtCcAttccaTGT chrX 107374155 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- aataccaAcccTtagtccAtTccATG chrX 107374156 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- gaAtAcCAacccttAgtccaTtcCAT chrX 107374157 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- tgAataCcaaccCttagtcCattCCA chrX 107374158 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- atgaAtaccaAcCCttAgtCcatTCC chrX 107374159 - 1 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- tatGAatAccAaCCcttAgtccaTTC chrX 107374160 - 0 3
2 DNA CCAGAGCAGGATCCACAAACTGG--- gtAtgaataccaaCcCttAgTccATT chrX 107374161 - 1 3
Again, mismatch counts do not align with lower case letter numbers.
To sum up:
Therefore, to perform bulge analysis, it seems that I should use v3 and probably filter the output further by calculating the "actual" mismatches. However, given your note "WARNING: Cas-OFFinder 3 is not production ready yet, it is known that the result can be different from that of Cas-OFFinder 2. For production use please use the latest Cas-OFFinder 2 instead.", I am hesitant to use v3, even though it is the only version that seems to handle bulges correctly. Can you advise? Thank you very much!
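One possible way to calculate the "actual" mismatches mentioned above is to compare the crRNA and DNA columns position by position. This is a sketch, not Cas-OFFinder's own counting rule: it assumes the two columns are already aligned (equal length, with `-` marking bulge gaps), skips gap positions, ignores case, and treats IUPAC codes in the crRNA as sets of allowed bases. `count_mismatches` and `filter_hits` are hypothetical names.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(crrna, dna):
    """Count mismatching positions between two aligned sequences."""
    if len(crrna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = 0
    for q, d in zip(crrna.upper(), dna.upper()):
        if q == "-" or d == "-":
            continue  # bulge gap: not a mismatch in this sketch
        if d not in IUPAC.get(q, q):
            mismatches += 1
    return mismatches


def filter_hits(pairs, max_mismatches):
    """Keep (crRNA, DNA) pairs whose recomputed count is within the limit."""
    return [p for p in pairs if count_mismatches(*p) <= max_mismatches]
```

Applied to the first two v3 bulge rows above, this agrees with the reported counts of 1 and 0, while the third row (reported as 1) recomputes to far more than the limit of 3 and would be filtered out.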
Hi developers,
Thanks for creating Cas-OFFinder!
I am very interested in incorporating it into our analytical pipeline. However, judging from the issue history, there seem to be several unresolved issues in Cas-OFFinder v2. Could you kindly let me know whether Cas-OFFinder is still actively maintained and whether there is a plan for a v3 release? We would like to perform bulge analysis: should we wait for v3 or use the wrapper made for v2.4? Thanks!
Best,