maximilianh / crisporWebsite

All source code of the crispor.org website
http://crispor.org

MIT Specificity Score source code #24

Closed genya closed 5 years ago

genya commented 5 years ago

Where in the repository files can one find the code implementing the MIT specificity score? The CRISPOR paper references the original MIT paper and accompanying website, but the paper leaves out the details and the website (http://crispr.mit.edu) is no longer active. Please let me know. Thanks!

maximilianh commented 5 years ago

Look at crispor.py: the function is called calcMitGuideScore, and it takes as input the scores calculated by calcMitScore.
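A minimal sketch of the aggregation step described above, assuming the MIT-website formula of 100 / (100 + sum of per-off-target scores); the constant and the rounding are assumptions based on this thread, not code copied from crispor.py:

```python
# Hedged sketch of a guide-level MIT specificity aggregation: the per-off-target
# scores produced by something like calcMitScore() (each 0-100) are summed and
# collapsed into a single 0-100 guide score.

def calc_mit_guide_score(hit_sum):
    """Aggregate the summed per-off-target MIT scores into one 0-100 guide score."""
    score = 100.0 / (100.0 + hit_sum)  # 1.0 when there are no off-targets
    return int(round(score * 100))     # scale to 0-100 and round

print(calc_mit_guide_score(0))    # 100: no off-targets
print(calc_mit_guide_score(100))  # 50: one perfect-match off-target
```

With many off-targets the sum grows and the score decays toward 0, which matches the limiting behaviour discussed later in this thread.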


genya commented 5 years ago

I see it, thank you!

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html


genya commented 5 years ago

BTW, I notice that in the code a guide-level CFD score is calculated just like the MIT specificity score, but it is then not used. I know the documentation says the CFD score is not technically defined at the guide level, but the output of

    guideCfdScore = calcMitGuideScore(sum(cfdScores))

seems to have all the same limiting behaviors as the MIT score: it is 100 in the absence of off-targets, 50 if there is one perfectly matching off-target, and converges to 0 as the number of off-targets increases. So what makes it unsuitable as a guide-level specificity score?

maximilianh commented 5 years ago

This is a long discussion. In 2017, when I compared it against the MIT score, the correlation with the total GUIDE-seq off-target % was not as good for the CFD-derived score, most likely because the CFD score behaves differently and has a different distribution.

I still have the figure and can send it to you by email (does GitHub take attachments?), and I could try to dig out the code (it is in the crisporAnalysis repo on GitHub).

This is why Josh Tycko made a new and much better specificity score for saCas9; it correlates better with GUIDE-seq data than the MIT specificity score does. I've emailed him again now, but I think we didn't have a good idea of what to do for spCas9. Maybe he's reading this ticket, too.


genya commented 5 years ago

Thanks very much for clarifying. If I understand correctly, the MIT specificity score is used not because a CFD-based guide specificity score like 100/(1+sum(CFD off-targets)) is inherently inappropriate, but because it doesn't work as well as the MIT specificity score. I would be curious to see the figure showing that, but it is just curiosity, not worth the trouble.
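For concreteness, the 100/(1+sum(CFD)) idea above can be sketched as follows. This assumes per-off-target CFD scores lie in [0, 1] with 1.0 for a perfect match (the usual CFD convention); it is only an illustration of the proposed formula, not CRISPOR's implementation:

```python
# Sketch of the CFD-based guide specificity score proposed above:
# 100 / (1 + sum of per-off-target CFD scores), where each CFD score
# is assumed to be in [0, 1] and 1.0 means a perfect-match off-target.

def cfd_guide_score(cfd_scores):
    """Collapse a list of per-off-target CFD scores into one guide score."""
    return 100.0 / (1.0 + sum(cfd_scores))

print(cfd_guide_score([]))          # 100.0: no off-targets
print(cfd_guide_score([1.0]))       # 50.0: one perfect-match off-target
print(cfd_guide_score([1.0] * 99))  # 1.0: score decays toward 0
```

The limiting behaviour is indeed the same as the MIT aggregation; the question raised in the thread is how well the resulting ranking correlates with measured off-target activity, not whether the formula is well defined.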


maximilianh commented 5 years ago

I will forward you the emails from Josh; we're just talking about this again. Guidescan.com apparently has a much better score, based on the CFD, but I haven't figured out what exactly the difference is. It's probably something really simple that I'm missing.

Are you building a new specificity score? Because if we're going to have 3-4 of them, it's much easier for me if I can plan for that now instead of hacking them in one by one...


genya commented 5 years ago

I was computing the guide-level specificity score from the off-target files because the command-line version of CRISPOR does not yet account for PAR genes (?). I also noticed that a few genes have internal repetitive elements, so that their main off-targets lie within the same gene. So basically I wanted to compute specificity scores that both included and excluded the within-gene off-targets, to cover these edge cases, and was looking at the two options, MIT and CFD.
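A sketch of that with/without-within-gene computation. The (score, gene) pairs and the aggregation formula here are illustrative assumptions for the example, not the actual CRISPOR off-target file format:

```python
# Compute a guide specificity score with and without the off-targets that
# fall inside the target gene itself. Off-targets are modeled here as
# (mit_score, gene_name) pairs; the real off-target file columns differ.

def mit_guide_score(ot_scores):
    """MIT-style aggregation of per-off-target scores into a 0-100 guide score."""
    return 100.0 / (100.0 + sum(ot_scores)) * 100.0

def specificity(off_targets, target_gene=None):
    """If target_gene is given, exclude off-targets inside that gene
    before aggregating; otherwise count all of them."""
    scores = [s for s, gene in off_targets
              if target_gene is None or gene != target_gene]
    return mit_guide_score(scores)

# Made-up example: a gene with internal repeats, so its strongest
# off-targets are within the gene itself.
ots = [(100.0, "MUC4"), (20.0, "MUC4"), (5.0, "OTHER")]
print(round(specificity(ots), 1))          # 44.4: all off-targets counted
print(round(specificity(ots, "MUC4"), 1))  # 95.2: within-gene ones excluded
```

The gap between the two numbers shows why the within-gene edge case matters: for such genes the headline specificity score depends heavily on whether same-gene off-targets are counted.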


maximilianh commented 5 years ago

the PAR regions are handled in the BED parser, so it should also be in the command-line version:

    # if an off-target is in the PAR region, we keep only the chrY off-target
    parNum = isInPar(db, chrom, start, end)
    # skip the chrX copy so that only the chrY match is kept
    if parNum is not None and chrom == "chrX":
        continue
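The filter quoted above can be exercised in a self-contained sketch. The isInPar helper and the PAR coordinates below are illustrative stand-ins made up for this example, not the actual crispor.py implementation or real assembly coordinates:

```python
# Illustrative PAR de-duplication: a match inside a pseudoautosomal region
# appears on both chrX and chrY, so the chrX copy is skipped and only the
# chrY copy is kept. PAR coordinates here are invented for the example.

ILLUSTRATIVE_PARS = {
    "hg19": [("chrX", 0, 2_700_000, 1), ("chrY", 0, 2_700_000, 1)],
}

def is_in_par(db, chrom, start, end):
    """Return the PAR number if (chrom, start, end) overlaps a PAR, else None."""
    for par_chrom, par_start, par_end, par_num in ILLUSTRATIVE_PARS.get(db, []):
        if chrom == par_chrom and start < par_end and end > par_start:
            return par_num
    return None

def filter_par_duplicates(db, matches):
    """Drop the chrX copy of any off-target that lies in a PAR."""
    kept = []
    for chrom, start, end in matches:
        par_num = is_in_par(db, chrom, start, end)
        if par_num is not None and chrom == "chrX":
            continue  # skip the chrX copy; the chrY one is kept
        kept.append((chrom, start, end))
    return kept

matches = [("chrX", 100_000, 100_023), ("chrY", 100_000, 100_023),
           ("chr5", 50_000, 50_023)]
print(filter_par_duplicates("hg19", matches))
# [('chrY', 100000, 100023), ('chr5', 50000, 50023)]
```

Keeping exactly one copy of each PAR match prevents the same physical site from being counted twice against the specificity score.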

As for off-targets within the same gene: yes, I don't want to exclude those. I'm pretty sure interactive users really want to see them.
