lucapinello / CRISPResso

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data
Other
131 stars, 55 forks

Use --min_identity_score for read vs amplicon #36

Closed: rakarnik closed this issue 6 years ago

rakarnik commented 6 years ago

Hi, I am using CRISPResso for simple amplicon-seq analysis. I initially got an error saying that no reads aligned, which I tracked down to the fact that the amplicon is 500 bp and the reads are only 150 bp. Would it be possible to have min_identity_score apply to the percentage of the read that aligned rather than of the amplicon? Pointers to where I could change this in the code would also be appreciated. From a quick check, this number is parsed from the needle output, which defines it as the number of identical bases divided by the total length of the alignment; in the case described above, that length is approximately the length of the amplicon. Maybe we could parse out the count of identical bases and divide by the length of the read instead? Thanks for the help! -Rahul
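To make the proposal concrete, here is a minimal sketch that compares the two identity definitions on a pair of globally aligned rows (gaps written as `-`). It is not CRISPResso's code; the function name `identity_scores` is hypothetical and it assumes you already have the aligned read and reference strings, e.g. as printed by needle.

```python
from typing import Tuple


def identity_scores(aligned_read: str, aligned_ref: str) -> Tuple[float, float]:
    """Hypothetical helper: return (identity over alignment length,
    identity over read length), both in percent.

    Both arguments are rows of one global alignment, of equal length,
    with '-' marking gap positions.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned rows must have equal length")
    # Count columns where both rows carry the same (non-gap) base.
    identical = sum(
        1 for a, b in zip(aligned_read, aligned_ref) if a == b and a != "-"
    )
    # Number of bases actually present in the read (gaps excluded).
    read_length = sum(1 for a in aligned_read if a != "-")
    # needle-style identity: identical / total alignment columns.
    by_alignment = 100.0 * identical / len(aligned_read)
    # Proposed identity: identical / read length.
    by_read = 100.0 * identical / read_length if read_length else 0.0
    return by_alignment, by_read


if __name__ == "__main__":
    amplicon = "ACGT" * 125          # toy 500 bp amplicon
    read = amplicon[:150]            # perfect 150 bp read from its start
    aligned_read = read + "-" * 350  # read padded with end gaps
    print(identity_scores(aligned_read, amplicon))
```

For a perfect 150 bp read against a 500 bp reference, the alignment-length definition gives 30% while the read-length definition gives 100%, which is why a fixed threshold on the first one rejects every read in this setup.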

lucapinello commented 6 years ago

Hi Rahul,

You are probably providing the wrong reference amplicon to CRISPResso.

The reference amplicon should correspond to what you expect to sequence from non-edited cells. So in your case it cannot be longer than 150 bp (for single-end reads) or about 280-290 bp for overlapping paired-end reads (we advise an overlap of 10-20 bp).
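The 280-290 bp figure follows from simple arithmetic: two 150 bp mates that must overlap by 10-20 bp can span at most 2 × 150 − overlap bases. The sketch below (hypothetical helper names, not part of CRISPResso) reproduces those numbers and shows how much of a 500 bp amplicon 2 × 150 bp reads leave unsequenced.

```python
def max_amplicon_length(read_length: int, paired: bool, min_overlap: int = 0) -> int:
    """Hypothetical helper: longest amplicon the reads can fully span.

    Single-end: one read must cover the whole amplicon.
    Paired-end: the two mates together, minus the required overlap.
    """
    if paired:
        return 2 * read_length - min_overlap
    return read_length


def uncovered_gap(amplicon_length: int, read_length: int) -> int:
    """Bases in the middle of the amplicon that neither mate reaches."""
    return max(0, amplicon_length - 2 * read_length)


if __name__ == "__main__":
    print(max_amplicon_length(150, paired=False))                 # single-end limit
    print(max_amplicon_length(150, paired=True, min_overlap=10))  # 10 bp overlap
    print(max_amplicon_length(150, paired=True, min_overlap=20))  # 20 bp overlap
    print(uncovered_gap(500, 150))                                # gap for a 500 bp amplicon
```

With 150 bp reads this gives 150 bp (single-end), 290 bp and 280 bp (paired-end), and a 200 bp gap in the middle of a 500 bp amplicon, so the mates cannot be merged.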

We don't support analysis of non-overlapping reads.

Hope this is helpful.

On Wed, Feb 28, 2018 at 12:38 AM, Rahul Karnik notifications@github.com wrote:



rakarnik commented 6 years ago

I am running it in single-end mode, because the 150 bp paired-end reads do not overlap across the ~500 bp amplicon. I found that taking the first and last 150 bp of the amplicon as separate references and doing four separate CRISPResso runs works as a workaround: read 1 against the first 150 bp, read 1 against the last 150 bp, read 2 against the first 150 bp, and read 2 against the last 150 bp. I am still seeing some issues, but will open a separate issue for those. Thanks Luca!
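The workaround can be sketched as building two sub-references from the amplicon and pairing each read file with each of them. This only prepares the inputs for the four runs; it does not call CRISPResso, and the function names and FASTQ file names are hypothetical placeholders.

```python
from itertools import product
from typing import List, Tuple


def sub_references(amplicon: str, read_length: int) -> Tuple[str, str]:
    """Hypothetical helper: first and last `read_length` bases of the amplicon."""
    if read_length <= 0 or read_length > len(amplicon):
        raise ValueError("read_length must be between 1 and the amplicon length")
    return amplicon[:read_length], amplicon[-read_length:]


def workaround_runs(
    amplicon: str, read_length: int, read_files: List[str]
) -> List[Tuple[str, str, str]]:
    """Return one (read file, window label, reference sequence) tuple per run.

    Every read file is paired with both windows, so two read files give
    the four runs described in the comment above.
    """
    first, last = sub_references(amplicon, read_length)
    windows = [("first", first), ("last", last)]
    return [(rf, label, seq) for rf, (label, seq) in product(read_files, windows)]


if __name__ == "__main__":
    amplicon = "A" * 150 + "C" * 200 + "G" * 150  # toy ~500 bp amplicon
    # Placeholder file names for read 1 and read 2.
    for read_file, label, ref in workaround_runs(
        amplicon, 150, ["sample_R1.fastq.gz", "sample_R2.fastq.gz"]
    ):
        print(read_file, label, len(ref))
```

Each tuple could then be used for one single-end run; note that the middle of the amplicon (200 bp in this example) is still not analyzed by any of the four runs.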

lucapinello commented 6 years ago

Of course, happy to help.

Best,

Luca

On Thu, Mar 1, 2018 at 9:58 AM, Rahul Karnik notifications@github.com wrote:

Closed #36 https://github.com/lucapinello/CRISPResso/issues/36.
