pinellolab / CRISPRme

False indel-derived offtargets #67

Open thomas-davis opened 1 week ago

thomas-davis commented 1 week ago

Describe the bug

I suspect CRISPRme occasionally reports false indel-derived off-targets, i.e. off-targets that are not actually generated by the reported indel.

As part of an analysis, I generate oligos representing variant-containing off-target sequences with surrounding genomic context. To do this, I extract the reference sequence and surrounding context for off-target sites reported by crisprme, and then introduce the reported variant into the sequence. 95-99% of the time I am able to find the reported off-target in this extracted sequence. Rarely, however, for some indel-associated off-targets, I am unable to find a sequence that matches the reported off-target.
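A minimal sketch of the kind of check described above, for illustration only; it is not the code used in this analysis. The function names, the 1-based coordinate convention (borrowed from VCF), and the choice to strip `-` bulge characters from the reported off-target before searching are all assumptions. It does not handle IUPAC ambiguity codes.

```python
# Hypothetical helpers: apply one variant to an extracted reference window,
# then search both strands for the reported off-target sequence.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_variant(ref_seq, window_start, var_pos, ref_allele, alt_allele):
    """Return ref_seq with a single SNP or indel applied.

    window_start and var_pos are assumed to be 1-based genomic coordinates,
    with window_start being the position of ref_seq[0].
    """
    offset = var_pos - window_start
    if offset < 0 or offset + len(ref_allele) > len(ref_seq):
        raise ValueError("variant falls outside the extracted window")
    observed = ref_seq[offset:offset + len(ref_allele)]
    if observed.upper() != ref_allele.upper():
        # A mismatch here usually signals a coordinate or genome-build problem.
        raise ValueError(f"expected REF {ref_allele!r}, found {observed!r}")
    return ref_seq[:offset] + alt_allele + ref_seq[offset + len(ref_allele):]


def contains_off_target(seq, off_target):
    """Check whether off_target occurs in seq on either strand.

    Bulge gaps ('-') are removed first (assumption about the report format).
    """
    target = off_target.replace("-", "").upper()
    seq = seq.upper()
    return target in seq or reverse_complement(target) in seq


# Toy example: a 1 bp deletion (REF "TA" -> ALT "T") at position 103.
window = "ACGTACGTAC"
edited = apply_variant(window, 100, 103, "TA", "T")
print(edited, contains_off_target(edited, "GTCG"))
```

Validating the REF allele before substituting is a deliberate choice: it separates "the window or coordinates are wrong" from "the indel does not produce the reported off-target".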

I can't send explicit examples unfortunately because they involve proprietary spacers, but I have observed this same bug across two versions of crisprme (v2.1.5 and v2.1.1). Is this a known bug? What other information would be helpful in tracking this down? I'm happy to disclose the code I'm using to add variants to the extracted sequence. If you have outputs for a public spacer I can take a look at whether the same bug is present and give concrete examples. The bug is preventing us from being able to use crisprme for our application.

To Reproduce

Ran crisprme via the command line using HG38, 1000G + HGDP.

--genome Genomes/full_renamed \
--vcf 1000G_and_hgdp_list_vcf.txt \
--samplesID list_samplesID.txt \
--guide THE_NAME_OF_OUR_SPACER_spacer.txt \
--pam PAMs/20bp-FOO-CAS.txt \
--annotation Annotations/encode+gencode.hg38.bed \
--gene_annotation Annotations/encode+gencode.hg38.bed \
--mm 6 \
--bDNA 2 \
--bRNA 2 \
--bMax 2 \
--merge 3 \
--output THE_NAME_OF_OUR_SPACER \
--thread 64
  1. Spacer sequences: I can't disclose these, apologies. Observed across two different spacers. If need be I can try to reproduce with a public spacer.

  2. Cas protein: Also cannot disclose this. Effect observed across two different Cas proteins.

  3. PAM: Also cannot disclose this. Effect observed across two different PAMs.

  4. Genome: HG38

  5. Variants dataset (OPTIONAL): HGDP + 1000G

  6. Thresholds: Mismatches: 6, DNA Bulges: 2, RNA Bulges: 2
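For checking a reconstructed off-target against the thresholds above, a rough sketch follows. It assumes the guide and off-target are available as equal-length gapped strings, that `-` in the guide marks a DNA bulge and `-` in the target marks an RNA bulge, and that `N` in the guide (e.g. PAM positions) matches any base. The function names are hypothetical, and IUPAC codes in the target are not handled.

```python
def count_edits(guide_aln, target_aln):
    """Count (mismatches, dna_bulges, rna_bulges) in a gapped alignment.

    Assumed conventions: '-' in the guide is a DNA bulge, '-' in the target
    is an RNA bulge, and 'N' in the guide is a wildcard.
    """
    if len(guide_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mismatches = dna_bulges = rna_bulges = 0
    for g, t in zip(guide_aln.upper(), target_aln.upper()):
        if g == "-":
            dna_bulges += 1
        elif t == "-":
            rna_bulges += 1
        elif g != "N" and g != t:
            mismatches += 1
    return mismatches, dna_bulges, rna_bulges


def within_thresholds(guide_aln, target_aln, mm=6, bdna=2, brna=2):
    """Return True if the alignment respects the given thresholds."""
    m, d, r = count_edits(guide_aln, target_aln)
    return m <= mm and d <= bdna and r <= brna


# Toy alignment: one mismatch, one DNA bulge, one RNA bulge.
print(count_edits("ACGT-ACGNN", "ACCTTA-GGG"))
```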

Expected behavior

Each reported indel should generate the reported off-target sequence when applied to the reference.

Thank you in advance for your help!

ManuelTgn commented 1 week ago

Hi @thomas-davis,

Thank you for bringing this issue to our attention. Could you kindly share the code you’re using to add variants to the sequences? This will help us replicate the behavior in CRISPRme using a public spacer, so we can investigate if the issue persists in a broader context.

Best, Manuel

ManuelTgn commented 2 days ago

Hi @thomas-davis,

I am writing to follow up on the status of the scripts that were being adapted to utilize public guides for reproducing the unexpected behaviors observed in CRISPRme. We believe that having access to these scripts could be instrumental in helping us identify the underlying causes of the inconsistencies we have encountered.

Any updates or insights you could provide would be greatly appreciated, as they may help expedite our understanding and resolution of these issues.

Additionally, I would like to let you know that version 2.1.1 has been deprecated, as it contains a known bug that affects the correct reporting of targets generated by indels. We advise using CRISPRme v2.1.5 or later to avoid this issue, although you reported similar behavior with v2.1.5 as well.

Thank you for your time and assistance.

Best, Manuel

thomas-davis commented 2 days ago

Hey @ManuelTgn, apologies, I'll get back to you as soon as I can.

We were also wondering which public spacer you'd planned to use for crisprme testing; ideally we'd like to run it on our end as well to test our infrastructure. Is it the one listed in sg1617.txt?

lucapinello commented 2 days ago

Yes, we can use that one for testing and reproducibility purposes.

Thanks!

Luca

