pinellolab / CRISPRme

Questions on Crisprme capabilities #40

Closed ksiewert closed 11 months ago

ksiewert commented 1 year ago

Hello, we're looking into CRISPRme and would appreciate clarification on a few details.

(1) If there are multiple indels within one guide's length of each other that together introduce a new off-target, will CRISPRme find it?
(2) How does CRISPRme deal with PAMless enzymes? Making a TST containing the entire genome seems computationally expensive.
(3) Is an allele frequency filter applied? If not, how does the computation time not balloon with larger reference panels?
(4) Is it correct that bulges are not allowed in the PAM?

Thank you, Katie

samuelecancellieri commented 1 year ago

Hello @ksiewert, answering your questions: 1) We do not handle paired indels; unfortunately, that is not yet implemented. We do handle combinations of SNPs within the same guide length. 2) The TST is a very powerful structure and can handle even large databases, such as the complete genome, without problems. 3) Since we generate a single genome with all the SNPs collapsed into IUPAC nucleotide codes, we can extract all targets back in polynomial time, so processing time does not explode. 4) Yes, bulges are not allowed in the PAM sequence.
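To make point 3 concrete, here is a minimal sketch of searching a SNP-collapsed sequence. It is an illustration only and not CRISPRme's code: the function names (`collapse_snps`, `count_mismatches`, `find_targets`) are hypothetical, the scan is a plain linear pass rather than a TST lookup, and PAMs and bulges are ignored. The point it shows is that one IUPAC-encoded sequence is scanned once, instead of one sequence per haplotype.

```python
# Toy sketch, not CRISPRme's implementation. All names are hypothetical.

# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code.
_CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def collapse_snps(reference, snps):
    """Replace each SNP position with the IUPAC code covering ref and alt alleles.

    `snps` maps a 0-based position to an iterable of alternative bases.
    """
    seq = list(reference)
    for pos, alts in snps.items():
        alleles = frozenset(alts) | {seq[pos]}
        seq[pos] = _CODE_FOR[alleles]
    return "".join(seq)


def count_mismatches(guide, window):
    """Count positions where the guide base is not among the bases the code allows."""
    return sum(1 for g, w in zip(guide, window) if g not in IUPAC[w])


def find_targets(guide, genome, max_mismatches):
    """Single linear scan over the IUPAC-encoded sequence."""
    hits = []
    for start in range(len(genome) - len(guide) + 1):
        window = genome[start:start + len(guide)]
        mm = count_mismatches(guide, window)
        if mm <= max_mismatches:
            hits.append((start, window, mm))
    return hits


if __name__ == "__main__":
    # G>A at position 2 and C>T at position 5 become R and Y.
    collapsed = collapse_snps("ACGTACGT", {2: "A", 5: "T"})
    print(collapsed)
    print(find_targets("ACAT", collapsed, 0))
```

In this toy, the guide `ACAT` has one mismatch against the plain reference window `ACGT` but matches the collapsed window `ACRT` exactly, because `R` covers both `A` and `G`. The cost of the scan depends on sequence length and guide length, not on the number of samples in the panel.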

I hope these answers help you use the software. Write back if you have any more questions.

Best, Samuele

ksiewert commented 1 year ago

Thank you. This is helpful!

ksiewert commented 1 year ago

Hi Samuele,

Thinking about things more, a few more related questions arose. I know these questions are about edge cases, and CRISPRme seems like it will detect the vast majority of off-targets introduced by variants. We're just trying to get a full picture of the types of off-targets it will detect.

The first is whether CRISPRme searches alt contigs in combination with indels or SNPs from the VCFs. Is it correct that the alt contig FASTAs are searched for off-targets as is, but that no new versions of them with indels or IUPAC ambiguous bases are created? Or, if a version with IUPAC bases is produced, what are the names of these files?

In addition, is it correct that CRISPRme will not detect off-targets that are introduced by a combination of a SNP and an indel? Basically, is it correct that no version of the reference is created containing both indels and IUPAC ambiguous bases?

Thank you, Katie

samuelecancellieri commented 1 year ago

Hello @ksiewert, CRISPRme searches all chromosomes, including alternative contigs, if you provide VCF data for them. The VCFs we used in the paper and on the website are from the 1000G base project, so no variant data are provided for alt contigs. But if you have such data, you can proceed with them exactly as for the reference chromosomes. Regarding SNPs plus indels, you are right. The software does not account for targets with both indels and SNPs on the same sequence, due to the nature of the search tree we used to create the database.
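The SNP-plus-indel limitation can be illustrated with a toy example. This is a sketch under assumptions, not CRISPRme's pipeline: it assumes, as described in the question above, that a SNP-only (IUPAC) version and an indel-only version of the sequence are searched separately, and it uses exact matching with hypothetical names (`exact_hits`, the sequences, and the guide are all made up for illustration).

```python
# Toy illustration only. Sequences, guide, and function names are hypothetical.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(guide, window):
    """True if every guide base is allowed by the IUPAC code at that position."""
    return all(g in IUPAC[w] for g, w in zip(guide, window))


def exact_hits(guide, seq):
    """Start positions where the guide matches with no mismatches and no bulges."""
    n = len(guide)
    return [i for i in range(len(seq) - n + 1) if matches(guide, seq[i:i + n])]


reference = "TTACGATTGCCGG"
guide = "ACGTTGCA"

# SNP only: C>A at position 10, encoded as M (A or C).
snp_only = reference[:10] + "M" + reference[11:]
# Indel only: 1-bp deletion of the A at position 5.
indel_only = reference[:5] + reference[6:]
# Both variants on the same sequence (the case that is not generated).
snp_and_indel = snp_only[:5] + snp_only[6:]

if __name__ == "__main__":
    print(exact_hits(guide, snp_only))
    print(exact_hits(guide, indel_only))
    print(exact_hits(guide, snp_and_indel))
```

Here the guide matches exactly only in `snp_and_indel`; neither `snp_only` nor `indel_only` contains a perfect site. In a real search, mismatch and bulge tolerances may still surface such a site as an imperfect hit; the toy uses exact matching only to make the gap visible.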

Hope this helps you. Thanks again for your interest in the software, and if you have any other questions, don't hesitate to write.

Best,