markrobinsonuzh / CrispRVariants


Haplotyping #1

Open Murskautuminen opened 8 years ago

Murskautuminen commented 8 years ago

Hello!

I was just wondering whether this software would be useful for determining full haplotypes or quasispecies in "multiple" CRISPR-edited polyclonal mixtures (for example, two sgRNAs cutting close to each other in the same gene) consisting of a large number of different CRISPR clones.

For example, suppose you have many MiSeq reads containing SNVs and indels that are not located directly next to each other but are spread across the two sgRNA cut sites. Would this software be able to determine the full set of haplotypes in such a polyclonal mixture?

In addition, am I correct that if you edit a gene with only one sgRNA, this programme should be able to determine the complete haplotypes (the number of reads with specific co-occurring variants/deletions)?

HLindsay commented 8 years ago

Hello,

Yes, we have used CrispRVariants to look at multiple sgRNAs. It will detect all combinations of insertion/deletion variants. We have also seen structural variants such as inversions in multiplexed experiments; these are grouped together as "chimeric" reads. We only detect SNVs in reads that do not have an insertion or deletion.
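As a rough illustration of the grouping rule just described, here is a minimal Python sketch. It is not CrispRVariants code (the package is written in R); the `Read` class, its fields and the label strings are hypothetical, chosen only to show the logic: chimeric reads form one group, reads with indels are grouped by their combination of indels with SNVs ignored, and only indel-free reads are grouped by their SNVs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical summary of one aligned read (not a CrispRVariants type)."""
    indels: tuple = ()      # illustrative labels, e.g. ("-3:4D", "5:1I")
    snvs: tuple = ()        # illustrative labels, e.g. ("SNV:-7",)
    chimeric: bool = False  # True for reads with split/chimeric alignments


def allele_key(read: Read) -> str:
    """Assign a read to an allele label following the rules described above."""
    if read.chimeric:
        return "chimeric"
    if read.indels:
        # SNVs are ignored when a read carries an indel.
        return ",".join(sorted(read.indels))
    if read.snvs:
        return ",".join(sorted(read.snvs))
    return "no variant"


def count_alleles(reads):
    """Tally how many reads fall into each allele label."""
    return Counter(allele_key(r) for r in reads)


reads = [
    Read(indels=("-3:4D",), snvs=("SNV:-10",)),  # grouped with the next read
    Read(indels=("-3:4D",)),
    Read(snvs=("SNV:-7", "SNV:-1")),
    Read(chimeric=True),
    Read(),
]
print(count_alleles(reads))
```

Note how the first two reads share an allele even though only one carries an SNV, which is why a separate SNV caller is suggested for that case.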

I'm not sure if I understand your example. Do you want to distinguish reads that have the same indel but different SNVs upstream or downstream? In this case, CrispRVariants would group the reads based on their shared indel, i.e., it would not detect the SNVs. However, it is possible to get the alignments after running CrispRVariants, so I think you could use the VariantTools package for detecting SNVs.

The behaviour is similar for a single sgRNA. All indel combinations are detected and SNVs are detected in reads with no indel - by default within a region from 8 bases upstream to 6 bases downstream of the cut site. We chose these defaults because our primary interest so far has been detecting SNVs that affect how well the sgRNA binds. Reads without indels are grouped by their combination of SNVs.
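The SNV window can be pictured with the following Python sketch. It assumes positions are given relative to the cut site, with negative values upstream, and uses `upstream`/`downstream` as stand-ins for the package's `upstream.snv`/`downstream.snv` settings. The function names are hypothetical, and the exact boundary handling inside CrispRVariants may differ.

```python
def in_snv_window(pos: int, upstream: int = 8, downstream: int = 6) -> bool:
    """True if an SNV at `pos` (relative to the cut site, negative = upstream)
    falls inside the assumed calling window [-upstream, downstream]."""
    return -upstream <= pos <= downstream


def called_snvs(snv_positions, upstream=8, downstream=6):
    """Keep only SNVs inside the window; SNVs outside it are simply dropped."""
    return sorted(p for p in snv_positions
                  if in_snv_window(p, upstream, downstream))


def snv_allele(snv_positions, upstream=8, downstream=6):
    """Label an indel-free read by its called SNVs, or 'no variant' if none remain."""
    kept = called_snvs(snv_positions, upstream, downstream)
    return ",".join(f"SNV:{p}" for p in kept) or "no variant"


# A hypothetical read with SNVs at -10, -7, -6, -3 and -1:
read_snvs = [-10, -7, -6, -3, -1]
print(snv_allele(read_snvs))               # -10 falls outside the default window
print(snv_allele(read_snvs, upstream=10))  # widening upstream picks it up
```

Because reads are keyed only by the SNVs that survive the window, two reads differing only outside it end up with the same label.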

I hope this answers your question. If not, feel free to email me some example reads and we can discuss further.

Best wishes, Helen

Murskautuminen commented 8 years ago

Hello again Lindsay,

Would it be possible for me to extend the SNV detection range beyond the described 8 bp upstream and 6 bp downstream?

I have an example here (linked as "crispr sample"). In this sample it detects SNVs at -7, -6, -3 and -1, but it should also detect the G->C SNV at -10 (the first SNV in the list, with the highest number of reads).

In addition, even though variant naming does not take SNVs outside the defined range into account, are sequences that differ only by SNVs outside this range currently treated as unique by the software? Or are hypothetical sequences such as SNV -9 (T->C), SNV -9 (T->A) or SNV -11 regarded as the same?

HLindsay commented 8 years ago

Hi,

The window for SNV calling is set using the parameters "upstream.snv" (default 8) and "downstream.snv" (default 6), e.g. readsToTarget(bams, target = target, reference = reference, upstream.snv = 10). There is a bug in the release version of CrispRVariants that meant that these arguments were not always passed on correctly. I have just pushed a fix for this - development version 1.1.4. This should be available from Bioconductor devel within a few days or you can download the source from the Bioconductor mirror. SNV calling is one of the slow steps in initialization, so bear in mind that larger windows will make the computation slower.

SNVs outside the detection window are not treated as distinct alleles. With upstream.snv = 8, a sequence whose only SNV is -9 (T->C) is classed as "no variant". If you set upstream.snv = 9, SNV -9 (T->C) and SNV -9 (T->A) will both be labelled SNV:-9, and the plot will show the consensus allele (which could be the ambiguity code "M", meaning "A or C").
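The merging described above can be sketched in Python: alternative bases seen at the same position are collapsed into one label and summarised with a standard IUPAC nucleotide ambiguity code. This is only an illustration of the idea, not CrispRVariants' implementation; `merge_snvs` and `consensus_base` are hypothetical names, and here every observed base contributes to the code, whereas the package may apply a frequency threshold when building its consensus.

```python
from collections import defaultdict

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_base(alt_bases):
    """Return the IUPAC code covering every alternative base observed."""
    return IUPAC[frozenset(b.upper() for b in alt_bases)]


def merge_snvs(snvs):
    """Collapse (position, alt_base) pairs from different reads into one label
    per position: {"SNV:<pos>": (consensus_code, read_count)}."""
    by_pos = defaultdict(list)
    for pos, alt in snvs:
        by_pos[pos].append(alt)
    return {f"SNV:{pos}": (consensus_base(alts), len(alts))
            for pos, alts in sorted(by_pos.items())}


# Reads carrying T->C or T->A at -9 end up under a single label, "M" = A or C.
print(merge_snvs([(-9, "C"), (-9, "A"), (-9, "C")]))
```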

As your example looks like a heterozygote, I'd recommend you extend the upstream.snv window and consider removing reads with this variant if calculating mutation efficiency.

Best wishes, Helen

yl974 commented 7 years ago

Hello!

Some of my samples use two sgRNAs. How should I run CrispRVariants in this case?

I saw that the manual mentions a file, extdata/metadata/metadata.xls, which can list two sgRNAs for a sample, but I could not see how this information is used in the variable ‘reference’.

How should I build ‘target’ and ‘reference’? Could you please give me an example?

Best,

Spring