markrobinsonuzh / CrispRVariants

22 stars 4 forks source link

Large deletion events - readstotarget limit? #5

Open rleenay opened 6 years ago

rleenay commented 6 years ago

Hello again!

I am looking at some DNA repair outcomes, and a few guides I'm evaluating have very large deletions. Using the readsToTarget function, I attempted to create a 50nt window on either side of the guide, instead of the 5nt window that is shown in the example.

However, when I pull up the variants, it just sums them all up as "other". Is there an upper limit set on the size of the window? Is it possible to expand this to look at large deletions?

HLindsay commented 6 years ago

Hi,

There's no limit on the size of the window in readsToTarget. If the window is larger than the mapped range of the reads, you'll need to set the parameter minoverlap, which is the number of bases that must be mapped within the window for a read to be counted. There's two reasons that all of the reads might be counted as "Other". One is if your data is paired but you do not set collapse.pairs = TRUE. The other reason is probably what you are seeing - bwa mem reports large gaps as chimeras or supplementary alignments, which CrispRVariants then counts as "Other". There's an experimental option chimeras = "merge" in readsToTarget in the latest release version of CrispRVariants. This attempts to merge supplementary alignments into a single alignment that can be visualised. This option only works with single-end reads, so if you have paired reads currently you'll need to merge them first, e.g. with PEAR. It doesn't working in all cases - if your reads include things like duplications merging chimeras will fail. Try chimeras = "merge" in readsToTarget, and if this doesn't work try using plotChimeras to see what the chimeric reads look like and let me know what you find.

Best, Helen

rleenay commented 6 years ago

Helen,

I appreciate you helping me here as I try to figure this out. I went in and merged my PE reads, then ran them through the readsToTarget and all of the variables (counts, etc.) looked to be about the same.

However, when I start to look at chimeras I get a few errors (even with the standard +/- 5 window). The first error is that with the chimeras = "merge" option, I get the following output: "Multimapping segment of read includes indelsThis case is not implemented yet, chimeras cannot be merged." Followed by my attempted chimera plotting, which I get a different error than is reported in the CrispRVariants walkthrough:

# Confirm that all chimeric alignments are part of the same read
length(unique(names(ch))) == 1
## [1] FALSE

I'm a little confused as to why the chimeric alignments are not part of the same read? Do I need to do some additional splitting of the .bam file?

rleenay commented 6 years ago

Helen,

After some troubleshooting through the syntax of getChimeras and a few other issues, I have a few things to show you for these reads of interest. First off, I am getting that same error for the chimeras = "merge" option.

Error in mergeChimeras(bam, chimera_idxs, verbose = verbose) : 
  Multimapping segment of read includes indelsThis case is not implemented yet, chimeras cannot be merged.

Upon some analysis, I can see in the SAM file that I am getting a large deletion event, consisting of the entire protospacer, as well as ~30bp downstream of the PAM - creating the ~51bp deletion event. When I pull up the chimeric supplemental part of the SAM file, I see that bwa mem only counted the distal, 39nt of the read, ignoring the 120nt sequence that also matched. This was also seen in the plotChimeras plot:

image

Where the first chunk of the read is ignored.

What would you suggest moving forward? Is there an option we can use in bwa mem to better suit this type of analysis? Or something within CrispRVariants to specifically look at larger deletion events? I know you mentioned that you observed similar events in non-targeting controls, but it would not be difficult to perform a control analysis on the same locus to quantify against.

HLindsay commented 6 years ago

Hi,

I see you've figured out the issue with getChimeras. For anyone else who sees this thread, getChimeras returns all chimeric reads as a GAlignments object sorted by read name. To use plotChimeras, you need to select the segments with the same read name from this collection of all chimeras.

Could you clarify what you mean about the long gap? There are two alignment segments in the BAM file. Do these already show the long gap you expect, or did you have to use BLAST or something similar to find it? For the chimeric plot, if you are using the result of getChimeras, there should be another mapped segment in the list (otherwise it wouldn't be considered chimeric). If you select this with something like:

all.chimeras <- getChimeras(crispr_set, sample  = 1)
single.read <- all.chimeras[names(all.chimeras) == "read_name_of_interest"]
plotChimeras(single.read) 

does the plot change?

I'm not clear whether the problem is that CrispRVariants is not able to collapse the read, or bwa mem is not able to map it. If it's a mapping problem, you could try a different aligner. I've tried bowtie2, GMAP and BLAT and found BWA the best for long deletions, but perhaps one of the splicing aligners can do a better job. I'll spend some time today trying to improve the CrispRVariants chimera merging function and let you know how that goes. If you'd like to email me a fastq or bam file with the problematic reads I'll also see if I can find a better alignment option.

Best, Helen

rleenay commented 6 years ago

Helen,

I see what you mean by getChimeras grabbing both reads now. In terms of the problem, I see this read in two spots in the SAM/BAM file (aligned and supplemental reads). So, I'm fairly certain that the mergeChimeras option just doesn't want to collapse the read. I've attached the updated plot with the syntax you provided:

image