markrobinsonuzh / CrispRVariants

23 stars 4 forks source link

Help - shifted guide window #2

Closed benediktrauscher closed 7 years ago

benediktrauscher commented 7 years ago

Hello,

I'm trying to plot InDels of a CRISPR experiment using CrispRVariants and I'm mostly successful. However, in the final plot the black window that indicates the guide position and cutting site are shifted to the left even though the alignment itself looks perfectly fine:

screen shot 2016-11-25 at 18 08 48

Any idea what I could be doing wrong? My code looks like so

library(CrispRVariants)
library(Rsamtools)
library(rtracklayer)
library(GenomicAlignments)
library(gridExtra)
library(Biostrings)

## user input
guide_seq <- DNAString('ACGCGATGTCCGTGCACAGCGGG')
fq_dir <- 'fastq/fzd1'
fa_file <- 'fzd1.fa'

## parameters
bam_dir <- 'bam'
bwa_index_dir <- 'bwa_index'

## guide information
# read fasta
ref_ss <- readDNAStringSet(fa_file)
guide_chr <- names(ref_ss)[1]
if(vcountPattern(guide_seq, ref_ss)[1]){
  guide_strand <- '+'
} else if(vcountPattern(reverseComplement(guide_seq), ref_ss[1])) {
  guide_strand <- '-'
} else {
  stop('Guide doesn not match reference')
}
if(guide_strand=='+'){
  guide_pos <- vmatchPattern(guide_seq, ref_ss)
} else {
  guide_pos <- vmatchPattern(reverseComplement(guide_seq), ref_ss[1])
}
guide_start <- start(guide_pos[[1]])
guide_end <- end(guide_pos[[1]])

## make a bwa index
bwa_index <- paste(bwa_index_dir, index_name, sep='/')
system(paste('bwa index', fa_file))

fq_fnames <- list.files(fq_dir, full.names = T)
bam_fnames <- gsub(".fastq.gz$",".fastq.gz.bam", basename(fq_fnames))
srt_bm_fnames <- file.path(bam_dir, gsub(".bam","_s", bam_fnames))
# Map, sort and index the bam files, remove the unsorted bams
for(i in 1:length(fq_fnames)) {
  cmd <- paste0("bwa mem ", fa_file, " ", fq_fnames[i],
                " | samtools view -Sb - > ", bam_fnames[i])
  message(cmd, "\n"); system(cmd)
  indexBam(sortBam(bam_fnames[i],srt_bm_fnames[i]))
  unlink(bam_fnames[i])
}

# Represent the guide as a GenomicRanges::GRanges object
gd <- GRanges(seqnames=guide_chr, ranges = IRanges(start=guide_start, end=guide_end), strand = guide_strand)
gdl <- resize(gd, width(gd) + 10, fix = "center")

## fetch the reference
system(paste('samtools faidx', fa_file))
ref <- system(sprintf(paste('samtools faidx', fa_file, '%s:%s-%s'),
                         seqnames(gdl)[1], start(gdl)[1], end(gdl)[1]),
                 intern = TRUE)[[2]]

# reference into biostrings object; reverse complement if necessary
ref <- Biostrings::DNAString(ref)
if(guide_strand == '-'){
  ref <- Biostrings::reverseComplement(ref)
}

## make a crispr set
crispr_set <- readsToTarget(paste0(srt_bm_fnames, '.bam'), target = gdl, reference = ref,
                            name = basename(fq_fnames))
plotVariants(crispr_set, plotAlignments.args = list(top.n = 3),
             plotFreqHeatmap.args = list(top.n = 3))

I would much appreciate any help.

Many thanks

Benedikt

HLindsay commented 7 years ago

Hi Benedikt,

This problem happens because the target location (cut site) isn't correctly specified. For your set up, the target location is base 22. This is specified using the target.loc parameter in readsToTarget:

crispr_set <- readsToTarget(paste0(srt_bm_fnames, '.bam'), target = gdl, reference = ref,
                            name = basename(fq_fnames), target.loc = 22)

What is a bit confusing to me is that the default value for target.loc is 17, so from your code above I would expect the guide box to be shifted 5 bases to the left of the actual cut site. Your figure is what I would expect to see if target.loc was set to 0. Are you using the latest version of CrispRVariants from Bioconductor?

Helen

benediktrauscher commented 7 years ago

Hi Helen,

thank you for your prompt reply. I have now figured out the bug in my code: The function parameter 'name' in my code needs to actually be 'names'. Writing 'name' instead of 'names' will always result in a target location of 0, no matter what the target.loc-parameter is set to (tested with CrispRVariants v1.2.0 on R 3.3.2).

Thank you for developing this useful tool!

Best regards

Benedikt