Custom Reference Sequence

Hi,

The "target" needs to be a GRanges object in order to find the alignments that have mapped to the correct sequence. However, the mapping reference itself is arbitrary. For example, if you mapped to a fasta file containing a sequence named "hdr_reference", then your target would be something like GRanges("hdr_reference", IRanges(1, nchar(hdr_reference))) (with the locations either being the entire reference sequence or some subset of it). For a standard (bwa) pipeline with an arbitrary reference, make a fasta file of your custom reference sequence then index it and align to it as usual. Depending on how you set up your reference sequence, you might want to use the min.overlap parameter to allow alignments that don't completely cover the reference sequence - otherwise they are filtered out. I'd also suggest using chimeras = "merge" - this is an experimental parameter for linearising ambiguous alignments where possible - and if you used a paired guide set a large value for chimera.to.target - this determines how close a gap in the alignment must be to the cut site.

The reference sequence input to readsToTarget also doesn't actually need to match the reference genome sequence used for mapping, for example if you mapped to a standard genome but want to count nucleotide substitutions with respect to a known variant allele.

Hope this helps! Helen

markrobinsonuzh / CrispRVariants

Custom Reference Sequence #7