snugel / cas-offinder

An ultrafast and versatile algorithm that searches for potential off-target sites of CRISPR/Cas-derived RNA-guided endonucleases.

The off-target zero-based position doesn't match the UCSC zero-based position #34

Closed anitagh closed 3 years ago

anitagh commented 3 years ago

Cas-offinder positions for off-target sites don't match UCSC blat results. UCSC blat results are zero-based. Below is an example:

Cas-offinder output: CTCCCACACGAGTTGCCCCACGCT + 483173

Blat result using the above sequence: chr8 + 483174 483197

So, it seems the cas-offinder position is off by one base. Could you please advise?

Many Thanks, Azita

anitagh commented 3 years ago

Please ignore my last comment. I realized my problem...

Blat Results:

Discrepancy in the start position in the PSL and hyperlink outputs:

As mentioned above, the web version of BLAT (http://genome.ucsc.edu/cgi-bin/hgBlat) provides a choice of outputs: hyperlink or psl. The output in PSL format is useful for generating custom tracks in the UCSC Genome Browser. As described above, custom tracks are useful for adding researchers’ own data to the browser and deriving meaningful biological information from the aligned data available in the different tracks of the browser.

Careful examination of the hyperlink and PSL outputs shows that the start position for the genome coordinate in the PSL output is less by one base than the start position in the hyperlink output, even though the end positions are the same. Compare Figure 16 (Protocol 3 hyperlink output) and file protocol_3_psl.xlsx (Protocol 3 psl output) provided in the Supplementary materials. In Figure 16, the chromosome X start position is 38694212 (START column next to the STRAND column), and in the file protocol_3_psl.xlsx, it is 38694211 (T start column). In both Figure 16 and file protocol_3_psl.xlsx, the chromosome X end position is 38778403 (END column next to the STRAND column and T end column, respectively). This discrepancy is best understood from a description of the difference in numbering the start and end positions of the sequence in the hyperlink and PSL outputs and in the database itself.
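The off-by-one in this thread can be checked with a little arithmetic. The sketch below is only an illustration, and it rests on two assumptions: that the Cas-offinder position is a zero-based start (as the issue title says), and that the BLAT hyperlink output uses one-based, inclusive coordinates while PSL uses a zero-based start with an exclusive end. The function names are hypothetical and do not belong to either tool.

```python
def zero_based_to_one_based(start, end):
    """Convert a zero-based, end-exclusive interval (PSL style)
    to a one-based, end-inclusive interval (hyperlink style).
    Only the start shifts; the end value stays the same."""
    return start + 1, end


def one_based_to_zero_based(start, end):
    """Inverse conversion: one-based inclusive to zero-based exclusive."""
    return start - 1, end


def interval_from_hit(zero_based_start, sequence):
    """Build a zero-based, end-exclusive interval from a start position
    and the matched sequence (assumed to have no gaps)."""
    return zero_based_start, zero_based_start + len(sequence)


# Example from this issue: the site reported at position 483173.
site = "CTCCCACACGAGTTGCCCCACGCT"
hit = interval_from_hit(483173, site)
print(zero_based_to_one_based(*hit))

# Example from the quoted BLAT protocol: PSL T start and T end on chromosome X.
print(zero_based_to_one_based(38694211, 38778403))
```

Under these assumptions, the first call yields 483174 and 483197, matching the BLAT result above, and the second yields 38694212 and 38778403, matching the hyperlink output described in the quoted text. The end position is the same in both conventions because an exclusive zero-based end and an inclusive one-based end name the same base.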