haruosuz closed this issue 7 years ago
CRISPR regions are composed of many repeat units that are each 23 to 50 bp. Prokka is only annotating the bounds of the CRISPR region rather than each individual repeat unit.
I am using minced to do the prediction; the author is @ctSkennerton. I am currently only using the -gff option, but I could use -gffFull and annotate the repeat units.
-gff Output summary results in gff format containing
only the positions of the CRISPR arrays. Default: false
-gffFull Output detailed results in gff format containing
positions of CRISPR arrays and all repeat units. Default: false
Here is the meaning of the different rpt_type values:
http://www.insdc.org/controlled-vocabulary-rpttype-qualifier
I found this example in GenBank: http://www.genome.jp/dbget-bin/www_bget?refseq+NC_015970
repeat_region 46177..46632
/inference="COORDINATES: alignment:crt:1.2"
/inference="COORDINATES: alignment:pilercr:v1.02"
/rpt_family="CRISPR"
/rpt_type=direct
/rpt_unit_range=46177..46204
/rpt_unit_seq="gggtcatccctgcgcgcgcgggagtcgg"
Here is an example of minced -gffFull output:
##gff-version 3
gi|384860682|ref|NC_017341.1| minced:0.2.0 CRISPR 2421118 2421311 4 . . ID=CRISPR1
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421118 2421140 1 . . Parent=CRISPR1;ID=DR1
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421174 2421196 1 . . Parent=CRISPR1;ID=DR2
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421233 2421255 1 . . Parent=CRISPR1;ID=DR3
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421289 2421311 1 . . Parent=CRISPR1;ID=DR4
And of minced -spacers output:
Sequence 'gi|384860682|ref|NC_017341.1|' (2924344 bp)
CRISPR 1 Range: 2421118 - 2421311
POSITION REPEAT SPACER
-------- ----------------------- ----------------------------------
2421118 TGTTGGGGCCCCGCCAACTTGCA CATTATTGTATGCTGACTTTTCGTCACCTTCTG [ 23, 33 ]
2421174 TGTTGGGGCCCCGTTCCCCAACT TGCATTGTCTGTAGAATTTCTTTTTGAAATTCTCTA [ 23, 36 ]
2421233 TGTTGGGGCCCCGCCAACTTGCA CATTATTGTAAGCTGACTTTCTGTCAGCTTCTG [ 23, 33 ]
2421289 TGTTGGGGCCCCGCCAACTTGTA
-------- ----------------------- ----------------------------------
Repeats: 4 Average Length: 23 Average Length: 34
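To show how the -gffFull records relate to each other, here is a minimal Python sketch that groups the repeat_unit lines under their parent CRISPR feature and reports the unit count and lengths. It is only an illustration, not Prokka's code: the function names are hypothetical, it splits on any whitespace because the example above is shown space-separated (real GFF3 is tab-delimited), and it assumes each CRISPR line appears before its repeat units.

```python
# Sample copied from the minced -gffFull example above.
GFF_FULL = """\
##gff-version 3
gi|384860682|ref|NC_017341.1| minced:0.2.0 CRISPR 2421118 2421311 4 . . ID=CRISPR1
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421118 2421140 1 . . Parent=CRISPR1;ID=DR1
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421174 2421196 1 . . Parent=CRISPR1;ID=DR2
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421233 2421255 1 . . Parent=CRISPR1;ID=DR3
gi|384860682|ref|NC_017341.1| minced:0.2.0 repeat_unit 2421289 2421311 1 . . Parent=CRISPR1;ID=DR4
"""


def parse_attributes(field):
    """Turn 'Parent=CRISPR1;ID=DR1' into {'Parent': 'CRISPR1', 'ID': 'DR1'}."""
    return dict(pair.split("=", 1) for pair in field.split(";") if "=" in pair)


def group_repeat_units(gff_text):
    """Return {crispr_id: {'seqid', 'start', 'end', 'units': [(start, end), ...]}}."""
    arrays = {}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.split(None, 8)  # assumption: whitespace-separated, 9 columns
        if len(cols) < 9:
            continue
        seqid, _source, ftype, start, end = cols[:5]
        attrs = parse_attributes(cols[8])
        if ftype == "CRISPR":
            arrays[attrs["ID"]] = {
                "seqid": seqid,
                "start": int(start),
                "end": int(end),
                "units": [],
            }
        elif ftype == "repeat_unit":
            # assumption: the parent CRISPR record was seen earlier in the file
            arrays[attrs["Parent"]]["units"].append((int(start), int(end)))
    return arrays


for crispr_id, array in group_repeat_units(GFF_FULL).items():
    lengths = [end - start + 1 for start, end in array["units"]]
    print(crispr_id, array["start"], array["end"], len(lengths), lengths)
```

For the example above this groups four 23 bp units under CRISPR1, which matches the "Repeats: 4" summary in the -spacers output.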
I have updated Prokka to at least tell you how many repeat units there are. I will look at adding repeat units in v1.13.
repeat_region 2421118..2421311
/note="CRISPR with 4 repeat units"
/rpt_family="CRISPR"
/rpt_type=direct
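As a rough illustration of how such a feature could be assembled from the array bounds and the repeat unit coordinates, here is a small Python sketch. The function format_repeat_region is hypothetical and is not how Prokka builds its output; the optional /rpt_unit_range line simply mirrors the GenBank example earlier in this thread by pointing at the first repeat unit, which is an assumption on my part.

```python
def format_repeat_region(start, end, unit_ranges):
    """Build feature-table style lines for one CRISPR array.

    start, end   -- bounds of the whole array
    unit_ranges  -- list of (start, end) tuples, one per repeat unit
    """
    lines = [
        f"repeat_region {start}..{end}",
        f'/note="CRISPR with {len(unit_ranges)} repeat units"',
        '/rpt_family="CRISPR"',
        "/rpt_type=direct",
    ]
    if unit_ranges:
        # assumption: report the first repeat unit, as in the GenBank example
        first_start, first_end = unit_ranges[0]
        lines.append(f"/rpt_unit_range={first_start}..{first_end}")
    return "\n".join(lines)


# Coordinates taken from the minced example in this thread.
units = [
    (2421118, 2421140),
    (2421174, 2421196),
    (2421233, 2421255),
    (2421289, 2421311),
]
print(format_repeat_region(2421118, 2421311, units))
```

Annotating every repeat unit as its own feature would be a separate design choice; this sketch only keeps the single repeat_region and adds detail to it.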
"CRISPR repeats range in size from 24 to 48 base pairs.[56]" (https://en.wikipedia.org/wiki/CRISPR). "Analysis of the current CRISPR database[24] reveals that repeats range from 23- to 50-nt long and have an average length of 31 nt" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2928866/). For a Roseovarius genome, annotation using Prokka v1.11 produced long CRISPR repeats (2477 bp), as follows: