Not sure if that was the intended use of the updategff command (or even the AGP format), but I was trying to update an annotation based on an AGP file where I introduced a break into the chromosome, and I got a corrupted GFF file. Minimal example follows:
For completeness, I attach the solution that leaves me with the correct coordinates (for BED files though):
#!/usr/bin/gawk -f
BEGIN{
help="\
This script updates a BED file using the transformation defined by an AGP file. \n\
That is, BED in coordinates of contigs gets transformed to coordinates of scaffolds. \n\
Please bgzip and tabix the BED file before operation. \n\
EXAMPLE: updategff.awk file.agp file.bed.gz > file_transform.bed"
# ARGC counts the program name too, so both files present means ARGC == 3
if (ARGC < 3) {print help; exit 1}
OFS="\t"
}
# Exploit tabix to query bed files
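NR==FNR && !/^#/ && $5=="W" {
    cmd = "tabix " ARGV[2] " " $6 ":" $7 "-" $8
    while ((cmd | getline bedline) > 0) {
        n = split(bedline, bed, "\t")
        # Keep only features fully inside the component; BED starts are 0-based
        if (bed[2] < $7 - 1 || bed[3] > $8) { continue }
        if ($9 == "-") {
            # Reverse orientation: mirror the interval within the component
            printf "%s\t%s\t%s", $1, $8 - bed[3] + $2 - 1, $8 - bed[2] + $2 - 1
        } else {
            # "+" and unknown orientations, which AGP treats as "+"
            printf "%s\t%s\t%s", $1, bed[2] - $7 + $2, bed[3] - $7 + $2
        }
        # Copy any remaining BED columns, then end the line exactly once
        for (i = 4; i <= n; i++) { printf "\t%s", bed[i] }
        print ""
    }
    close(cmd)
}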
Obviously, positions 5 and 6 of bibaboba should then be converted into positions 1 and 2 of boba. But what I get is:
The coordinates do not get transformed.
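To make the expected arithmetic concrete, here is a minimal sketch of the forward-orientation case only. The AGP line, the BED feature, and the helper name `lift` are assumptions made up for illustration (they are not the files from the minimal example), and the sketch assumes the usual conventions: AGP coordinates are 1-based and inclusive, BED starts are 0-based and ends are exclusive.

```sh
# Assumed AGP component line: "boba" (bases 1-4) is built from bases 5-8
# of "bibaboba" in forward orientation. Whitespace-separated for readability.
agp='boba 1 4 1 W bibaboba 5 8 +'
# Assumed BED feature covering bases 5 and 6 of bibaboba (0-based start 4).
bed='bibaboba 4 6 feat'

# Hypothetical helper: lift one BED line through one forward ("+") AGP line.
lift() {
    awk -v agp="$1" -v bed="$2" 'BEGIN {
        split(agp, a); n = split(bed, b)
        # Shift by the offset between component start and object start
        s = b[2] - a[7] + a[2]
        e = b[3] - a[7] + a[2]
        printf "%s\t%s\t%s", a[1], s, e
        for (i = 4; i <= n; i++) printf "\t%s", b[i]
        print ""
    }'
}

lift "$agp" "$bed"   # prints: boba, 0, 2, feat (tab-separated)
```

The BED interval 0-2 on boba corresponds to 1-based positions 1 and 2, which is the result expected above.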