The current NCBI FASTA format uses "gb" (GenBank) headers. vip_db_format.pl does not handle these headers and produces an empty /work/VIP/FAST/vip_fast.fa.formatted file. The following patch extracts the accessions instead of the "gi" numbers:
```diff
diff -Naur vip_db_format.pl.dist vip_db_format.pl
--- vip_db_format.pl.dist 2020-01-29 12:45:01.243718605 +0100
+++ vip_db_format.pl 2020-01-30 23:04:16.116854880 +0100
@@ -1,4 +1,5 @@
 #!/usr/bin/perl -w
+#@(#)vip_db_format.pl 2020-10-30 last modified by A.J.Travis
 #
 # vip_db_format.pl
 #
@@ -27,7 +28,7 @@
 while (<FL>) {
 chomp;
- if (/.*(gi\|[0-9]*\|).*?\n(.*)/si) {
+ if (/.*(gb\|[^|]*\|).*?\n(.*)/si) {
 #if (/.*?(gi\|[0-9]*\|).*?\n(.*)/si) {
 my $gi = lc($1);
 my $seq = $2;
```
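To check the changed regex outside VIP, here is a minimal, self-contained sketch that runs both the original `gi` pattern and the patched `gb` pattern over the same input. It assumes that vip_db_format.pl reads records with `$/` set to `>`, because the surrounding code is not shown in the patch. The headers, accessions and sequences are made up for illustration, and the `extract` helper is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records in the "gb|accession|" header style (made-up IDs, not real NCBI data).
my $fasta = ">gb|AB000001.1| example virus segment A\nACGTACGT\nACGT\n"
          . ">gb|AB000002.1| example virus segment B\nGGGGCCCC\n";

my $old_re = qr/.*(gi\|[0-9]*\|).*?\n(.*)/si;   # pattern before the patch
my $new_re = qr/.*(gb\|[^|]*\|).*?\n(.*)/si;    # pattern after the patch

# Hypothetical helper: split the FASTA text into records and return
# [id, sequence] pairs for every record that matches $re.
# Assumption: records are read with ">" as the input record separator,
# so each chunk is one header line plus its sequence lines.
sub extract {
    my ($re) = @_;
    open(my $fh, '<', \$fasta) or die "cannot open in-memory FASTA: $!";
    local $/ = '>';
    my @out;
    while (my $rec = <$fh>) {
        chomp $rec;                        # drop the trailing ">" (the current $/)
        next unless length $rec;           # the first chunk is just ">"
        if ($rec =~ $re) {
            my $id = lc($1);               # e.g. "gb|ab000001.1|"
            (my $seq = $2) =~ s/\s+//g;    # join the sequence lines
            push @out, [$id, $seq];
        }
    }
    close $fh;
    return @out;
}

my @old = extract($old_re);
my @new = extract($new_re);
printf "old pattern: %d records\n", scalar @old;
printf "new pattern: %d records\n", scalar @new;
print "$_->[0]\t$_->[1]\n" for @new;
```

With `gb|` headers the original pattern matches no records, which is consistent with the empty vip_fast.fa.formatted file described above, while the patched pattern picks up both accessions.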