Status: Closed (closed by yk-tanigawa 7 years ago)
After adding the chr prefix to the chromosome names, compressing, and indexing a vcf file,
awk '/^#/ { print; next } { print "chr" $0 }' dbsnp_all_20160527.vcf | bgzip > dbsnp_all_20160527.vcf.gz
tabix -p vcf dbsnp_all_20160527.vcf.gz
I should be able to query SNPs by region using the index, like
tabix dbsnp_all_20160527.vcf.gz chr1:10000-10020
The column names are
CHROM POS ID REF ALT QUAL FILTER INFO
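For reference, the prefixing step and the column layout above can be sketched in Python. This is only an illustration: the function names are hypothetical, the example lines are made-up placeholders rather than real dbSNP records, and it assumes tab-separated VCF lines with the eight fixed columns listed above.

```python
# Hypothetical helper names; the example lines below are placeholders, not dbSNP data.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def add_chr_prefix(line):
    """Prefix data lines with 'chr'; leave header lines (starting with '#') unchanged."""
    return line if line.startswith("#") else "chr" + line


def parse_record(line):
    """Split one tab-separated VCF data line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields))
    record["POS"] = int(record["POS"])  # VCF positions are 1-based
    return record


example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10019\trs0000001\tTA\tT\t.\t.\t.\n",
]
prefixed = [add_chr_prefix(line) for line in example]
record = parse_record(prefixed[2])
print(record["CHROM"], record["POS"], record["ID"])
```

Like the awk one-liner, this leaves every header line untouched, so any contig names declared in the header keep their original form.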
0-index/1-index issue
[ytanigaw@sh-2-1 ~/data/nanopore-wgs-consortium]$ head nanopore-wgs.25000.sorted.10k.mapq50.ext.w0.dbsnps |awk '{print $1, $2, $3, $4, $5}'
chr10 36353487 rs147789727 G A
chr10 50479569 rs116159204 A G
chr10 50502624 rs568082579 T A
chr10 81365552 rs11189519 T A
chr11 4953946 rs115916471 G A
chr11 83638545 rs113399123 C T
chr11 87196993 rs777698009 T C
chr11 87199541 rs772763241 C G
chr11 89919898 rs75442183 A C
chr11 92048173 rs568985594 AAAG A
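One way an off-by-one like this can arise, assuming the positions fed into the lookup are 0-based (as in BED-style coordinates), is that VCF POS values and tabix region strings are both 1-based and inclusive. A minimal sketch of the conversion follows; the function names are hypothetical and it is not taken from the repository.

```python
def zero_based_to_tabix_region(chrom, pos0):
    """Region string for the single base at 0-based position pos0."""
    pos1 = pos0 + 1  # shift to the 1-based coordinate used by VCF and tabix
    return f"{chrom}:{pos1}-{pos1}"


def half_open_to_tabix_region(chrom, start0, end0):
    """Convert a 0-based half-open interval [start0, end0) to a 1-based inclusive region."""
    if end0 <= start0:
        raise ValueError("interval must contain at least one base")
    # The start shifts by one; the exclusive 0-based end equals the inclusive 1-based end.
    return f"{chrom}:{start0 + 1}-{end0}"


# A 0-based position of 36353486 corresponds to 36353487 in 1-based coordinates.
print(zero_based_to_tabix_region("chr10", 36353486))
print(half_open_to_tabix_region("chr1", 9999, 10020))
```

Querying with the unconverted 0-based value would look one base to the left of the intended site, so a SNP could be missed or a neighbouring record returned.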
/share/PI/mrivas/data/dbsnp/dbsnp_all_20151104.vcf.gz
[ytanigaw@sherlock-ln02 login_node ~/data/nanopore-wgs-consortium]$ cat nanopore-wgs.25000.sorted.10k.mapq50.ext.w0.snps|grep -v '>'|head|awk '{print $1, $2, $3, $4, $5}'
chr10 394714 rs4880610 G A
chr10 1227299 rs7894015 G A
chr10 1235018 rs1392831 G A
chr10 3461323 rs10795040 G A
chr10 3485920 rs7067778 C A
chr10 6978146 rs1417020 A G
chr10 8647329 rs10905422 A T
chr10 8653428 rs11255772 C T
chr10 12689914 rs12780223 T C
chr10 14445612 rs10796188 G A
[ytanigaw@sherlock-ln02 login_node ~/data/nanopore-wgs-consortium]$ cat nanopore-wgs.25000.sorted.10k.mapq50.ext.w0.snps|grep -v '>'|wc -l
1248
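The same filtering and counting can be done in Python, which may be handier if the rows need further processing. This is a rough equivalent of the `grep -v '>' | awk | wc -l` pipelines above; the function name is hypothetical, and the `>read_1` line and trailing `extra` field are placeholders, since the full layout of the .snps file is not shown here.

```python
def snp_rows(lines):
    """Yield the first five whitespace-separated fields of each line that has no '>'."""
    for line in lines:
        if ">" in line:
            continue  # same filter as grep -v '>'
        yield line.split()[:5]


example_lines = [
    ">read_1\n",  # placeholder for a line that the grep filter would drop
    "chr10 394714 rs4880610 G A extra\n",
    "chr10 1227299 rs7894015 G A extra\n",
]
rows = list(snp_rows(example_lines))
print(len(rows))
for row in rows:
    print(" ".join(row))
```

To run it on the real file, pass an open file handle instead of `example_lines`.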
Script to check dbsnp: https://github.com/rivas-lab/nanopore/blob/20161217GIABextraction/src/dump_snps.py
It has a -s option to pass a gzipped vcf file.
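As an illustration of what such an option could look like, here is a minimal sketch using argparse and gzip from the standard library. It is not the code in dump_snps.py: the `snp_vcf` destination name, the helper functions, and the choice to load IDs into a dict are all assumptions made for this example.

```python
import argparse
import gzip


def build_parser():
    """Hypothetical CLI with a -s option that takes a gzipped VCF path."""
    parser = argparse.ArgumentParser(description="Sketch of a dbSNP lookup CLI")
    parser.add_argument("-s", dest="snp_vcf", required=True,
                        help="path to a gzipped VCF file")
    return parser


def read_snp_ids(path):
    """Return {(chrom, pos): rsid} for every data line of a gzipped VCF."""
    ids = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta and header lines
            chrom, pos, rsid = line.split("\t")[:3]
            ids[(chrom, int(pos))] = rsid
    return ids


# Parse an explicit argument list so the sketch runs without a real file.
args = build_parser().parse_args(["-s", "dbsnp_all_20160527.vcf.gz"])
print(args.snp_vcf)
```

Holding all of dbSNP in a dict would be memory-hungry, so for a file of that size, region queries against the tabix index are likely the more practical route.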