talkowski-lab / gnomad-sv-pipeline

Code and custom scripts relevant to gnomAD-SV (Collins*, Brand*, et al., 2020)
https://broad.io/gnomad_sv
MIT License
36 stars 14 forks source link

Structural variants shouldn't use END as the position of a translocation #3

Open nawatts opened 5 years ago

nawatts commented 5 years ago

Moving this issue from macarthur-lab/gnomad_browser#140. Originally opened by @lindenb.


Cross posted at https://www.biostars.org/p/386345/

What you did: I downloaded the SV from gnomad:

wget -O gnomad_v2_sv.sites.vcf.gz "https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz"

wget -O gnomad_v2_sv.sites.vcf.gz.tbi "https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz.tbi"

there is a translocation at: 12 :60718971

$ bcftools view gnomad_v2_sv.sites.vcf.gz  | grep gnomAD_v2_CTX_12_13 -m1

12  60718971    gnomAD_v2_CTX_12_13 N   <CTX  999 PASS    END=57020218;SVTYPE=CTX;CHR2=13;SVLEN=-1;ALGORITHMS=manta;EVIDENCE=PE;CPX_TYPE=CTX_PP/QQ;PROTEIN_CODING__NEAREST_TSS=SLC16A7;PROTEIN_CODING__INTERGENIC;AN=21476;AC=1;AF=4.7e-05;N_BI_GENOS=10738;N_HOMREF=10737;N_HET=1;N_HOMALT=0;FREQ_HOMREF=0.999907;FREQ_HET=9.31272e-05;FREQ_HOMALT=0;AFR_AN=9480;AFR_AC=0;AFR_AF=0;AFR_N_BI_GENOS=4740;AFR_N_HOMREF=4740;AFR_N_HET=0;AFR_N_HOMALT=0;AFR_FREQ_HOMREF=1;AFR_FREQ_HET=0;AFR_FREQ_HOMALT=0;AMR_AN=1784;AMR_AC=1;AMR_AF=0.000561;AMR_N_BI_GENOS=892;AMR_N_HOMREF=891;AMR_N_HET=1;AMR_N_HOMALT=0;AMR_FREQ_HOMREF=0.998879;AMR_FREQ_HET=0.00112108;AMR_FREQ_HOMALT=0;EAS_AN=2226;EAS_AC=0;EAS_AF=0;EAS_N_BI_GENOS=1113;EAS_N_HOMREF=1113;EAS_N_HET=0;EAS_N_HOMALT=0;EAS_FREQ_HOMREF=1;EAS_FREQ_HET=0;EAS_FREQ_HOMALT=0;EUR_AN=7598;EUR_AC=0;EUR_AF=0;EUR_N_BI_GENOS=3799;EUR_N_HOMREF=3799;EUR_N_HET=0;EUR_N_HOMALT=0;EUR_FREQ_HOMREF=1;EUR_FREQ_HET=0;EUR_FREQ_HOMALT=0;OTH_AN=388;OTH_AC=0;OTH_AF=0;OTH_N_BI_GENOS=194;OTH_N_HOMREF=194;OTH_N_HET=0;OTH_N_HOMALT=0;OTH_FREQ_HOMREF=1;OTH_FREQ_HET=0;OTH_FREQ_HOMALT=0;POPMAX_AF=0.000561

but you're using END as the position of the translocation for CHR2 citing John Marshall 's answer: https://www.biostars.org/p/386345/#386354 , this usage of END breaks the integrity of the tabix index , the variant is not found anymore with bcftools or tabix:

$ bcftools view gnomad_v2_sv.sites.vcf.gz "12:60718970-60718972" | grep gnomAD_v2_CTX_12_13
$ 
$ tabix gnomad_v2_sv.sites.vcf.gz "12:60718970-60718972" | grep gnomAD_v2_CTX_12_13
$

@jmarshall commented:

Dropping the grep from this last query highlights the unfortunate state that this corrupt index is in:

$ bcftools view gnomad_v2_sv.sites.vcf.gz "12:60718970-60718972"
[424 records on chr. 12 at positions between 62403 and 60412822.]