konradjk / exac_browser

Browser for ExAC consortium data
http://exac.broadinstitute.org
MIT License

Load custom variants #350

Open silenus092 opened 5 years ago

silenus092 commented 5 years ago

Hi,

I successfully installed and tested the ExAC browser by following the instructions in the README.

However, I want to load my own VCF files (custom variants) into MongoDB and display them in the ExAC browser as well, so I began to inspect how the code works.

  1. I copied my VCF file into an exac_data directory and named it ExAC*.test1.vcf.gz.

  2. I ran the following command to load the variants into the database: python manage.py load_variants_file

  3. Unfortunately, loading failed. The error message said that the Python code could not parse the INFO tags AC_AFR, AC_AMR, AC_EAS, AC_FIN, etc., because my VCF file does not contain those tags. I found that ExAC_HC.0.3.vep was annotated with VariantAnnotator and VEP, and that its header defines tags such as AN_AFR, AN_AMR, AC_AFR, AC_AMR, AC_EAS, AC_FIN, etc.

  4. Then I tried running both VariantAnnotator and VEP on my VCF file, but those INFO tags still did not appear.
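A quick way to spot the mismatch from step 3 before running the loader is to compare the INFO IDs declared in your VCF header with the per-population tags the ExAC loader reads. This is a minimal sketch under stated assumptions: the population codes AFR, AMR, EAS, FIN, NFE, OTH, SAS and the prefixes AC_, AN_, Hom_, Hemi_ (AC_, AN_ and Hemi_ appear in this thread; NFE, OTH, SAS and Hom_ are assumptions to check against parsing.py). The function names are hypothetical, and a header declaration is only a proxy, since the loader reads the values from each record's INFO column.

```python
import gzip
import re

# Population codes used in ExAC-style INFO tags (AC_AFR, AC_AMR, ...).
# Assumption: parsing.py iterates over this set; check its definition there.
POPS = ("AFR", "AMR", "EAS", "FIN", "NFE", "OTH", "SAS")

# Assumption: per-population INFO prefixes the loader reads.
# AC_, AN_ and Hemi_ are named in this issue; Hom_ is assumed.
PREFIXES = ("AC", "AN", "Hom", "Hemi")

INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_ids(path):
    """Return the INFO IDs declared in the header of a .vcf or .vcf.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the #CHROM line, so the meta-information header is over
            match = INFO_ID.match(line)
            if match:
                ids.add(match.group(1))
    return ids


def missing_exac_fields(path):
    """List the assumed ExAC INFO fields (plus VEP's CSQ) that the header lacks."""
    expected = {"CSQ"} | {f"{prefix}_{pop}" for prefix in PREFIXES for pop in POPS}
    return sorted(expected - declared_info_ids(path))
```

Calling missing_exac_fields on your file returns the sorted list of expected tags its header does not declare; an empty list only means the header matches, not that every record carries the values.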

Could you kindly explain how to load custom variants into the ExAC browser, so that I can reproduce the process with my VCF file? I also read https://github.com/konradjk/exac_browser/issues/268, but it was not much help with this problem.

Best regards, Bob

nawatts commented 5 years ago

Hi @silenus092,

The ExAC browser is tailored to the ExAC dataset. It was not designed to allow loading arbitrary datasets and there is no supported way to do so.

get_variants_from_sites_vcf in parsing.py is the relevant code for loading variants. You would need to modify that function based on the INFO fields present in your VCF file. https://github.com/konradjk/exac_browser/blob/a212465c5b75752abe8990cf6aa581295835ab58/parsing.py#L53-L147
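Before modifying get_variants_from_sites_vcf, it helps to see which INFO keys your records actually carry. The following is a minimal, standard-library sketch of turning an INFO column into a dict and reading one ALT allele's value from a per-allele (Number=A) field. parse_info and allele_value are hypothetical helpers, not code from parsing.py; allele_value returns None for an absent key instead of raising, which is the failure reported above.

```python
def parse_info(info_column):
    """Turn a VCF INFO column such as 'AC=3,1;AN=10;DB' into a dict.

    Flag entries (no '=') map to True; a lone '.' means the column is empty.
    """
    info = {}
    if info_column == ".":
        return info
    for entry in info_column.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def allele_value(info, key, allele_index, cast=int):
    """Return one ALT allele's value from a comma-separated per-allele field,
    or None if the field is absent from this record."""
    if key not in info:
        return None
    return cast(info[key].split(",")[allele_index])
```

For example, parse_info("AC=3,1;AN=10;DB") gives {"AC": "3,1", "AN": "10", "DB": True}, and allele_value on that dict with key "AC" and index 1 gives 1.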

In particular, it looks like the population-specific fields are not present in your VCF. https://github.com/konradjk/exac_browser/blob/a212465c5b75752abe8990cf6aa581295835ab58/parsing.py#L120-L122 https://github.com/konradjk/exac_browser/blob/a212465c5b75752abe8990cf6aa581295835ab58/parsing.py#L129
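One possible way to adapt the population-specific lines is to build each population dict only from the keys that are present, rather than assuming every population key exists. This sketch assumes the INFO column has already been parsed into a dict of strings; the population labels are modelled on ExAC's displayed names (an assumption, check the mapping in parsing.py), and pop_counts_from_info is a hypothetical helper.

```python
# Population codes and display names.
# Assumption: modelled on ExAC's population labels; verify against parsing.py.
POPS = {
    "AFR": "African",
    "AMR": "Latino",
    "EAS": "East Asian",
    "FIN": "European (Finnish)",
    "NFE": "European (Non-Finnish)",
    "OTH": "Other",
    "SAS": "South Asian",
}


def pop_counts_from_info(info, prefix, allele_index=None, pops=POPS):
    """Build {population name: count} from INFO keys like 'AC_AFR'.

    Populations whose key is missing are skipped instead of raising KeyError.
    Pass allele_index for per-ALT-allele fields (e.g. AC_); leave it None for
    single-valued fields such as AN_.
    """
    counts = {}
    for code, name in pops.items():
        raw = info.get(f"{prefix}_{code}")
        if raw is None:
            continue  # this population is not annotated in the record
        value = raw.split(",")[allele_index] if allele_index is not None else raw
        counts[name] = int(value)
    return counts
```

Skipping absent populations, rather than filling in zeros, avoids displaying counts that were never measured. As noted in the reply, however, other parts of the browser may still expect every population to be present.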

In addition to the code for loading variants, there is code in other parts of the browser (such as the HTML templates) that assumes variants have all the fields defined in the ExAC VCF. That would also have to be modified to support the format of variants in your dataset.
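To find out which fields the HTML templates expect, a rough first pass is to scan them for attribute accesses on a variant object and compare that set with the keys of the documents your modified loader produces. This sketch assumes Jinja-style references of the form variant.name in *.html files; it will miss other access patterns (for example subscript access or JavaScript-side templates), so treat its output as a starting list rather than a complete one. The helper names and the template directory you pass in are assumptions.

```python
import re
from pathlib import Path

# Matches attribute access such as {{ variant.pop_acs }}.
# Assumption: the templates reference variant fields in this style.
VARIANT_ATTR = re.compile(r"\bvariant\.(\w+)")


def template_variant_fields(template_dir):
    """Collect the field names used as variant.<name> in *.html files."""
    fields = set()
    for path in Path(template_dir).rglob("*.html"):
        fields.update(VARIANT_ATTR.findall(path.read_text(encoding="utf-8")))
    return fields


def missing_fields(variant, required):
    """Return, sorted, the required field names absent from one variant document."""
    return sorted(set(required) - set(variant))
```

Running missing_fields on a few loaded variant documents against the scanned set gives a quick list of fields your format would still need to supply or that the templates would need to stop using.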

silenus092 commented 5 years ago

Thank you very much. In that case, is it possible to generate a VCF file similar to ExAC_HC.0.3.*.vep.vcf that includes all the required INFO tags, such as pop_acs, pop_ans, pop_homs, AC_AMR, Hemi_FIN, etc., so that I don't have to modify the code? How can I achieve this? For example, which VEP version, external resources, and commands were used?
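If you do produce such a file, every per-population tag also needs a matching ##INFO meta-information line in its header. The following sketch generates those lines: Number=A for per-ALT-allele counts and Number=1 for allele numbers, following the VCF specification's conventions. The descriptions are placeholders rather than ExAC's wording, and the population codes and the Hom_/Hemi_ prefixes are assumptions; FIELDS and info_header_lines are hypothetical names.

```python
# Assumption: ExAC-style population codes.
POPS = ("AFR", "AMR", "EAS", "FIN", "NFE", "OTH", "SAS")

# prefix -> (VCF Number, description template). Descriptions are placeholders.
FIELDS = {
    "AC": ("A", "Alternate allele count in population {pop}"),
    "AN": ("1", "Total number of called alleles in population {pop}"),
    "Hom": ("A", "Homozygous alternate genotype count in population {pop}"),
    "Hemi": ("A", "Hemizygous alternate genotype count in population {pop}"),
}


def info_header_lines(pops=POPS, fields=FIELDS):
    """Build ##INFO meta-information lines for per-population count fields."""
    lines = []
    for prefix, (number, description) in fields.items():
        for pop in pops:
            lines.append(
                f"##INFO=<ID={prefix}_{pop},Number={number},Type=Integer,"
                f'Description="{description.format(pop=pop)}">'
            )
    return lines
```

The resulting lines could be written to a text file and appended to an existing header with a tool such as bcftools annotate (its --header-lines option). Declaring the tags does not supply their values, which still have to be computed from your genotypes.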

nawatts commented 5 years ago

How to generate correct values for those fields (and whether or not that is possible at all) would depend on your sample data.
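To illustrate why this depends on your samples: per-population AC, AN and Hom values can only be computed from individual genotypes together with a population label for every sample. The sketch below counts them for one ALT allele at one site from GT strings, with the sample-to-population mapping supplied by you. It is a simplification under stated assumptions: it does not reproduce any genotype-quality filtering ExAC applied, and it does not compute Hemi_ counts, which would additionally require each sample's sex. site_population_counts and to_info_entries are hypothetical names.

```python
import re


def site_population_counts(genotypes, sample_pop, alt_index=1):
    """Compute AC, AN and Hom per population for one ALT allele at one site.

    genotypes: {sample_id: GT string such as '0/1', '1|1' or './.'}
    sample_pop: {sample_id: population code}; unlabeled samples are ignored.
    alt_index: allele number as written in GT (1 = first ALT allele).
    No genotype-quality filtering is applied (assumption/simplification).
    """
    ac, an, hom = {}, {}, {}
    target = str(alt_index)
    for sample, gt in genotypes.items():
        pop = sample_pop.get(sample)
        if pop is None:
            continue  # cannot assign this sample to a population
        for counts in (ac, an, hom):
            counts.setdefault(pop, 0)
        called = [a for a in re.split(r"[/|]", gt) if a != "."]
        an[pop] += len(called)            # every called allele counts toward AN
        ac[pop] += called.count(target)   # copies of this ALT allele
        if len(called) >= 2 and all(a == target for a in called):
            hom[pop] += 1                 # fully called and homozygous for this ALT
    return ac, an, hom


def to_info_entries(ac, an, hom):
    """Render the counts as INFO key=value pairs, e.g. 'AC_AFR=1;AN_AFR=4'."""
    entries = []
    for pop in sorted(an):
        entries += [f"AC_{pop}={ac[pop]}", f"AN_{pop}={an[pop]}", f"Hom_{pop}={hom[pop]}"]
    return ";".join(entries)
```

Without a population label for each sample there is nothing meaningful to put in the AC_/AN_/Hom_ fields, which is why the answer depends on your sample data.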

The VEP version used for ExAC was version 85 (http://exac.broadinstitute.org/faq).
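VEP writes its annotations into a CSQ INFO field, and the CSQ header line lists the pipe-separated subfields after "Format: ". Which subfields appear depends on the VEP version and options used, so it is worth checking that your output's list matches what the browser code expects (that the loader relies on a CSQ field is an assumption here; confirm it in parsing.py). This is a minimal sketch with hypothetical helper names.

```python
def csq_field_names(header_lines):
    """Read the pipe-separated subfield list from the ##INFO=<ID=CSQ,...> header line."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            # Text after 'Format: ' looks like 'Allele|Consequence|...">'
            return line.split("Format: ")[-1].rstrip('">\n').split("|")
    return None  # no CSQ header: the file was probably not annotated by VEP


def parse_csq(csq_value, field_names):
    """Turn a CSQ INFO value into one dict per annotation.

    Entries whose subfield count does not match the header are skipped.
    """
    annotations = []
    for entry in csq_value.split(","):
        values = entry.split("|")
        if len(values) == len(field_names):
            annotations.append(dict(zip(field_names, values)))
    return annotations
```

Comparing the csq_field_names output for your VEP run with the ExAC VCF's CSQ header is a quick way to see whether a different VEP version or option set changed the annotation layout.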