plagnollab / DNASeq_pipeline

Pipeline in place at the UGI for DNA level analysis

VEP annotation #21

Open pontikos opened 9 years ago

pontikos commented 9 years ago

Use build 38.

Add ExAC allele frequency annotation from /cluster/project8/vyp/AdamLevine/annotations/ExAC/0.3/ExAC.r0.3.sites.vep.vcf.gz. Have a look at this script, which prepares allele frequencies from the INFO field: /cluster/project8/vyp/AdamLevine/annotations/esp/prepare_esp.sh
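The contents of prepare_esp.sh are not shown in this thread, so the following is only a rough sketch of the general idea of deriving allele frequencies from a VCF INFO column. It assumes the INFO string carries `AC` (one count per ALT allele, comma-separated) and `AN` (total allele number); the function names `parse_info` and `allele_frequencies` are hypothetical and not taken from the pipeline.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_frequencies(info, ac_key="AC", an_key="AN"):
    """Return one frequency per ALT allele, or [] if counts are missing or AN is 0.

    Assumes AC and AN hold plain integers (an assumption of this sketch).
    """
    fields = parse_info(info)
    if ac_key not in fields or an_key not in fields:
        return []
    an = int(fields[an_key])
    if an == 0:
        return []
    return [int(ac) / an for ac in fields[ac_key].split(",")]


if __name__ == "__main__":
    # Made-up INFO string for illustration, not taken from the ExAC file.
    print(allele_frequencies("AC=2,5;AN=100;DB"))
```

The `ac_key` and `an_key` parameters are there because a sites file may carry more than one pair of count fields; which pair is appropriate for this annotation is not stated in the thread.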

Add 1000 Genomes (1kg) allele frequencies.

Add CADD scores.

pontikos commented 9 years ago

I've committed a couple of scripts in annotation to prepare the ExAC custom annotation for VEP: a668095f80ac1556e4deddf9b6c5d5e1aac0f0e4 and 65c5d977c5f407d63c37e9979c3accb35fd9dc8a

pontikos commented 9 years ago

I've written a generic liftover script to prepare the custom annotation for VEP. The data needs to be split by chromosome for it to work. Note that it doesn't handle positions that move to a different chromosome or onto alternative sequences: those records are simply dropped when the BED file is filtered on the chromosome name.

defa3a540c6238ddf9ca7c0dc366111e2957a811
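As an illustration of the filtering step described above (not the committed script itself, which is not reproduced in this thread), here is a minimal sketch that keeps only lifted-over BED records whose chromosome column still equals the chromosome being processed. The function name `keep_same_chromosome` and the example records are hypothetical.

```python
def keep_same_chromosome(bed_lines, chrom):
    """Yield lifted-over BED lines whose first column is exactly `chrom`.

    Records that landed on another chromosome or on an alternative or
    unplaced sequence have a different name in column 1 and are skipped.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        if line.split()[0] == chrom:
            yield line


if __name__ == "__main__":
    # Made-up records for illustration only.
    lifted = [
        "chr1\t100\t101\tvarA\n",
        "chr1_KI270706v1_random\t200\t201\tvarB\n",  # unplaced sequence
        "chr2\t300\t301\tvarC\n",  # moved to another chromosome
    ]
    for line in keep_same_chromosome(lifted, "chr1"):
        print(line, end="")
```

An exact comparison on column 1 is used rather than a prefix match, so that names such as `chr1_KI270706v1_random` or `chr10` are not mistaken for `chr1`.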

APLevine commented 9 years ago

Sounds good.

Adam

Adam P. Levine (replying by email to Nikolas Pontikos's comment of 25 Jan 2015 23:42)

pontikos commented 9 years ago

Thanks.

I'm moving the annotations to /cluster/project8/IBDAJE/VEP_custom_annotations. Initially I had moved them to /goon2/scratch2/vyp-scratch2/annotation, but I found writing to that location from SGE jobs to be quite unreliable (sometimes the output files were empty).

I have updated run_VEP.sh (2d5003481b5ca58dabad50a4693fcb80be770f7f) so that the custom annotations are read from a genome-build-dependent location.
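How run_VEP.sh lays out its build-dependent directories is not shown here. Purely as a sketch of the idea, the helper below assumes one subdirectory per build under a base directory; the function name `custom_annotation_path`, the subdirectory naming, and the file naming are all hypothetical.

```python
import posixpath


def custom_annotation_path(base_dir, build, name):
    """Build the path of a custom annotation file for one genome build.

    Assumed layout (hypothetical): <base_dir>/<build>/<name>.vcf.gz
    """
    if not build:
        raise ValueError("a genome build must be given")
    return posixpath.join(base_dir, str(build), name + ".vcf.gz")


if __name__ == "__main__":
    base = "/cluster/project8/IBDAJE/VEP_custom_annotations"
    for build in ("37", "38"):
        print(custom_annotation_path(base, build, "ExAC"))
```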

APLevine commented 9 years ago

Excellent. So now the post-VEP script just has to be finished. Let me know if you want to discuss what this is going to do.

Adam

Adam P. Levine (replying by email to Nikolas Pontikos's comment of 25 Jan 2015 23:52)

pontikos commented 9 years ago

Will do. I'm noticing some strange things on the filesystem though: sometimes the output of bgzip is empty. I don't think it's a bug in my script; more likely it's some NFS lag. I'm looking into it.
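One way to catch empty outputs early, whatever their cause, is to check each compressed file after it is written. The sketch below is not part of the pipeline; it relies on bgzip output being readable as ordinary gzip, and the function name `is_nonempty_gzip` is hypothetical.

```python
import gzip
import os
import zlib


def is_nonempty_gzip(path):
    """True if `path` exists, is non-zero in size, and decompresses to at least one byte."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(1) != b""
    except (OSError, EOFError, zlib.error):
        # Not a gzip file, truncated, or corrupt.
        return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "ok" if is_nonempty_gzip(name) else "EMPTY OR INVALID")
```

Reading only the first decompressed byte keeps the check cheap on large files, at the cost of not detecting corruption further into the file.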

pontikos commented 9 years ago

OK, I think I know what it is: I had chr${ch}*.vcf.gz instead of chr${ch}_*.vcf.gz, so the same file could be matched more than once (presumably chr1* also picking up the chr10, chr11, ... files), which would explain the concurrency issues. Correcting that and running again. I've committed the fix: 60aae875aa06d62b4f94d45a34da184111fffe3c
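The difference between the two patterns can be reproduced outside the shell. The sketch below uses Python's `fnmatch`, which treats `*` the same way as the shell for plain file names like these; the file names and the helper `matching_files` are made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_files(pattern, names):
    """Return the names matched by a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical per-chromosome file names.
    names = [
        "chr1_ExAC.vcf.gz",
        "chr10_ExAC.vcf.gz",
        "chr11_ExAC.vcf.gz",
        "chr2_ExAC.vcf.gz",
    ]
    ch = "1"
    # Without the underscore, the chr1 pattern also matches chr10 and chr11.
    print(matching_files(f"chr{ch}*.vcf.gz", names))
    # With the underscore, only the chr1 file matches.
    print(matching_files(f"chr{ch}_*.vcf.gz", names))
```

With the loose pattern, the jobs for chromosomes 1 and 10 would both pick up the chr10 file, which is consistent with the same file being processed twice.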

pontikos commented 9 years ago

See #25: the liftover of the annotations from build 37 to 38 is not trivial because certain regions (especially around the centromeres) have changed significantly.

APLevine commented 9 years ago

I wonder how the ESP annotation in b38 works with VEP?

Adam

Adam P. Levine (replying by email to Nikolas Pontikos's comment of 29 Jan 2015 15:08)

pontikos commented 9 years ago

Good point. I'm not sure how the built-in ESP annotation compares with the custom one that we lift over. I will check.