quinlan-lab / ccrhtml

A small repo for storing the code for making the files and html for CCRs.

Reference build? #12

Closed dantaki closed 5 years ago

dantaki commented 5 years ago

Is this hg19/GRCh37? If so, will there be a file for hg38/GRCh38? And if this is hg38, will there be one for hg19? Would lifting over the positions suffice? Thanks and great work!

jimhavrilla commented 5 years ago

Yes, this is hg19/GRCh37. There will likely be a version for GRCh38 in the future, once gnomAD updates to GRCh38 (which should be relatively soon, I believe). You could use UCSC liftOver, but I don't have any plans to use it myself since the conversion is not accurate enough yet. My coworker found and used a lifted-over version of gnomAD from Ensembl, which is at http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad.genomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz, but in his experience at best 70-80% of SNPs are lifted over accurately, not to mention INDELs. My concern is twofold. First, if you use that unofficial version of gnomAD to create the regions, the absence of a few SNPs could lead to many false-positive CCRs at the high percentiles. Second, if the breakpoints of the CCRs are shifted by lifting over the hg19 regions themselves, you could miss pathogenic variants or, again, call regions constrained that in fact are not.
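For anyone who does lift the hg19 regions over anyway, the breakpoint concern above can be checked roughly by comparing each region before and after conversion. The following is only a minimal sketch, not part of this repo: it assumes 4-column, tab-separated BED lines whose fourth column is a unique region name, and the function names, region names, and coordinates are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based start, BED convention
    end: int    # exclusive end, BED convention


def parse_bed(lines: List[str]) -> Dict[str, List[Region]]:
    """Group 4-column BED records by the name in column 4."""
    regions: Dict[str, List[Region]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        regions.setdefault(name, []).append(Region(chrom, int(start), int(end)))
    return regions


def check_liftover(original: Dict[str, List[Region]],
                   lifted: Dict[str, List[Region]]) -> Dict[str, str]:
    """Label each original region by what happened to it after liftover."""
    report: Dict[str, str] = {}
    for name, src_pieces in original.items():
        src = src_pieces[0]  # assumes names are unique in the original file
        pieces = lifted.get(name, [])
        if not pieces:
            report[name] = "unmapped"        # region was dropped entirely
        elif len(pieces) > 1:
            report[name] = "split"           # region came back as several pieces
        elif pieces[0].chrom != src.chrom:
            report[name] = "chrom_changed"
        elif pieces[0].end - pieces[0].start != src.end - src.start:
            report[name] = "length_changed"  # at least one breakpoint moved
        else:
            report[name] = "ok"
    return report


# Made-up coordinates, for illustration only.
hg19_bed = [
    "chr1\t1000\t1500\tccr_1",
    "chr1\t2000\t2600\tccr_2",
    "chr2\t300\t450\tccr_3",
    "chr3\t700\t900\tccr_4",
]
hg38_bed = [
    "chr1\t1100\t1600\tccr_1",
    "chr1\t2100\t2690\tccr_2",
    "chr3\t800\t880\tccr_4",
    "chr3\t950\t1070\tccr_4",
]

report = check_liftover(parse_bed(hg19_bed), parse_bed(hg38_bed))
for name, status in sorted(report.items()):
    print(name, status)
```

A region labelled "ok" here only kept its chromosome and length; that does not show the lifted coordinates are correct, so anything other than "ok" should be treated as suspect and the rest still spot-checked.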

jimhavrilla commented 5 years ago

I would also like to say thanks for this question, because the site could have been clearer about which reference it uses, and now, thanks to this issue, it is.

dantaki commented 5 years ago

Thanks for the response, and I'm looking forward to an hg38 annotation :+1:

jimhavrilla commented 5 years ago

Sure thing.