smshuai / DriverPower

DriverPower
GNU General Public License v3.0

Support for HG38? #14

Status: Open. Slavatron opened this issue 4 years ago.

Slavatron commented 4 years ago

Just wondering if there are any plans to develop support for HG38?

smshuai commented 4 years ago

Hi Slavatron,

Thank you for your interest. It's on my list. The bottleneck is collecting all the genomic features and testing them on hg38. ENCODE does provide hg38 tracks, so this should be possible, but the cancer WGS mutations I have are from PCAWG, which is hg19-based. I will also need to identify a couple of hg38 cancer WGS datasets for testing.

Best, Shimin

cboursnell commented 4 years ago

I know it's not perfect, but could you run liftOver on the hg19 coordinates to produce preliminary hg38 versions?
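For readers unfamiliar with liftOver: it converts coordinates using a UCSC "chain" file, which lists gap-free aligned blocks between two assemblies. A position inside a block shifts by that block's offset; a position in a gap has no mapping, which is one reason lifted coordinates are "not perfect". The sketch below is illustrative only, not DriverPower's or UCSC's code. It assumes a single `+`-strand chain and uses a made-up toy chain (`TOY_CHAIN`); real liftOver also handles many chains per file, minus-strand alignments, and interval match thresholds.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One aligned block: (source_start, target_start, size), 0-based, half-open.
Block = Tuple[int, int, int]

# Hypothetical toy chain in UCSC chain format (NOT real hg19 -> hg38 data).
# Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
TOY_CHAIN = """
chain 1000 chr1 5000 + 100 400 chr1 6000 + 200 510 1
100 50 60
150
"""


def parse_chain(text: str) -> Tuple[str, str, List[Block]]:
    """Parse one '+'-strand chain record into its list of aligned blocks."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    if header[4] != "+" or header[9] != "+":
        raise ValueError("this sketch only handles '+'-strand chains")
    t_name, t_pos = header[2], int(header[5])   # "from" assembly (e.g. hg19)
    q_name, q_pos = header[7], int(header[10])  # "to" assembly (e.g. hg38)
    blocks: List[Block] = []
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:  # the final line of a chain has only the block size
            dt, dq = int(fields[1]), int(fields[2])
            t_pos += size + dt  # skip the gap on the source side
            q_pos += size + dq  # skip the gap on the target side
    return t_name, q_name, blocks


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based source position; return None if it falls outside all blocks."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src, dst, size = blocks[i]
    if pos < src + size:
        return dst + (pos - src)
    return None  # position lies in an alignment gap: unmapped


if __name__ == "__main__":
    _, _, blocks = parse_chain(TOY_CHAIN)
    for p in (150, 220, 300):
        print(p, "->", lift_position(p, blocks))
```

With the toy chain, position 150 maps to 250, 300 maps to 410, and 220 falls in the gap between blocks and is reported as unmapped. In practice one would use UCSC's `liftOver` tool with the published hg19-to-hg38 chain file rather than hand-rolled code.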

alhafidzhamdan commented 4 years ago

Hi there, are there any updates regarding hg38 support? Much appreciated! We are all eager to use your tool!

smshuai commented 4 years ago

Hi @cboursnell, that's something I can do pretty quickly. I will do it as an intermediate solution. Thanks for the suggestion!

Hi @alhafidzhamdan, I will provide a lifted-over version for hg38 soon!

DarioS commented 4 years ago

To use functional information, one or more types of functional measurements (e.g., CADD, EIGEN, LINSIGHT) need to be collected first. CADD scores can be retrieved through its web interface (up to 100K variants at a time) without downloading the large file of all possible SNVs (~80 GB). If you have more than 100K variants, you can either split your file and run the web app multiple times, or download the large file and query it with tabix. Other scores can be obtained in a similar way after downloading them.
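The "split your file" step can at least be scripted. Below is a minimal sketch, assuming the variants are in a plain-text VCF whose header lines start with `#`; it writes chunks of at most 100,000 records, each repeating the full header so every part is a valid upload on its own. The function names and the `*.partNNN.vcf` naming scheme are hypothetical, and the 100K limit is the figure quoted above, which may change on the CADD side.

```python
import itertools
from pathlib import Path
from typing import Iterable, Iterator, List

MAX_VARIANTS = 100_000  # per-upload limit quoted in the thread (assumption: may change)


def split_vcf_lines(lines: Iterable[str], max_variants: int = MAX_VARIANTS) -> Iterator[List[str]]:
    """Yield chunks, each holding the full header plus at most max_variants records."""
    it = iter(lines)
    header: List[str] = []
    first_record = None
    for line in it:
        if line.startswith("#"):
            header.append(line)
        else:
            first_record = line  # first data line ends the header
            break
    if first_record is None:
        return  # header only, nothing to split
    body = itertools.chain([first_record], it)
    while True:
        batch = list(itertools.islice(body, max_variants))
        if not batch:
            return
        yield header + batch


def write_chunks(vcf_path: str, max_variants: int = MAX_VARIANTS) -> List[Path]:
    """Write <stem>.part001.vcf, <stem>.part002.vcf, ... next to the input file."""
    src = Path(vcf_path)
    out_paths: List[Path] = []
    with src.open() as handle:
        for n, chunk in enumerate(split_vcf_lines(handle, max_variants), start=1):
            out = src.with_name(f"{src.stem}.part{n:03d}.vcf")
            out.write_text("".join(chunk))
            out_paths.append(out)
    return out_paths
```

Streaming the input with `itertools.islice` keeps memory bounded to one chunk, which matters for WGS-scale files. Each part still has to be uploaded by hand, so for many samples the downloaded score file plus tabix remains the more pipeline-friendly route.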

It seems the method requires a lot of manual intervention and visits to different web applications, which makes it unsuitable for a pipeline that processes a large number of samples.