xf-omics / SHINE

prediction of pathogenicity for inframe indels

Processing of UKB and ClinVar indels #1

Open loodvn opened 2 years ago

loodvn commented 2 years ago

Hi there!

Really interesting work, thank you especially for open-sourcing the code. We're trying to use the pre-processed list of variants to validate a similar method, but I wanted to check how it was processed.

Could you please provide the scripts you used to generate the list of inframe indels (the files ./data/clinvar_gnomAD_esm1b_msa_{del/ins}_{1/2/3}aa.txt) from the UKBB / ClinVar raw files? Even if it's messy, it would be helpful to be able to reprocess it!

Thanks :)

xf-omics commented 2 years ago

Thanks for your interest in our work. The ClinVar dataset is distributed as a well-annotated VCF, so it is fairly easy to extract all the inframe_deletion and inframe_insertion variants recorded in the INFO column. However, we did not use the ClinVar annotations, because we had to make sure that the variant annotations for ClinVar, UKBB, and the other datasets came from the same source. Instead, we re-annotated all the variants using our in-house bioinformatics pipeline, so my script would not be applicable to your data.
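
For anyone who only needs the ClinVar extraction step mentioned above, here is a minimal sketch. It is not the in-house pipeline and does not reproduce the re-annotation. It assumes the ClinVar VCF stores molecular consequences in an INFO key named `MC`, as comma-separated `SO_id|term` entries; the function names (`parse_info`, `inframe_records`, `read_vcf`) are hypothetical and chosen only for illustration.

```python
import gzip
from typing import Dict, Iterable, Iterator

# Consequence terms to keep (assumption: these appear verbatim in the MC field).
INFRAME_TERMS = {"inframe_deletion", "inframe_insertion"}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a dict; flag-only keys map to ''."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def inframe_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield records whose MC annotation lists an inframe indel term."""
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF data line
        chrom, pos, vid, ref, alt = cols[:5]
        info = parse_info(cols[7])
        # Each MC entry looks like "SO:0001822|inframe_deletion" (assumed layout).
        terms = {e.split("|")[-1] for e in info.get("MC", "").split(",") if e}
        hits = sorted(terms & INFRAME_TERMS)
        if hits:
            yield {
                "chrom": chrom,
                "pos": int(pos),
                "id": vid,
                "ref": ref,
                "alt": alt,
                "consequences": hits,
                "clnsig": info.get("CLNSIG", ""),
            }


def read_vcf(path: str) -> Iterator[dict]:
    """Stream inframe indel records from a plain or gzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from inframe_records(handle)
```

Calling `read_vcf` on a local copy of the ClinVar VCF would stream the matching records. Variants selected this way carry ClinVar's own consequence annotations, so they may not match the lists in ./data/ exactly, since those were re-annotated with a different pipeline.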