ManavalanG closed this issue 7 years ago
I have a quick solution in the script join_data.R. It changes how we encode the pathogenic and benign fields:
pathogenic=1: the variant is reported as Pathogenic/Likely pathogenic, with no "uncertain significance" submission.
benign=1: the variant is reported as Benign/Likely benign, with no "uncertain significance" submission.
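To make the proposed rule concrete, here is a minimal sketch. It is written in Python for illustration and is not the actual change in join_data.R; the function name `encode_flags` and the input format (a list of per-submission clinical significance strings for one variant) are assumptions.

```python
def encode_flags(significances):
    """Return (pathogenic, benign) as 0/1 flags for one variant.

    `significances` is a hypothetical list of per-submission clinical
    significance strings, e.g. ["Pathogenic", "Likely pathogenic"].
    """
    terms = [s.strip().lower() for s in significances]

    # Any "uncertain significance" submission blocks both flags.
    has_uncertain = any(t == "uncertain significance" for t in terms)
    has_pathogenic = any(t in ("pathogenic", "likely pathogenic") for t in terms)
    has_benign = any(t in ("benign", "likely benign") for t in terms)

    pathogenic = int(has_pathogenic and not has_uncertain)
    benign = int(has_benign and not has_uncertain)
    return pathogenic, benign
```

The rule as stated does not say how to treat a variant with both pathogenic and benign submissions; this sketch simply sets both flags in that case.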
@XiaoleiZ Thanks for the solution. So far, I'm using the pre-formatted data available for download from this repo (and your fork) and making my changes locally in that data, instead of actually running the complete pipeline myself.
Are there any other major bugs or recent problems that are not documented in Issues but would be good to be aware of?
@XiaoleiZ - Looks like you meant to update the Readme in your fork, but updated the master repo here instead.
It seems the reformatted data tagged as 'May release' was produced from March (or earlier?) NCBI ClinVar data. Since April, NCBI ClinVar seems to have changed how it reports aggregated clinical significance, and this completely changes how conflicted variants are identified when the current code is run on recent NCBI ClinVar data. NCBI's current practice is to use the term 'Conflicting interpretations of pathogenicity' for conflicts, and these variants are mistakenly tagged as TRUE in the pathogenic column (instead of TRUE in the conflicted column) because they contain the string 'pathogenic'.

@XiaoleiZ - Since I am using the data from your fork, which is based on Aug data, I thought I should tag you here.
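The mis-tagging described above can be reproduced with a short sketch. It is written in Python for illustration and is not the repo's code: it assumes the current logic amounts to a substring match on 'pathogenic', as the symptom suggests, and the function names `naive_pathogenic` and `classify` are hypothetical.

```python
CONFLICT_TERM = "conflicting interpretations of pathogenicity"


def naive_pathogenic(clinical_significance):
    # Assumed current behaviour: plain substring match, which also
    # matches the 'pathogenic' inside 'pathogenicity'.
    return "pathogenic" in clinical_significance.lower()


def classify(clinical_significance):
    # One possible fix: test for the conflict term first, and only
    # then look for 'pathogenic'.
    text = clinical_significance.strip().lower()
    conflicted = CONFLICT_TERM in text
    pathogenic = (not conflicted) and "pathogenic" in text
    return {"pathogenic": pathogenic, "conflicted": conflicted}


print(naive_pathogenic("Conflicting interpretations of pathogenicity"))
print(classify("Conflicting interpretations of pathogenicity"))
```

Checking the conflict term before the substring match keeps the two columns mutually exclusive for this term; other aggregate terms may need similar handling.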
@bw2 @ericminikel @XiaoleiZ - Any suggestions on which repo or fork, if any, to use for actively maintained data? I would like to contribute if you need more manpower to maintain a centralized repo.