Improper labelling of conflicting variants as pathogenic

ManavalanG commented 7 years ago

It seems the reformatted data tagged as 'May release' was produced from March (or earlier?) NCBI ClinVar data. Since April, NCBI ClinVar appears to have changed how it reports aggregated clinical significance, and this completely changes how conflicting variants are identified when the current code is run on recent NCBI ClinVar data. NCBI's current practice is to use the term "Conflicting interpretations of pathogenicity" for conflicts. Because that string contains 'pathogenic', these variants are mistakenly tagged TRUE in the pathogenic column (instead of TRUE in the conflicted column).
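A minimal Python sketch of the failure mode described above, assuming the pathogenic flag is set by a plain substring test on the aggregated clinical significance string. The function names and the "check for the conflict term first" fix are hypothetical illustrations, not code from this repo:

```python
# Hypothetical illustration, not the repo's actual pipeline code.
CONFLICT_TERM = "conflicting interpretations of pathogenicity"


def naive_is_pathogenic(clinical_significance):
    # Plain substring test: "...of pathogenicity" also contains "pathogenic",
    # which reproduces the mislabelling.
    return "pathogenic" in clinical_significance.lower()


def classify(clinical_significance):
    """Flag conflicts explicitly before testing for 'pathogenic'."""
    text = clinical_significance.strip().lower()
    conflicted = CONFLICT_TERM in text
    # Only fall back to the substring test once a conflict is ruled out.
    pathogenic = (not conflicted) and ("pathogenic" in text)
    return {"pathogenic": pathogenic, "conflicted": conflicted}


if __name__ == "__main__":
    print(naive_is_pathogenic("Conflicting interpretations of pathogenicity"))  # True (the bug)
    print(classify("Conflicting interpretations of pathogenicity"))
    print(classify("Pathogenic/Likely pathogenic"))
```

Checking for the conflict term before the broader 'pathogenic' match keeps the fix small; an exact-term comparison would be stricter if the aggregated field can carry other wording.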

@XiaoleiZ - Since I am using the August data from your fork, I thought I should tag you here.

@bw2 @ericminikel @XiaoleiZ - Any suggestions on which repo or fork, if any, has actively maintained data? I would like to contribute if you need more help maintaining a centralized repo.

XiaoleiZ commented 7 years ago

I have a quick fix in the script join_data.R. It changes how we encode the fields pathogenic and benign:

- pathogenic=1: the variant is reported as Pathogenic/Likely pathogenic with no "uncertain significance" submission
- benign=1: the variant is reported as Benign/Likely benign with no "uncertain significance" submission
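The two encoding rules above can be sketched in Python as follows. This is not the join_data.R change itself; it assumes each variant comes with a list of per-submission clinical significance terms, and it reads "reported as Pathogenic/Likely pathogenic" as "at least one such submission". The names encode_flags, PATHOGENIC_TERMS and BENIGN_TERMS are hypothetical:

```python
# Hypothetical sketch of the encoding rules; not the actual join_data.R code.
# Assumption: each variant has a list of per-submission significance terms.
PATHOGENIC_TERMS = {"pathogenic", "likely pathogenic"}
BENIGN_TERMS = {"benign", "likely benign"}
UNCERTAIN = "uncertain significance"


def encode_flags(submissions):
    """Return (pathogenic, benign) as 0/1 flags for one variant."""
    terms = {s.strip().lower() for s in submissions}
    has_uncertain = UNCERTAIN in terms
    # Exact term matching, so "Conflicting interpretations of pathogenicity"
    # can never count as "pathogenic".
    pathogenic = int(bool(terms & PATHOGENIC_TERMS) and not has_uncertain)
    benign = int(bool(terms & BENIGN_TERMS) and not has_uncertain)
    return pathogenic, benign


if __name__ == "__main__":
    print(encode_flags(["Pathogenic", "Likely pathogenic"]))       # (1, 0)
    print(encode_flags(["Pathogenic", "Uncertain significance"]))  # (0, 0)
    print(encode_flags(["Benign", "Likely benign"]))               # (0, 1)
```

Under this literal reading, a variant with both Pathogenic and Benign submissions and no "uncertain significance" submission would get both flags set, so a separate conflicted flag is still useful.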

ManavalanG commented 7 years ago

@XiaoleiZ Thanks for the solution. So far, I've been using the pre-formatted data available for download from this repo (and your fork) and patching it locally, rather than running the complete pipeline myself.

Are there any other major bugs or recent problems that are not documented in Issues but would be good to be aware of?

ManavalanG commented 7 years ago

@XiaoleiZ - It looks like you meant to update the README in your fork, but updated the master repo here instead.

macarthur-lab / clinvar

Improper labelling of conflicting variants as pathogenic #40