varfish-org / varfish-server

VarFish: comprehensive DNA variant analysis for diagnostics and research
MIT License

Inconsistent ensembl info display #156

Closed by eudesbarbosa 2 years ago

eudesbarbosa commented 3 years ago

Issue

Internally, VarFish displays gene/transcript information for GRCh37, but its Ensembl link-out shows results for GRCh38. Mixing the two genome builds leads to confusion.

Example

Gene PPP2R2C (chr4:6,349,605). The link provided by VarFish refers to GRCh38:
http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000074211;r=4:6320578-6563600

...but the information displayed internally is for GRCh37:
http://grch37.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000074211;r=4:6322305-6565327

Expected behaviour

Information displayed internally and information queried from external sources should be consistent, i.e., refer to the same genome version.

Additional info

The issue also impacts VariantValidator:
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg19/chr4-6349605-C-T/all?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/chr4-6347878-C-T/all?content-type=application%2Fjson
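
For reference, the two VariantValidator URLs differ only in the build label (hg19 vs. hg38) and the position. A minimal sketch of building such a URL from a genome build is shown here; the function name and the build-to-label mapping are hypothetical and not taken from the VarFish code base.

```python
from urllib.parse import urlencode

# Base path as it appears in the URLs quoted in this issue.
VV_BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"

# Hypothetical mapping from genome build names to the labels used in those URLs.
BUILD_LABELS = {"GRCh37": "hg19", "GRCh38": "hg38"}


def variantvalidator_url(build, chrom, pos, ref, alt):
    """Build a VariantValidator query URL for one variant on the given build."""
    label = BUILD_LABELS[build]  # raises KeyError for unknown builds
    variant = f"chr{chrom}-{pos}-{ref}-{alt}"
    query = urlencode({"content-type": "application/json"})
    return f"{VV_BASE}/{label}/{variant}/all?{query}"
```

Note that the position must already be given in the coordinates of the chosen build (6349605 on GRCh37 vs. 6347878 on GRCh38 in the example); the sketch does not perform any liftover.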

eudesbarbosa commented 3 years ago

Genomic variant GRCh37:4:6349605:C:T.

Transcript NM_020416.4 is missing in VarFish. Variant Validator returns nonsensical output.

stolpeo commented 2 years ago

Root Cause Analysis

We do two link-outs to Ensembl, one for the variant and one for the gene. As it was not specified which link-out was meant, I checked both:

Linkout for the variant

This links correctly to the GRCh37 genome.

Linkout for gene in fold-out

The Ensembl ID in the gene card wrongly links to GRCh38.

Resolution Proposal

The link-out to the Ensembl gene page needs to be changed.

Affected Components

Affected Modules/Files

Required Architectural Changes

None.

Resolution Sketch

Change the link-out host from www.ensembl.org to grch37.ensembl.org.
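
The host switch can be illustrated with a small sketch. This is not the actual VarFish change (the link-out lives in the gene card), just an illustration of the rewrite using Python's standard library; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_grch37_host(url):
    """Point an Ensembl URL at the GRCh37 archive site instead of the main site."""
    parts = urlsplit(url)
    if parts.netloc == "www.ensembl.org":
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(netloc="grch37.ensembl.org")
    return urlunsplit(parts)
```

Only the host is changed here. If a link also carries a region parameter (`r=...`), its coordinates would still belong to the original build, as the two example links in the report show.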

stolpeo commented 2 years ago

@eudesbarbosa I fixed the broken link-out (at least as I understand it). I couldn't follow the issue with VariantValidator; this sounds like a separate issue. If so, could you please open a new bug report for it?

eudesbarbosa commented 2 years ago

@stolpeo, I scanned the merge request. Just to clarify: does the link depend on the genome build info? I assume we already have samples processed with GRCh38, and that could lead to problems for those.

Regarding the VariantValidator, I will write a different issue.

stolpeo commented 2 years ago

@eudesbarbosa Yes, the link is based on the genome build. Right now, the GRCh38 development branch is not yet merged into the main branch. I would prefer to add the link switch in the GRCh38 branch rather than integrate GRCh38-related changes before the actual feature is introduced.

eudesbarbosa commented 2 years ago

Thank you for the clarification.

Regarding VariantValidator, I could not replicate the error. It seems the transcripts displayed in the query are the same as the ones displayed in the 'Transcript information' table. Also, the warning displayed can be ignored. According to their website: "RefSeqGene record not available: Only some genes have an NCBI RefSeqGene record. This warning simply indicates that no RefSeqGene record exists for the gene in which the variant that is being validated resides."

stolpeo commented 2 years ago

I changed my mind and included the switch anyway after reviewing @holtgrewe's MR #220 and not finding a change in the corresponding lines. So this commit should make the Ensembl link-out future-proof.
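
A build-dependent switch of this kind could look roughly like the following sketch. It is an assumption about the general shape, not the code from the commit: the mapping, the function name, and the `genomebuild` parameter are hypothetical, and only the two hosts and the gene summary path are taken from the links in this thread.

```python
# Hosts as they appear in this thread: the main site serves GRCh38,
# the archive site serves GRCh37.
ENSEMBL_HOSTS = {
    "GRCh37": "grch37.ensembl.org",
    "GRCh38": "www.ensembl.org",
}


def ensembl_gene_url(gene_id, genomebuild="GRCh37"):
    """Return the Ensembl gene summary URL matching the case's genome build."""
    try:
        host = ENSEMBL_HOSTS[genomebuild]
    except KeyError:
        raise ValueError(f"unsupported genome build: {genomebuild!r}") from None
    return f"https://{host}/Homo_sapiens/Gene/Summary?g={gene_id}"
```

Failing loudly on an unknown build, rather than silently falling back to one site, avoids reintroducing the mismatch described in this issue.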