Closed: ifokkema closed this issue 2 weeks ago
Checking this in the latest development version (not yet live in production), I see:
import json
import VariantValidator
vval = VariantValidator.Validator()
variant = 'NC_000021.8:g.46924426_46924427del'
genome_build = 'GRCh37'
select_transcripts = 'NM_030582.3'
validate = vval.validate(variant, genome_build, select_transcripts)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
{
"NM_030582.3:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.3:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.3 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
and
import json
import VariantValidator
vval = VariantValidator.Validator()
variant = 'NC_000021.8:g.46924426_46924427del'
genome_build = 'GRCh37'
select_transcripts = 'NM_030582.4'
validate = vval.validate(variant, genome_build, select_transcripts)
validation = validate.format_as_dict(with_meta=True)
print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))
{
"NM_030582.4:c.3364_3365del": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121747_121748del",
"vcf": {
"alt": "C",
"chr": "HG2521_PATCH",
"pos": "121746",
"ref": "CGG"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121747_121748del",
"vcf": {
"alt": "C",
"chr": "NW_025791815.1",
"pos": "121746",
"ref": "CGG"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104330_104331del",
"hgvs_transcript_variant": "NM_030582.4:c.3364_3365del",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924436del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGGCCCCCCAGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504522del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGGCCCCCCAGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924436del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGGCCCCCCAGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504522del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGGCCCCCCAGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
Which supports what @ifokkema is saying. Looking at this now.
Mutalyzer correctly maps this genomic variant to NM_030582.4:c.3363_3364insCCCCCCA
This is interesting. When did Mutalyzer start handling gaps? They were very against it
OK, the first issue is that the alignment for .4 is different to .3
When uncorrected
NC_000021.8:g.46924426_46924427= > NM_030582.3:c.3363+1_3363+2GG= and NM_030582.4:c.3364_3365GG=
@John-F-Wagstaff, I understand we get all our alignments direct from RefSeq so this will be correct?
@ifokkema NM_030582.4:c.3363_3364insCCCCCCA will not be the correct output if this is the case. But I will find out where the gap should be and figure out why VV is mapping incorrectly back to GRCh37 for this variant
It looks like the .4 alignment has been moved to around here
NC_000021.8:g.46924440_46924441= > NM_030582.4:c.3377+1_3377+2CC=
So for .3 we have
NC_000021.8:g.46924426_46924434= > NM_030582.3:c.3363+1_3364-1GGCCCCCCA=
CTGCCCGGCCCCCCCGGCCCCCCAGGCCCCCCAGGCCCA (genomic sequence)
CTGCCCGGCCCCCCC         GGCCCCCCAGGCCCA (the NM sequence)
For .4, NC_000021.8:g.46924440_46924448= > NM_030582.4:c.3377+1_3378-1CCCAGGCCC=
Giving
CTGCCCGGCCCCCCCGGCCCCCCAGGCCCCCCAGGCCCA (genomic sequence)
CTGCCCGGCCCCCCCGGCCCCCCAGGCCC         A (the NM sequence)
So the outcome of NC_000021.8:g.46924426_46924427del is correctly stated as NM_030582.4:c.3364_3365del but should map back to NC_000021.8:g.46924426_46924427del
We also need to check how g.46924444_46924445delGG is handled
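A minimal sketch of the round-trip check described above: for each transcript record in a `format_as_dict` result, compare the submitted genomic variant with the genomic description reported for the chosen build. The helper names `strip_redundant_bases` and `round_trip_report` are hypothetical and not part of VariantValidator; only the dictionary keys are taken from the outputs in this thread.

```python
import re


def strip_redundant_bases(hgvs: str) -> str:
    """Drop explicit deleted bases so '...delGG' compares equal to '...del'."""
    return re.sub(r"del[ACGTN]+$", "del", hgvs)


def round_trip_report(validation: dict, build: str = "grch37") -> dict:
    """Return {transcript_key: True/False}, True when the genomic description
    reported for `build` matches the submitted variant."""
    report = {}
    for key, record in validation.items():
        # 'flag' and 'metadata' sit beside the per-transcript records; skip them.
        if not isinstance(record, dict) or "submitted_variant" not in record:
            continue
        submitted = strip_redundant_bases(record["submitted_variant"])
        mapped = record["primary_assembly_loci"][build]["hgvs_genomic_description"]
        report[key] = strip_redundant_bases(mapped) == submitted
    return report


# Trimmed-down copy of the NM_030582.4 output shown earlier in this thread.
example = {
    "NM_030582.4:c.3364_3365del": {
        "submitted_variant": "NC_000021.8:g.46924426_46924427del",
        "primary_assembly_loci": {
            "grch37": {
                "hgvs_genomic_description": "NC_000021.8:g.46924426_46924436del"
            }
        },
    },
    "flag": "gene_variant",
    "metadata": {"vvta_version": "vvta_2024_06"},
}
print(round_trip_report(example))
```

A check like this would flag the .4 record above, where a 2 bp deletion comes back as an 11 bp deletion.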
Hello Pete, yes we get our alignments from RefSeq
The actual data used for mapping both versions (in the order transcript ac, genomic ac, strand, alignment type, genomic start, genomic stop, CIGAR) are:
NM_030582.3 | NC_000021.8 | 1 | splign | 46875423 | 46933634 | 667=12065N545=5108N87=1494N60=815N130=930N77=262N216=361N27=1548N63=105N87=310N54=179N159=1104N63=769N27=4039N132=454N63=908N27=1041N36=753N72=489N36=353N90=1220N30=122N27=449N63=275N75=969N27=290N54=442N75=1067N69=481N43=506N63=6349N44=361N96=9I36=577N145=79N74=406N129=1594N33=1762N246=462N198=849N116=961N1533=
NM_030582.3 | NC_000021.9 | 1 | splign | 45455509 | 45513720 | 667=12065N545=5108N87=1494N60=815N130=930N77=262N216=361N27=1548N63=105N87=310N54=179N159=1104N63=769N27=4039N132=454N63=908N27=1041N36=753N72=489N36=353N90=1220N30=122N27=449N63=275N75=969N27=290N54=442N75=1067N69=481N43=506N63=6349N44=361N96=9I36=577N145=79N74=406N129=1594N33=1762N246=462N198=849N116=961N1533=
for the older version, and :
NM_030582.4 | NC_000021.8 | 1 | splign | 46875435 | 46933634 | 655=12065N545=5108N87=1494N60=815N130=930N77=262N216=361N27=1548N63=105N87=310N54=179N159=1104N63=769N27=4039N132=454N63=908N27=1041N36=753N72=489N36=353N90=1220N30=122N27=449N63=275N75=969N27=290N54=442N75=1067N69=481N43=506N63=6349N44=361N110=9I22=577N145=79N74=406N129=1594N33=1762N246=462N198=849N116=961N1533=
NM_030582.4 | NC_000021.9 | 1 | splign | 45455521 | 45513720 | 655=12065N545=5108N87=1494N60=815N130=930N77=262N216=361N27=1548N63=105N87=310N54=179N159=1104N63=769N27=4039N132=454N63=908N27=1041N36=753N72=489N36=353N90=1220N30=122N27=449N63=275N75=969N27=290N54=442N75=1067N69=481N43=506N63=6349N44=361N110=9I22=577N145=79N74=406N129=1594N33=1762N246=462N198=849N116=961N1533=
for the newer.
The alignment is mostly the same, but 12 bp got trimmed off the start of the newer sequence and the gap moves a bit. The relevant section for NM_030582.3 is 96=9I36= and for NM_030582.4 is 110=9I22=
Doing some more digging, the 96=9I36= version of the alignment CIGAR is the old version of the alignment. We should have updated it to 110=9I22= for NM_030582.2, .3, and .4 when we did the database update for 2024_01. This is when RefSeq produced a new set of alignments for the older sequences, and we set it to overwrite when it encountered differing older alignment versions; somehow it did not for this one.
This means that we should always produce the NM_030582.4 version of the current output, even for NM_030582.3, when we get the newer version loaded to the database. So unless we have some kind of interaction between the two we need to make sure that the NM_030582.4 version is correct.
This is particularly true because RefSeq have just (2024_08_27) released a new version of the alignments and associated annotation, which means that it is time for another VVTA DB build. I will get the issue with alignments not being overwritten fixed before release.
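To make the difference between the two CIGAR fragments quoted above concrete, here is a small sketch that walks a CIGAR string and reports where each alignment gap sits. It assumes that `I` marks bases present in the genomic sequence but absent from the transcript (as the "9 fewer bases" warning suggests), that `D` is the reverse, and that `N` is an intron; `parse_cigar` and `alignment_gaps` are hypothetical helpers, not VVTA or VariantValidator functions.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([=XMNID])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, rejecting stray characters."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return ops


def alignment_gaps(cigar: str) -> list[tuple[str, int, int, int]]:
    """Return (op, length, transcript_offset, genomic_offset) for each I/D gap.

    Offsets are 0-based counts of bases consumed before the gap.
    Assumption: 'I' consumes genomic bases only, 'D' consumes transcript bases only.
    """
    tx_pos = 0
    g_pos = 0
    gaps = []
    for length, op in parse_cigar(cigar):
        if op in "=XM":        # aligned bases advance both sequences
            tx_pos += length
            g_pos += length
        elif op == "N":        # intron: genomic only, not reported as a gap
            g_pos += length
        elif op == "I":        # extra genomic bases (assumed meaning)
            gaps.append((op, length, tx_pos, g_pos))
            g_pos += length
        else:                  # "D": extra transcript bases (assumed meaning)
            gaps.append((op, length, tx_pos, g_pos))
            tx_pos += length
    return gaps


old = alignment_gaps("96=9I36=")
new = alignment_gaps("110=9I22=")
print(old, new, new[0][2] - old[0][2])
```

On these fragments the gap keeps its length of 9 and moves 14 transcript bases 3', which is consistent with the warnings reporting it between c.3363_3364 for the old alignment and c.3377_3378 for the new one.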
Mutalyzer correctly maps this genomic variant to NM_030582.4:c.3363_3364insCCCCCCA
This is interesting. When did Mutalyzer start handling gaps? They were very against it
Yeah, I'm not sure, actually. They recently (but when?) added quite a few improvements to both the web interface and the APIs, so that now they can actually map variants from the genome to the transcript. It still requires more work (an additional click on the web interface or two API calls in total), but at least it's possible. However, I'm not sure whether users sufficiently understand the difference between their NC(NM) and NM notations.
NC_000021.8:g.46924426_46924434= > NM_030582.3:c.3363+1_3364-1GGCCCCCCA=
NC_000021.8:g.46924440_46924441= > NM_030582.4:c.3377+1_3377+2CC=
Forgive my ignorance, but neither transcript has an intron in that area. Are these internal descriptions, or so?
CTGCCCGGCCCCCCCGGCCCCCCAGGCCCCCCAGGCCCA (genomic sequence)
CTGCCCGGCCCCCCC         GGCCCCCCAGGCCCA (the NM sequence)
CTGCCCGGCCCCCCCGGCCCCCCAGGCCCCCCAGGCCCA (genomic sequence)
CTGCCCGGCCCCCCCGGCCCCCCAGGCCC         A (the NM sequence)
These alignments are the same, though... I mean, surely, in the .4 transcript, it's shifted to 3', but it's the same sequence difference; it just shifted. So, when applying the variant to the genome and then mapping it to the transcripts, the variant and the alignment gap are overlappable in both cases. I'd argue this means the data is actually no different, and the variant on the transcript level should not differ between these versions.
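The point that both gap placements describe the same sequence difference can be checked directly on the two alignment rows quoted above. The sketch below lists every 0-based offset at which removing 9 genomic bases from the window reproduces the transcript sequence; `equivalent_gap_offsets` is a hypothetical helper written for this illustration only.

```python
GENOMIC = "CTGCCCGGCCCCCCCGGCCCCCCAGGCCCCCCAGGCCCA"

# The two NM rows from the alignments above, with the gap removed.
NM_V3_ROW = "CTGCCCGGCCCCCCC GGCCCCCCAGGCCCA".replace(" ", "")
NM_V4_ROW = "CTGCCCGGCCCCCCCGGCCCCCCAGGCCC A".replace(" ", "")


def equivalent_gap_offsets(genomic: str, transcript: str) -> list[int]:
    """Return every 0-based genomic offset where deleting the length
    difference turns `genomic` into `transcript`."""
    gap_len = len(genomic) - len(transcript)
    if gap_len <= 0:
        raise ValueError("genomic sequence must be longer than the transcript")
    return [
        i
        for i in range(len(genomic) - gap_len + 1)
        if genomic[:i] + genomic[i + gap_len:] == transcript
    ]


# Both rows spell out the same transcript sequence.
assert NM_V3_ROW == NM_V4_ROW

offsets = equivalent_gap_offsets(GENOMIC, NM_V3_ROW)
print(offsets)
```

The .3 placement (offset 15 in this window) and the .4 placement (offset 29) both appear in the result, along with every offset in between, so the aligner is free to slide the gap across that stretch without changing the transcript sequence.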
@ifokkema The gaps are not introns in this case, they are "fake introns" caused by alignment gaps, and the VV code compensates for them during the analysis process. :)
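As an illustration of how a genomic position inside an alignment gap ends up with an intron-style label, here is a rough sketch. It assumes the usual HGVS intron numbering rule (the first half counts up from the upstream base with `+`, the second half counts down to the downstream base with `-`, and the central base of an odd-length gap takes `+`); `gap_position_label` is a hypothetical helper and is not how the VV code is necessarily written.

```python
def gap_position_label(g_pos: int, gap_start: int, gap_len: int, c_before: int) -> str:
    """Label a genomic position inside an alignment gap as a 'fake intron' offset.

    g_pos     : genomic coordinate inside the gap
    gap_start : genomic coordinate of the first gap base
    gap_len   : number of genomic bases missing from the transcript
    c_before  : c. position of the transcript base immediately 5' of the gap
    """
    offset = g_pos - gap_start + 1  # 1-based position within the gap
    if not 1 <= offset <= gap_len:
        raise ValueError("position is outside the gap")
    half = (gap_len + 1) // 2  # assumption: central base of an odd gap uses '+'
    if offset <= half:
        return f"c.{c_before}+{offset}"
    return f"c.{c_before + 1}-{gap_len - offset + 1}"


# Old .3 alignment: gap at g.46924426_46924434, between c.3363 and c.3364.
print(gap_position_label(46924426, 46924426, 9, 3363))
print(gap_position_label(46924434, 46924426, 9, 3363))
# New .4 alignment: gap at g.46924440_46924448, between c.3377 and c.3378.
print(gap_position_label(46924441, 46924440, 9, 3377))
```

The end points reproduce the uncorrected mappings quoted earlier in this thread (c.3363+1, c.3364-1 and c.3377+2).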
These alignments are the same, though... I mean, surely, in the .4 transcript, it's shifted to 3', but it's the same sequence difference; it just shifted. So, when applying the variant to the genome and then mapping it to the transcripts, the variant and the alignment gap are overlappable in both cases. I'd argue this means the data is actually no different, and the variant on the transcript level should not differ between these versions.
I agree that the gaps are the same so I need to look at why the .4 version is not mapping the gap. I corrected the mishandled c_to_g, but this is a good point. Looking at it more now. It is not always obvious why this does not work
OK @ifokkema and @John-F-Wagstaff. I have tried to make the gap behave the way that Ivo suggested for the .4, but it does not seem to work with the altered alignment.
I am saving progress and reporting back because this is a very tricky problem and is giving huge headaches.
For the original variant in this post NC_000021.8:g.46924426_46924427del we get the ins in the .3 c. and back to the correct g. but the ins does not easily map onto the .4 with the different alignment. Yes, the gap does move, but it does not merge well to give the output @ifokkema wanted to see.
{
"NM_030582.3:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.3:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.3 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3364_3365del": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121747_121748del",
"vcf": {
"alt": "C",
"chr": "HG2521_PATCH",
"pos": "121746",
"ref": "CGG"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121747_121748del",
"vcf": {
"alt": "C",
"chr": "NW_025791815.1",
"pos": "121746",
"ref": "CGG"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104330_104331del",
"hgvs_transcript_variant": "NM_030582.4:c.3364_3365del",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
If we consider NC_000021.8:g.46924444_46924445delGG, which is the GG that maps directly onto the repositioned gap, we get an insertion in the .3 (c.3377_3378insCCCACCC rather than c.3363_3364insCCCCCCA as before) and a correctly positioned insertion in the .4 (NM_030582.4:c.3377_3378insCCCACCC), but we currently struggle to get the .3 back to the correct genomic description (I think I may know how to fix that, though).
{
"NM_030582.3:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.3:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "21",
"pos": "46924445",
"ref": "G"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504534_45504535insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "21",
"pos": "45504531",
"ref": "G"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "chr21",
"pos": "46924445",
"ref": "G"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504534_45504535insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "chr21",
"pos": "45504531",
"ref": "G"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121757",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121757",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.4:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
If we look at the middle GG, NC_000021.8:g.46924435_46924436delGG, we get the ins at NM_030582.3:c.3377_3378insCCCACCC and NM_030582.4:c.3377_3378insCCCACCC, which is the same position as for the NC_000021.8:g.46924444_46924445delGG input. Indeed, the .4 maps the genomic variant back to NC_000021.8:g.46924444_46924445del for this input. As before, the remapping of the .3 is still broken. I will focus on fixing this first. The take-home message, though, is that this is a very tricky sequence and it is not totally clear exactly what the outputs should be.
{
"NM_030582.3:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.3:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "21",
"pos": "46924445",
"ref": "G"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504534_45504535insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "21",
"pos": "45504531",
"ref": "G"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "chr21",
"pos": "46924445",
"ref": "G"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504534_45504535insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "chr21",
"pos": "45504531",
"ref": "G"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121757",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121757",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.4:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
OK, a little more progress.
For NC_000021.8:g.46924426_46924427del we can now achieve @ifokkema's request and still get back to the correct genomic description. Phew.
{
"NM_030582.3:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.3:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121746_121747insCCCCCCA",
"vcf": {
"alt": "CCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121746",
"ref": "C"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121746_121747insCCCCCCA",
"vcf": {
"alt": "CCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121746",
"ref": "C"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.4:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
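For anyone checking these outputs by hand: the `vcf` entries and the `hgvs_genomic_description` values above describe the same deletion, just anchored differently (VCF keeps the base before the deleted bases). Below is a minimal sketch of that conversion, assuming a simple anchored deletion. `vcf_deletion_to_hgvs` is a hypothetical helper, not part of VariantValidator, and it does no 3' shifting or normalisation.

```python
def vcf_deletion_to_hgvs(accession: str, pos, ref: str, alt: str) -> str:
    """Convert an anchored VCF deletion (e.g. CGG>C) into an HGVS g. deletion.

    Hypothetical helper for illustration only; it assumes alt is the single
    anchor base and ref is that anchor followed by the deleted bases.
    """
    pos = int(pos)  # the JSON output stores positions as strings
    if len(alt) != 1 or len(ref) < 2 or not ref.startswith(alt):
        raise ValueError("expected an anchored deletion such as CGG>C")
    start = pos + 1             # first deleted base follows the anchor
    end = pos + len(ref) - 1    # last deleted base
    if start == end:
        return f"{accession}:g.{start}del"
    return f"{accession}:g.{start}_{end}del"


# Values copied from the grch37 and grch38 entries in the output above.
print(vcf_deletion_to_hgvs("NC_000021.8", "46924425", "CGG", "C"))
print(vcf_deletion_to_hgvs("NC_000021.9", "45504511", "CGG", "C"))
```

The two calls reproduce the GRCh37 and GRCh38 descriptions reported for this variant.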
For NC_000021.8:g.46924435_46924436delGG we get NM_030582.3:c.3366_3372dup and NM_030582.4:c.3366_3372dup, and both map back to the correct position on the genome. To me a dup makes sense in this case rather than an ins, but I could use a sanity check @ifokkema @John-F-Wagstaff @leicray
{
"NM_030582.3:c.3366_3372dup": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1125Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1125ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104332_104338dup",
"hgvs_transcript_variant": "NM_030582.3:c.3366_3372dup",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924434",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504520",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924434",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504520",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924435_46924436delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3366_3372dup": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121749_121755dup",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121748",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121749_121755dup",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121748",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1125Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1125ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104332_104338dup",
"hgvs_transcript_variant": "NM_030582.4:c.3366_3372dup",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924434",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504520",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924434",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504520",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924435_46924436delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
For NC_000021.8:g.46924444_46924445delGG we get NM_030582.4:c.3377_3378insCCCACCC and NM_030582.3:c.3377_3378insCCCACCC, and again map back to the correct g. positions for the .4 version but sadly not the .3. So not quite there, but now very close I feel.
{
"NM_030582.3:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.3:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "21",
"pos": "46924445",
"ref": "G"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504534_45504535insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "21",
"pos": "45504531",
"ref": "G"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "chr21",
"pos": "46924445",
"ref": "G"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504534_45504535insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "chr21",
"pos": "45504531",
"ref": "G"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121757",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121757",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.4:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
Note: this broke one pre-existing test, so I need to fix that as well and check it for accuracy.
OK, all now fully working
{
"NM_030582.3:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.3:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121757",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121757",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.4:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445delGG",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Removing redundant reference bases from variant description",
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
2 previous tests now break and need to be investigated, but so far so very good :)
@Peter-J-Freeman
OK @ifokkema and @John-F-Wagstaff. I have tried to make the gap behave the way that Ivo suggested for the .4, but it does not seem to work with the altered alignment.
I am saving progress and reporting back, because this is a very tricky problem and is giving me huge headaches.
I'm sorry about the headaches :(
For the original variant in this post NC_000021.8:g.46924426_46924427del we get the ins in the .3 c. and back to the correct g. but the ins does not easily map onto the .4 with the different alignment. Yes, the gap does move, but it does not merge well to give the output @ifokkema wanted to see.
{ "NM_030582.3:c.3363_3364insCCCCCCA": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del" } } }, "NM_030582.4:c.3364_3365del": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del" } } } }
The first one looks good; the second variant does not. That should be NC_000021.8:g.46924426_46924436del.
If we consider NC_000021.8:g.46924444_46924445delGG which is the GG that maps directly to the repositioned gap we get an insertion in both the .3 (c.3377_3378insCCCACCC rather than 3363_3364insCCCCCCA as before) and a correctly positioned insertion into the .4 (NM_030582.4:c.3377_3378insCCCACCC) but struggle currently to get the .3 back to the correct genomic (I think I may know how to fix that though)
c.3377_3378insCCCACCC makes sense for NC_000021.8:g.46924444_46924445delGG.
{ "NM_030582.3:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC" } } }, "NM_030582.4:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del" } } } }
So the first one is incorrect, and the second one is correct.
If we look at the middle GG NC_000021.8:g.46924435_46924436delGG we get the ins at NM_030582.3:c.3377_3378insCCCACCC and NM_030582.4:c.3377_3378insCCCACCC which is the same position as for the NC_000021.8:g.46924444_46924445delGG input.
No; NC_000021.8:g.46924435_46924436delGG leads to c.3366_3372dup.
NC_000021.8:g.46924435_46924436delGG
CTGCCCGGCCCCCCC GG CCCCCCAGG CCCCCCAGGCCCA (genomic sequence)
CTGCCCGGCCCCCCC GG CCCCCCA-- CCCCCCAGGCCCA (genomic variant)
CTGCCCGGCCCCCCC GG CCCCCCAGGCCCA (the NM sequence)
CTGCCCGGCCCCCCC GG CCCCCCA CCCCCCAGGCCCA (the NM variant, raw)
CTGCCCGGCCCCCCC GG CCCCCCAGGCCCA (the NM variant, normalized)
< dup >
=c.3366_3372dup
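The alignment above can be replayed with plain strings. This is a rough sketch, not VariantValidator code: the sequence fragments are copied from the diagram, `describe_insertion` is a made-up helper, and the offset it returns is a position within the fragment rather than a c. coordinate.

```python
def describe_insertion(ref: str, alt: str):
    """Return (offset, inserted, is_dup) for a single insertion in alt vs ref.

    Taking the longest common prefix first places the insertion as far 3'
    as possible, which is the direction the dup is reported in above.
    """
    extra = len(alt) - len(ref)
    if extra <= 0:
        raise ValueError("alt must be longer than ref")
    offset = 0
    while offset < len(ref) and ref[offset] == alt[offset]:
        offset += 1
    if ref[offset:] != alt[offset + extra:]:
        raise ValueError("sequences differ by more than one insertion")
    inserted = alt[offset:offset + extra]
    # A duplication repeats the reference bases immediately 5' of the insert.
    is_dup = offset >= extra and ref[offset - extra:offset] == inserted
    return offset, inserted, is_dup


FLANK = "CTGCCCGGCCCCCCC" + "GG"
genomic = FLANK + "CCCCCCAGG" + "CCCCCCAGGCCCA"         # genomic sequence
transcript = FLANK + "CCCCCCAGGCCCA"                    # the NM sequence (gap)
genomic_variant = FLANK + "CCCCCCA" + "CCCCCCAGGCCCA"   # genomic with GG deleted

offset, inserted, is_dup = describe_insertion(transcript, genomic_variant)
print(len(genomic) - len(transcript), inserted, is_dup)
```

Seen from the transcript, the GG deletion leaves seven extra bases that repeat the seven bases just before them, which is why a dup rather than an ins is expected for this input.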
Indeed the .4 maps the genomic variant back to NC_000021.8:g.46924444_46924445del for this input. As before the remapping of the .3 is still broken. I will focus on fixing this first. The take home message though is that this is very tricky sequence and it is not totally clear exactly what the outputs should be.
It's definitely a tricky sequence, but I believe the output isn't ambiguous.
{ "NM_030582.3:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC" } } }, "NM_030582.4:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del" } } } }
The first one is incorrect, the second one is correct.
OK, a little more progress.
For NC_000021.8:g.46924426_46924427del we can now achieve @ifokkema's request and still get back to the correct genomic description. Phew.
{ "NM_030582.3:c.3363_3364insCCCCCCA": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del" } } }, "NM_030582.4:c.3363_3364insCCCCCCA": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del" } } } }
Cool :smile:
For NC_000021.8:g.46924435_46924436delGG we get NM_030582.3:c.3366_3372dup and NM_030582.4:c.3366_3372dup, and both map back to the correct position on the genome. To me a dup makes sense in this case rather than an ins, but I could use a sanity check @ifokkema @John-F-Wagstaff @leicray
Yes, this makes sense.
{ "NM_030582.3:c.3366_3372dup": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del" } } }, "NM_030582.4:c.3366_3372dup": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del" } } } }
Both correct :smile:
For NC_000021.8:g.46924444_46924445delGG we get NM_030582.4:c.3377_3378insCCCACCC and NM_030582.3:c.3377_3378insCCCACCC, and again map back to the correct g. positions for the .4 version but sadly not the .3. So not quite there, but now very close I feel.
{ "NM_030582.3:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC" } } }, "NM_030582.4:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del" } } } }
The first one is incorrect; the second one is correct.
OK, all now fully working
{ "NM_030582.3:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del" } } }, "NM_030582.4:c.3377_3378insCCCACCC": { "primary_assembly_loci": { "grch37": { "hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del" } } } }
Yes, these two are now also correct.
So, a final list:
NC_000021.8:g.46924426_46924427del == c.3363_3364insCCCCCCA
NC_000021.8:g.46924426_46924436del == c.3364_3365del
NC_000021.8:g.46924435_46924436del == c.3366_3372dup
NC_000021.8:g.46924444_46924445del == c.3377_3378insCCCACCC
I'm not sure if they're all functional, as I have seen some incorrect outputs and I'm not sure if they're all fixed now.
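To make the correct/incorrect calls above repeatable, something like the following could compare each transcript's GRCh37 genomic description with the submitted variant. It is only a sketch: the helper names are hypothetical, it works on the dictionary printed by `format_as_dict` rather than calling VariantValidator, and it assumes a correct round trip returns the submitted description minus redundant bases such as the trailing `GG`. It checks the genomic round trip only, not whether the c. description itself is the right one.

```python
import re


def genomic_by_transcript(validation: dict, build: str = "grch37") -> dict:
    """Map each transcript key to its genomic description for one build."""
    result = {}
    for key, record in validation.items():
        # Top-level entries such as "flag" and "metadata" are not variants.
        if not isinstance(record, dict) or "primary_assembly_loci" not in record:
            continue
        loci = record["primary_assembly_loci"]
        result[key] = loci[build]["hgvs_genomic_description"]
    return result


def strip_redundant_bases(description: str) -> str:
    """Turn e.g. '...delGG' into '...del', as the validator warning describes."""
    return re.sub(r"(del|dup)[ACGT]+$", r"\1", description)


def failed_round_trips(validation: dict, submitted: str, build: str = "grch37") -> dict:
    """Return the transcripts whose genomic description differs from the input."""
    expected = strip_redundant_bases(submitted)
    mapped = genomic_by_transcript(validation, build)
    return {key: g for key, g in mapped.items() if g != expected}


# Condensed output copied from the intermediate (broken) run in this thread.
broken = {
    "NM_030582.3:c.3377_3378insCCCACCC": {"primary_assembly_loci": {"grch37": {
        "hgvs_genomic_description": "NC_000021.8:g.46924448_46924449insCCCACCC"}}},
    "NM_030582.4:c.3377_3378insCCCACCC": {"primary_assembly_loci": {"grch37": {
        "hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del"}}},
    "flag": "gene_variant",
}
print(failed_round_trips(broken, "NC_000021.8:g.46924444_46924445delGG"))
```

Run against the intermediate output, this flags only the .3 transcript, matching the manual review; an empty dictionary would mean every transcript maps back to the submitted description.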
@ifokkema, thanks for the final list of variants. I will run these and post the results.
The server will not be correct yet because this is still in dev, but as soon as all is corrected and I fix the outstanding broken tests we will get it deployed.
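For anyone wanting to repeat this run once the fix is deployed, a rough sketch of checking the final list follows. It assumes the same `validate(...)` and `format_as_dict(with_meta=True)` calls used in the snippet at the top of this issue; the helper `check_final_list` and the `EXPECTED` table are illustrative names, and each transcript version is queried separately so that only the call signature already shown is relied on.

```python
# Final list from the comment above: genomic input -> expected c. description.
EXPECTED = {
    "NC_000021.8:g.46924426_46924427del": "c.3363_3364insCCCCCCA",
    "NC_000021.8:g.46924426_46924436del": "c.3364_3365del",
    "NC_000021.8:g.46924435_46924436del": "c.3366_3372dup",
    "NC_000021.8:g.46924444_46924445del": "c.3377_3378insCCCACCC",
}


def check_final_list(validator, expected=EXPECTED, build="GRCh37",
                     transcripts=("NM_030582.3", "NM_030582.4")):
    """Return a list of (genomic_input, wanted_key, keys_returned) for every failed case."""
    failures = []
    for g_variant, c_expected in expected.items():
        for tx in transcripts:
            out = validator.validate(g_variant, build, tx).format_as_dict(with_meta=True)
            wanted = f"{tx}:{c_expected}"
            if wanted not in out:
                # Report what came back instead, ignoring the bookkeeping keys.
                got = [k for k in out if k not in ("flag", "metadata")]
                failures.append((g_variant, wanted, got))
    return failures


# Usage (requires a working VariantValidator installation):
# import VariantValidator
# for failure in check_final_list(VariantValidator.Validator()):
#     print(failure)
```

An empty list would mean every input in the final list produced the expected transcript description for both versions.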
{
"NM_030582.3:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.3:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121746_121747insCCCCCCA",
"vcf": {
"alt": "CCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121746",
"ref": "C"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121746_121747insCCCCCCA",
"vcf": {
"alt": "CCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121746",
"ref": "C"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.4:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924427del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
Another similar test seems to be broken here as well, so this is the next thing to investigate. I suspect a gap-merge when mapping g_to_t is being missed by the code
{
"NM_030582.3:c.3363_3364insCCCCCCA": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*144)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer144)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104329_104330insCCCCCCA",
"hgvs_transcript_variant": "NM_030582.3:c.3363_3364insCCCCCCA",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924427del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504513del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924436del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.3 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3364_3365del": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121747_121748del",
"vcf": {
"alt": "C",
"chr": "HG2521_PATCH",
"pos": "121746",
"ref": "CGG"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121747_121748del",
"vcf": {
"alt": "C",
"chr": "NW_025791815.1",
"pos": "121746",
"ref": "CGG"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1122Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1122ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104330_104331del",
"hgvs_transcript_variant": "NM_030582.4:c.3364_3365del",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924436del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "46924425",
"ref": "CGGCCCCCCAGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504522del",
"vcf": {
"alt": "C",
"chr": "21",
"pos": "45504511",
"ref": "CGGCCCCCCAGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924426_46924436del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "46924425",
"ref": "CGGCCCCCCAGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504512_45504522del",
"vcf": {
"alt": "C",
"chr": "chr21",
"pos": "45504511",
"ref": "CGGCCCCCCAGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924426_46924436del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
{
"NM_030582.3:c.3366_3372dup": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1125Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1125ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104332_104338dup",
"hgvs_transcript_variant": "NM_030582.3:c.3366_3372dup",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924434",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504520",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924434",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504520",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924435_46924436del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3366_3372dup": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121749_121755dup",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121748",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121749_121755dup",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121748",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(G1125Pfs*141)",
"tlr": "NP_085059.2:p.(Gly1125ProfsTer141)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104332_104338dup",
"hgvs_transcript_variant": "NM_030582.4:c.3366_3372dup",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924434",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504520",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924435_46924436del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924434",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504521_45504522del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504520",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924435_46924436del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
{
"NM_030582.3:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.3:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.3"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.3 contains 9 fewer bases between c.3363_3364 than NC_000021.8",
"TranscriptVersionWarning: A more recent version of the selected reference sequence NM_030582.3 is available for genome build GRCh37 (NM_030582.4)"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"NM_030582.4:c.3377_3378insCCCACCC": {
"alt_genomic_loci": [
{
"grch38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "HG2521_PATCH",
"pos": "121757",
"ref": "G"
}
}
},
{
"hg38": {
"hgvs_genomic_description": "NW_025791815.1:g.121760_121761insCCCACCC",
"vcf": {
"alt": "GCCCCCCA",
"chr": "NW_025791815.1",
"pos": "121757",
"ref": "G"
}
}
}
],
"annotations": {
"chromosome": "21",
"db_xref": {
"CCDS": "CCDS42972.1",
"ensemblgene": null,
"hgnc": "HGNC:2195",
"ncbigene": "80781",
"select": false
},
"ensembl_select": false,
"mane_plus_clinical": false,
"mane_select": false,
"map": "21q22.3",
"note": "collagen type XVIII alpha 1 chain",
"refseq_select": false,
"variant": "1"
},
"gene_ids": {
"ccds_ids": [
"CCDS42972",
"CCDS77643",
"CCDS42971"
],
"ensembl_gene_id": "ENSG00000182871",
"entrez_gene_id": "80781",
"hgnc_id": "HGNC:2195",
"omim_id": [
"120328"
],
"ucsc_id": "uc062awh.1"
},
"gene_symbol": "COL18A1",
"genome_context_intronic_sequence": "",
"hgvs_lrg_transcript_variant": "",
"hgvs_lrg_variant": "",
"hgvs_predicted_protein_consequence": {
"lrg_slr": "",
"lrg_tlr": "",
"slr": "NP_085059.2:p.(R1127Pfs*139)",
"tlr": "NP_085059.2:p.(Arg1127ProfsTer139)"
},
"hgvs_refseqgene_variant": "NG_011903.1:g.104343_104344insCCCACCC",
"hgvs_transcript_variant": "NM_030582.4:c.3377_3378insCCCACCC",
"primary_assembly_loci": {
"grch37": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "46924443",
"ref": "AGG"
}
},
"grch38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "21",
"pos": "45504529",
"ref": "AGG"
}
},
"hg19": {
"hgvs_genomic_description": "NC_000021.8:g.46924444_46924445del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "46924443",
"ref": "AGG"
}
},
"hg38": {
"hgvs_genomic_description": "NC_000021.9:g.45504530_45504531del",
"vcf": {
"alt": "A",
"chr": "chr21",
"pos": "45504529",
"ref": "AGG"
}
}
},
"reference_sequence_records": {
"protein": "https://www.ncbi.nlm.nih.gov/nuccore/NP_085059.2",
"refseqgene": "https://www.ncbi.nlm.nih.gov/nuccore/NG_011903.1",
"transcript": "https://www.ncbi.nlm.nih.gov/nuccore/NM_030582.4"
},
"refseqgene_context_intronic_sequence": "",
"rna_variant_descriptions": null,
"selected_assembly": "GRCh37",
"submitted_variant": "NC_000021.8:g.46924444_46924445del",
"transcript_description": "Homo sapiens collagen type XVIII alpha 1 chain (COL18A1), transcript variant 1, mRNA",
"validation_warnings": [
"Submitted description does not represent a true variant because it is an artefact of aligning NM_030582.4 with NC_000021.8 (genome build GRCh37)",
"NM_030582.4 contains 9 fewer bases between c.3377_3378 than NC_000021.8"
],
"variant_exonic_positions": {
"NC_000021.8": {
"end_exon": "33",
"start_exon": "33"
},
"NC_000021.9": {
"end_exon": "33",
"start_exon": "33"
},
"NG_011903.1": {
"end_exon": "33",
"start_exon": "33"
}
}
},
"flag": "gene_variant",
"metadata": {
"variantvalidator_hgvs_version": "2.2.0",
"variantvalidator_version": "2.2.1.dev677+g36d03e7.d20240725",
"vvdb_version": "vvdb_2024_8",
"vvseqrepo_db": "VV_SR_2024_06/master",
"vvta_version": "vvta_2024_06"
}
}
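As a side check on the outputs above, the `vcf` blocks and the `hgvs_genomic_description` deletions can be reconciled with simple arithmetic: for an anchored deletion, the deleted range starts one base after `pos` and ends at `pos + len(ref) - 1`. The sketch below is an illustration only; `vcf_deletion_to_hgvs` is a hypothetical helper, and it does no normalisation (no 3' shifting), so it only reproduces the HGVS description when the VCF and HGVS positions already coincide, as they do in these outputs.

```python
def vcf_deletion_to_hgvs(accession, pos, ref, alt):
    """Describe an anchored VCF deletion (alt is the first base of ref) as an HGVS g. deletion."""
    pos = int(pos)  # the JSON output stores pos as a string
    if not (len(alt) == 1 and len(ref) > 1 and ref[0] == alt):
        raise ValueError("not a simple anchored deletion")
    start = pos + 1              # first deleted base follows the anchor base
    end = pos + len(ref) - 1     # last deleted base
    span = str(start) if start == end else f"{start}_{end}"
    return f"{accession}:g.{span}del"


# Values copied from the grch37 blocks above.
print(vcf_deletion_to_hgvs("NC_000021.8", "46924425", "CGG", "C"))
# NC_000021.8:g.46924426_46924427del
print(vcf_deletion_to_hgvs("NC_000021.8", "46924425", "CGGCCCCCCAGG", "C"))
# NC_000021.8:g.46924426_46924436del
```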
I'm sorry about the headaches :(
No worries, needs to be done.
I'm not sure if they're all functional, as I have seen some incorrect outputs and I'm not sure if they're all fixed now.
You will not be able to run this code yet and see the changes. Needs to be completed and deployed :)
Thanks for the sanity checking too. I will fix the last remaining issue on this and ask you to accept the outputs. May be a few days. Got limited dev time at the moment. Will get on it ASAP
@ifokkema. All code changes are now made and all expected outputs are correct on my dev version. Just need to plan a merge of some other code that handles expanded repeats with @John-F-Wagstaff, then we can update the servers once merged into master
I will close this once the server is updated
Hello @Peter-J-Freeman just an update on why the alignment for NM_030582.3 did not get updated.
The CIGAR changed, but the alignment spans did not. Since the UTA rebuilds the CIGARs itself, dumping them all and rebuilding for each release rather than importing them, the logic flow we inherited from it does not update the alignments unless the mapped spans change. In fact, because of this behaviour, the database structure, which we still mostly share with the UTA for this, has no way to handle CIGARs changing without the spans also changing. Altering this without breaking the inbuilt handling for archived alignments will require database changes, possibly major ones, and significant alterations to the database loading scripts.
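To illustrate why a span-only comparison misses this kind of change, here is a small sketch. The two CIGAR strings are made up for the example (they are not the real NM_030582.3 or NM_030582.4 alignments), the operator conventions are the standard SAM ones rather than anything confirmed for the UTA or VVTA schema, and the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")    # operations that consume reference bases (SAM convention)
QUERY_OPS = set("MIS=X")  # operations that consume query bases (SAM convention)


def cigar_lengths(cigar):
    """Return (reference_length, query_length) covered by a CIGAR string."""
    ref_len = query_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in REF_OPS:
            ref_len += n
        if op in QUERY_OPS:
            query_len += n
    return ref_len, query_len


def changed_by_span(old, new):
    """Span-based test: only notices a change if the covered lengths differ."""
    return cigar_lengths(old) != cigar_lengths(new)


def changed_by_cigar(old, new):
    """Content-based test: notices any change to the CIGAR itself."""
    return old != new


# Hypothetical exon alignments: the same 9-base gap, placed at a different offset.
old_cigar = "50M9D41M"
new_cigar = "64M9D27M"

print(cigar_lengths(old_cigar), cigar_lengths(new_cigar))  # (100, 91) (100, 91)
print(changed_by_span(old_cigar, new_cigar))               # False
print(changed_by_cigar(old_cigar, new_cigar))              # True
```

Both strings cover the same number of reference and query bases, so a loader that only compares spans treats the alignment as unchanged even though the gap has moved.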
I am going to get a release version without this fix done ASAP for our next release. Given the nature of the targets affected we should be able to get away with a point release fix to just the VVTA for this, without changes to the VV db or the SeqRepo release we rely on. This fix may take a while to get right, and the current data is old but not broken, so we probably don't want to hold off on a major VariantValidator release for it.
@John-F-Wagstaff. Makes sense and explains why the alignment didn't update.
We will make the release ASAP
This one seems to be fixed now!
@ifokkema. Thanks for closing. Looking at the others now
Describe the bug
There is something fishy with NM_030582.4. Take, e.g., this deletion that maps to this transcript: NC_000021.8:g.46924426_46924427del.
The short version of the story: submitting NC_000021.8:g.46924426_46924427del will map to NM_030582.4:c.3364_3365del, which in turn maps back to NC_000021.8:g.46924426_46924436del, which wasn't the input.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The genomic output should also be the input, but I believe, in this case, it's the mapping to the transcript that went wrong. What I figured out:
NM_030582.4:c.3363_3364insCCCCCCA.
Additional context
NM_030582.3:c.3363_3364insCCCCCCA. As such, this seems like a regression, a recently introduced bug. However, note that the transcript version that I got back then is slightly different from the one I'm getting back now.
NM_030582.4:c.3363_3364insCCCCCCA.