lgmgeo / AnnotSV

Annotation and Ranking of Structural Variation
GNU General Public License v3.0

DGV are no longer in the result #62

Closed xiucz closed 1 year ago

xiucz commented 2 years ago

Hi, in AnnotSV version 3.3, the DGV annotations are no longer in the results. How can I get them back? image

Thank you. Xiucz

lgmgeo commented 2 years ago

Hi,

Since v3.0, benign SV annotations are merged from multiple sources (DGV, ClinVar, ClinGen, DDD, gnomAD, 1000g and IMH). The previous output columns (the ones you reported) are no longer available.

The current output columns are: B_gain_source, B_gain_coord, B_loss_source, B_loss_coord, B_ins_source, B_ins_coord, B_inv_source, B_inv_coord.

So you can identify DGV annotations (among the ClinVar, DDD... annotations) thanks to the "source" output columns, which report the DGV ID (e.g. dgv2229e212, esv3622062, nsv515177).
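As a rough illustration of the prefix-based identification described above, here is a minimal Python sketch that pulls DGV-style IDs out of one "source" cell. The function name `dgv_ids` and the sample cell are hypothetical, and the `;` separator between entries is an assumption that is not confirmed in this thread, so check it against your own output file.

```python
# ID prefixes as given in the README excerpt quoted in this thread.
DGV_PREFIXES = ("dgv", "nsv", "esv")


def dgv_ids(source_cell, sep=";"):
    """Return the DGV-style IDs found in one B_*_source cell.

    The ';' separator is an assumption; adjust `sep` to match your file.
    """
    items = [item.strip() for item in source_cell.split(sep)]
    return [item for item in items if item.lower().startswith(DGV_PREFIXES)]


# Hypothetical cell mixing DGV IDs with an entry from another source.
cell = "dgv2229e212;esv3622062;other_source_entry"
print(dgv_ids(cell))
```

The same function can be applied to each of the four `B_*_source` columns in turn.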

For more information, please see the "Known benign genes or genomic regions annotation" section of the README.

FYI, v3.1 is coming soon (this month or early November). You will have access to additional output columns: B_gain_AFmax, B_loss_AFmax, B_ins_AFmax, B_inv_AFmax.

Best regards, Véronique

xiucz commented 2 years ago

I am eager to test the new v3.1 version, because the DGV AF is important for filtering CNVs. 😊

lgmgeo commented 2 years ago

v3.1 is planned for around November 15th. Sorry for the delay.

lgmgeo commented 2 years ago

November 8, 2021: v3.1 is posted!

Does this meet your needs? Thank you for any feedback you can provide me on benign AF.

xiucz commented 2 years ago

Thank you, I will give feedback later.

JMdeSteAgathe commented 1 year ago

Hello, would it be possible to add the DGV Gold Standard frequencies to the annotation?

Indeed, the README mentions DGV Gold Standard:

DGV Gold Standard (The “B_*_source” output values begin with “dgv”, “nsv” or “esv”)

But I think the DGV Gold Standard values begin with "gssv". My personal suggestion would be to provide it separately, in a single dedicated DGV column where the frequency displayed corresponds to the same CNV type (i.e. gain / loss), set to 0 when empty.

It would be so helpful as it would be the safest and quickest way to filter frequent CNVs, which I find quite hard to do in the current config. Best, Jean-Madeleine

lgmgeo commented 1 year ago

Hi Jean-Madeleine,

Different sources of benign genes or genomic regions have been merged in AnnotSV to create the BENIGN dataset.

First, my apologies: it is not the "DGV Gold Standard" (as mentioned below in the README) but the "DGV Variants" dataset that is used in AnnotSV.

image

Indeed, in the README, the "DGV benign SV annotations" section lists the correct sources:

image

The idea in AnnotSV is to avoid increasing the number of output columns. That's why, for benign annotation, there is no separate output column dedicated to DGV.

Frequencies are available for filtering with the B_gain_AFmax, B_loss_AFmax... features.
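For readers who want to filter on these columns after the run, the following Python sketch keeps only the rows whose benign AFmax is at or below a chosen cutoff. It is a sketch under assumptions: the sample table and the `SV_id` column are made up, each AFmax cell is assumed to hold a single number or be empty, and empty cells are treated as 0 (mirroring the suggestion made earlier in this thread, not documented AnnotSV behaviour).

```python
import csv
import io


def benign_af(row, column):
    """Parse one AFmax cell; empty or non-numeric cells count as 0.0 (assumption)."""
    try:
        return float(row.get(column, "") or 0.0)
    except ValueError:
        return 0.0


def rare_rows(tsv_text, column, max_af):
    """Return the rows whose value in `column` is at most `max_af`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if benign_af(row, column) <= max_af]


# Hypothetical miniature table; a real AnnotSV output has many more columns.
sample = "SV_id\tB_loss_AFmax\nsv1\t0.25\nsv2\t\nsv3\t0.001\n"
for row in rare_rows(sample, "B_loss_AFmax", 0.01):
    print(row["SV_id"], row["B_loss_AFmax"])
```

Pick the column that matches the SV type being filtered (for example `B_gain_AFmax` for gains and `B_loss_AFmax` for losses).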

image

You can change the default Allele frequency threshold (1%) with the “-benignAF” option:

image

Best,

Véronique

PS: The README will be updated with the next version.

JMdeSteAgathe commented 1 year ago

Thank you Véronique for your quick and kind response. Here are the two difficulties I have with your solution:

I can imagine why adding new columns can be tricky! I am sorry if I seem insistent... I genuinely know quite a few AnnotSV users who would love to have a single, filterable DGV Gold Standard frequency. 😀 Best, Jean-Madeleine

lgmgeo commented 1 year ago

The best solution would be to select benign SVs more strictly. Why not run AnnotSV with the following option: -benignAF 0.005 (allele frequency threshold used to select the benign SVs in the data sources)?
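A minimal sketch of how such a run could be scripted from Python. Only `-benignAF` comes from this thread; the `-SVinputFile` and `-outputFile` option names are given as I understand the AnnotSV command line and should be checked against the README, and the file names and the helper `annotsv_command` are hypothetical.

```python
import shlex


def annotsv_command(sv_input, output, benign_af=0.005):
    """Build an AnnotSV argument list with a custom benign AF threshold.

    -SVinputFile / -outputFile are assumed option names; verify them
    against the README of your AnnotSV version.
    """
    return [
        "AnnotSV",
        "-SVinputFile", sv_input,
        "-outputFile", output,
        "-benignAF", str(benign_af),
    ]


# Hypothetical input and output paths.
cmd = annotsv_command("my_svs.vcf", "my_svs.annotated.tsv")
print(shlex.join(cmd))
# To actually run it (requires AnnotSV to be installed and on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```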