priyambial123 closed this issue 1 year ago
Hi,
Thank you for your interest in AnnotSV.
The gnomAD SV database is derived from 10,847 individual genomes. The study started with 14,891 individuals, and after the QC steps data were available for 10,847. The PDF manual, however, gives the number as 14,891.
This number is indeed specified in the README, but as a citation:
The gnomAD sources used for AnnotSV are detailed in the README:
In the DECIPHER database, the allele frequency has been calculated from individuals with developmental disorders. How is the benign allele frequency reported here?
Note the source of these data (Common copy-number variants):
Otherwise, see the "DDD benign SV annotations" section of the README:
Pathogenic structural variations from dbVar are reported, along with their coordinates, in the annotated output files. But these coordinates don't match the ones reported in dbVar (...)
Can you send me the coordinates of the SV annotated by AnnotSV with dbVar:nssv15140042? (GRCh38)
These are the coordinates of the SV on chromosome 2:
SV_start: 178436319
SV_end: 178443171
Can you explain why the coordinates of the pathogenic structural variations (from dbVar) in the annotated file don't match the ones reported in dbVar? Are the coordinates expanded based on some assumption?
Thank you
I'm working on it; I'll get back to you ASAP.
I ran your example (2:178436319-178443171 DUP) on the web server (GRCh38): https://lbgi.fr/AnnotSV/display?id=EaDX1tRg85
Let's have a look at the po_P_gain_source result:
dbVar:nssv15140042; dbVar:nssv15161685; nssv15161863; dbVar:nssv15162217; dbVar:nssv15174359; dbVar:nssv15174602; dbVar:nssv16207855; dbVar:nssv16254741; dbVar:nssv17969793
Let's have a look at the dbVar:nssv15140042 pathogenic SV annotation distributed in AnnotSV (BED format):
grep nssv15140042 $ANNOTSV/share/AnnotSV/Annotations_Human/FtIncludedInSV/PathogenicSV/GRCh38/pathogenic_Gain_SV_GRCh38.sorted.bed
2 155632918 182056571 dbVar:nssv15140042 2:155632918-182056571
Let's have a look at the po_P_gain_coord feature in AnnotSV (VCF format):
2:12772-241841232; 2:14239-242106609; 2:151553465-178461009; 2:155632919-182056571; 2:15673-242157305; 2:162376653-211062464; 2:168973465-214656712; 2:177533232-242065306
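The BED line above and the po_P_gain_coord entry for the same SV differ by one in the start position (155632918 vs 155632919). That is consistent with BED using 0-based, half-open intervals while the reported coordinates are 1-based. A minimal sketch of that conversion, using the values from this thread; the helper names `bed_to_one_based` and `contains` are hypothetical and not part of AnnotSV:

```python
def bed_to_one_based(chrom, bed_start, bed_end):
    """Convert a BED interval (0-based start, exclusive end) to a
    1-based inclusive 'chrom:start-end' string."""
    return f"{chrom}:{bed_start + 1}-{bed_end}"


def contains(outer_start, outer_end, inner_start, inner_end):
    """True if the inner interval lies fully inside the outer one
    (both 1-based, inclusive)."""
    return outer_start <= inner_start and inner_end <= outer_end


# BED record for dbVar:nssv15140042 as shown by the grep above
# (tab-separated here; only the first four fields are used).
bed_line = "2\t155632918\t182056571\tdbVar:nssv15140042"
chrom, bed_start, bed_end, source = bed_line.split("\t")
bed_start, bed_end = int(bed_start), int(bed_end)

coord = bed_to_one_based(chrom, bed_start, bed_end)
print(source, coord)  # dbVar:nssv15140042 2:155632919-182056571

# The SV submitted for annotation in this thread.
query_start, query_end = 178436319, 178443171
query_inside = contains(bed_start + 1, bed_end, query_start, query_end)
print(query_inside)  # True
```

The converted string matches the fourth entry of po_P_gain_coord, and the submitted SV lies entirely inside that much larger pathogenic SV. The coordinates shown are those of the dbVar record, not an expanded version of the submitted SV.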
“po_P_*” features:
po_P_gain_phen
po_P_gain_hpo
po_P_gain_source
po_P_gain_coord
po_P_gain_percent
po_P_loss_phen
po_P_loss_hpo
po_P_loss_source
po_P_loss_coord
po_P_loss_percent
Currently, redundancy is removed from all “po_P_*” features (using a sort -unique command).
This is essential when annotating large SVs.
As a result, there is no longer any correspondence between the “po_P_*” features (the n-th source does not necessarily match the n-th coordinate), and I realized that this is actually not the best thing to do.
In a future version, redundancy will be removed only from the “po_P_*_phen” and “po_P_*_hpo” features, so AnnotSV will keep the correspondence between the “po_P_*_source”, “po_P_*_coord” and “po_P_*_percent” features.
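To illustrate why an independent sort-unique breaks the correspondence, here is a minimal sketch. The IDs and coordinates are made up for illustration, and this is not AnnotSV's actual code. Each list is deduplicated and sorted on its own, as a sort -unique would do, and the result is compared with deduplicating (source, coord) pairs together:

```python
# Hypothetical records: each pathogenic SV has a source ID and a coordinate.
records = [
    ("dbVar:nssvA", "2:500-900"),
    ("dbVar:nssvB", "2:100-200"),
    ("dbVar:nssvA", "2:500-900"),  # redundant entry
]

sources = [src for src, _ in records]
coords = [coord for _, coord in records]

# Independent sort-unique: each list is sorted as plain text on its own,
# so position i in one list no longer matches position i in the other.
sources_unique = sorted(set(sources))
coords_unique = sorted(set(coords))
print(sources_unique)  # ['dbVar:nssvA', 'dbVar:nssvB']
print(coords_unique)   # ['2:100-200', '2:500-900']

# Deduplicating the pairs together keeps the correspondence.
paired = sorted(set(records))
paired_sources, paired_coords = zip(*paired)
print(paired_sources)  # ('dbVar:nssvA', 'dbVar:nssvB')
print(paired_coords)   # ('2:500-900', '2:100-200')
```

In the first variant, the first source (nssvA) lines up with the coordinates of nssvB. The same effect is visible in the output above: the po_P_gain_coord values are in text-sorted order, independent of the order of the nssv IDs.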
Thank you, now I understand that the coordinates are not listed in the same order as the nssv IDs.
Priya
Hello,
Thank you. I found the tool to be super helpful. I have a few queries and suggestions:
The gnomAD SV database is derived from 10,847 individual genomes. The study started with 14,891 individuals, and after the QC steps data were available for 10,847. The PDF manual, however, gives the number as 14,891.
In the DECIPHER database, the allele frequency has been calculated from individuals with developmental disorders. How is the benign allele frequency reported here?
Pathogenic structural variations from dbVar are reported, along with their coordinates, in the annotated output files. But these coordinates don't match the ones reported in dbVar. For example,
I am trying to understand the annotations. It would be very helpful if you could clarify these queries.