lgmgeo / AnnotSV

Annotation and Ranking of Structural Variation
GNU General Public License v3.0

Pathogenic SV source coordinates on Hg19 reference. #132

Closed (Mkddb closed 6 months ago)

Mkddb commented 2 years ago

Hello Veronique,

Hope you are doing great.

In the annotations, I have noticed that certain P_loss_coordinates come from the Hg19 reference, even though I ran the annotation in Hg38 mode. For example, I annotated this deletion, chr11_63361790_63445978, DEL, in Hg38 mode (Job ID: AnnotSV_z2NapwjvZt in hg38). In the "pathogenic_SV" column, P_loss_source is morbid ATL3 and the coordinates are 11:63391558-63439446, which are Hg19 coordinates. The Hg38 coordinates of the ATL3 gene are chr11:63,624,087-63,671,974, which lie outside my deletion, so this gene should probably not be reported as P_loss_source.

Could you please check this and help me understand it better?

Thanks in advance, Mkd

lgmgeo commented 2 years ago

Hello Mkd,

Thank you so much for this report! Indeed, the morbid gene annotations in $ANNOTSV/share/AnnotSV/AnnotationsHuman/FtIncludedInSV/PathogenicSV/GRCh38/pathogenic*_SV_GRCh38.sorted.bed are wrong. Unfortunately, I'm away from my new lab until September. I will do my best to push a new version with correct GRCh38 morbid gene coordinates. Really sorry. I will get back to you asap.

Véronique

Mkddb commented 2 years ago

Hello Veronique,

It would be really helpful to have a fix in the new version as early as possible. Meanwhile, can you suggest a way to detect such wrong entries for pathogenic SVs, since I have hundreds of variants to screen for now? Or do I have to check each entry manually?

Looking forward to the fix.

Thanks again, Mkd
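
One possible way to screen many variants for the problem described above, sketched in Python under stated assumptions: compare the coordinates reported for each morbid gene with that gene's GRCh38 coordinates taken from an independent source that you trust. The helper names (parse_region, overlaps, check_annotation) are hypothetical and are not part of AnnotSV. The ATL3 values are the ones quoted earlier in this thread. Note that checking the reported coordinates against the SV alone would not catch the bug, because the Hg19 coordinates do overlap the deletion.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Interval:
    """Parse 'chr11:63,624,087-63,671,974' or '11:63391558-63439446'."""
    chrom, span = text.strip().split(":")
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # normalise 'chr11' and '11' to the same name
    start, end = span.replace(",", "").split("-")
    return Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def check_annotation(sv: Interval, reported: Interval, trusted: Interval) -> str:
    """Classify one morbid-gene annotation.

    sv       : the annotated SV (GRCh38)
    reported : coordinates given in the pathogenic SV annotation
    trusted  : GRCh38 gene coordinates from an independent source (assumption)
    """
    if overlaps(reported, trusted):
        return "consistent"
    if overlaps(sv, trusted):
        return "coordinates differ, gene still overlapped"
    return "suspect"


# Values quoted in this thread for the ATL3 example.
sv = parse_region("chr11:63361790-63445978")
reported = parse_region("11:63391558-63439446")
atl3_grch38 = parse_region("chr11:63,624,087-63,671,974")
print(check_annotation(sv, reported, atl3_grch38))  # prints "suspect"
```

For hundreds of variants, the same function could be called once per annotated row, with the trusted gene coordinates loaded from whatever GRCh38 gene table is available locally.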

lgmgeo commented 2 years ago

Hi Mkd,

I couldn't leave such a bug unfixed, so I just added a fix. You need to update AnnotSV to v3.1.2. I'm really sorry for the inconvenience. Let me know if everything is working now (it should be).

Best, Véronique

Mkddb commented 2 years ago

Hello Veronique,

Thanks for the fix and update. Everything else looks fine now, except for one discrepancy.

Sorry to disturb you again. There's a discrepancy for one variant: chr19:39727656-39735580, a deletion in Hg38 coordinates (Job ID: AnnotSV_AQMOqJLGYB in hg38). There is no overlapping P_loss_source for it, and the affected gene is CLC (CHARCOT-LEYDEN CRYSTAL PROTEIN), but the OMIM annotation in AnnotSV is Cold-induced sweating syndrome 2, 610313 (3) AR, which is linked to a different gene, CLCF1 (CARDIOTROPHIN-LIKE CYTOKINE FACTOR 1).

Can you please check this and find a fix for it?

With kind regards, Mkd

lgmgeo commented 2 years ago

Hi Mkd,

Great if the fix works! Thanks for the feedback. Regarding your discrepancy, I don't think it's a bug. If you look at the OMIM ID 607672, CLC and CLCF1 are alternative gene symbols. So everything seems fine to me.

Best, Véronique

Mkddb commented 2 years ago

That's a bit strange. Apart from the full gene names, the coordinates for CLC (OMIM: 153310, 19:39,731,255-39,738,029) and for CLCF1 (OMIM: 607672, 11:67,364,168-67,374,177) are also very different. They are even on different chromosomes.

lgmgeo commented 1 year ago

cf https://github.com/lgmgeo/AnnotSV/issues/156

tejas-j commented 10 months ago

Hi Véronique,

Sorry to dig up this old thread, but I recently encountered the same issue.

I have a variant on hg38 that was annotated with OMIM information from hg19. The variant in question is chr17:3284806-3411401 INV and the annotations are P_loss_phen and P_loss_source morbid:ASPA. If you look at the genomic locus on hg38, it does not contain the ASPA gene. However, in hg19 this gene is located in the locus.

grep -i canavan Annotations_Human/FtIncludedInSV/PathogenicSV/GRCh38/pathogenic_Loss_SV_GRCh38.sorted.bed

grep -i canavan Annotations_Human/FtIncludedInSV/PathogenicSV/GRCh37/pathogenic_Loss_SV_GRCh37.sorted.bed

Both return the same coordinates:

17 3379290 3406699 Canavan disease, 271900 (3) AR morbid:ASPA 17:3379290-3406699
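
The grep comparison above can be extended to every morbid record at once. The following is a rough Python sketch, not an AnnotSV feature: shared_records and morbid_records are hypothetical helper names, and the code assumes the BED files are tab-separated, with chromosome, start and end in the first three columns and a field starting with "morbid:" somewhere after them. A record with identical coordinates in both builds is only a hint that one file may carry coordinates from the other build, so each hit still deserves a manual check.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str, str, str]  # chrom, start, end, source field


def morbid_records(lines: Iterable[str], prefix: str = "morbid:") -> Set[Record]:
    """Collect (chrom, start, end, source) for every line with a morbid source."""
    records: Set[Record] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        for field in fields[3:]:
            if field.startswith(prefix):
                records.add((fields[0], fields[1], fields[2], field))
    return records


def shared_records(grch37_lines: Iterable[str],
                   grch38_lines: Iterable[str]) -> List[Record]:
    """Return morbid records whose coordinates are identical in both builds."""
    return sorted(morbid_records(grch37_lines) & morbid_records(grch38_lines))
```

To run it on the real files, open the GRCh37 and GRCh38 pathogenic_Loss_SV BED files named in the grep commands and pass the two file objects to shared_records.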

Would really appreciate it if you could help fix the annotation sources.

Thank you, Tejas

lgmgeo commented 10 months ago

Hi Tejas,

Thanks for reporting this specific “ASPA” example. New annotations are expected to be released in January. I will keep this bug in mind and check it against the new update.

Sorry for the delay, I'm chasing time...

Best, Véronique

lgmgeo commented 8 months ago

I have added a fix for the misleading OMIM annotation in the dev branch. It will be distributed soon with the next annotation release.

@Mkddb chr19:39727656-39735580 => Ok, no more bad OMIM_ID annotation (607672 no longer reported)

@tejas-j Still in progress (not a misleading OMIM annotation)

Mkddb commented 8 months ago

Hi Véronique,

That sounds great. Thanks for the fix for the misleading OMIM annotation. Looking forward to the next annotation release.

Best regards, MKd

lgmgeo commented 7 months ago

@tejas-j

Variant on hg38 annotated with OMIM information from hg19:

grep -i canavan Annotations_Human/FtIncludedInSV/PathogenicSV/GRCh38/pathogenic_Loss_SV_GRCh38.sorted.bed

grep -i canavan Annotations_Human/FtIncludedInSV/PathogenicSV/GRCh37/pathogenic_Loss_SV_GRCh37.sorted.bed

Both return the same coordinates:

17 3379290 3406699 Canavan disease, 271900 (3) AR morbid:ASPA 17:3379290-3406699

I have added a fix for this bug in the dev branch. This will be distributed soon with the next annotation release. Thank you very much for the report and sorry for the long delay.

lgmgeo commented 7 months ago

AnnotSV 3.4 is posted. Let me know if everything works well on your side.