mkirsche / Jasmine

Jasmine: SV Merging Across Samples
MIT License

About HG002 high-confidence de novo SVs #56

Open songbowang125 opened 1 year ago

songbowang125 commented 1 year ago

Hi, in your NM paper you mention that your work 'reveals a set of high-confidence de novo SVs confirmed by multiple technologies'. However, I could not find the released SV set. Your data repository (https://bx.bio.jhu.edu/data/jasmine/) has several VCFs, but these files contain many more SVs than that. In your Fig. 4, the number of child-only SVs in HG002 from HiFi reads is about 250. Where can I find this SV set?

Thanks

mschatz commented 1 year ago

Thanks for your interest!

This file contains the full set of variants: https://bx.bio.jhu.edu/data/jasmine/HG002Trio/DeNovoDetection/denovo.merged.vcf .

It was constructed by merging the calls from HG002, HG003, and HG004 across all three technologies. Variants supported by a particular subset of samples and technologies can be extracted by filtering on the SUPP_VEC field: for example, 100100100 would be those found in HG002 only and supported by all three techs, 100000000 would be HiFi only, 000100000 would be CLR only, and 000000100 would be ONT only. Note that this file also includes indels in the 30-49bp range, so filtering out records with SVLEN in the range [-49, 49] leaves SVs only. The plotting script here https://github.com/mkirsche/JasmineFigures/blob/main/figures/jasmine_plot/find_denovo_candidates.R shows examples of extracting these subsets (though it runs after the VCF has been converted to a TSV with this script https://github.com/mkirsche/JasmineFigures/blob/main/src/VcfToTsv.java).
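For anyone who would rather filter the VCF directly than go through the TSV conversion, here is a minimal Python sketch of the SUPP_VEC and SVLEN filtering described above. It is not the script used for the paper. The function names (`open_vcf`, `parse_info`, `select_records`) are made up for illustration, and it assumes SUPP_VEC and SVLEN appear as `KEY=VALUE` entries in the INFO column. Records with no parseable SVLEN are kept, which is a choice of this sketch and not something stated above.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only entries map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def select_records(lines, supp_vec, min_abs_len=50):
    """Return record lines whose SUPP_VEC equals supp_vec and whose SVLEN
    lies outside [-(min_abs_len - 1), min_abs_len - 1]."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        info = parse_info(cols[7])
        if info.get("SUPP_VEC") != supp_vec:
            continue
        try:
            svlen = int(info["SVLEN"])
        except (KeyError, ValueError):
            # Assumption of this sketch: keep records whose length is unknown.
            kept.append(line)
            continue
        if abs(svlen) < min_abs_len:
            continue  # drops the 30-49bp indels
        kept.append(line)
    return kept
```

Usage would look like `with open_vcf("denovo.merged.vcf") as fh: hifi_only = select_records(fh, "100000000")`, and likewise `"100100100"` for HG002-only calls supported by all three techs.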

Good luck!

Mike & Melanie
