zeeev / wham

Structural variant detection and association testing

Workflow for large number of genomes (Population) #52

Open djakubosky opened 4 years ago

djakubosky commented 4 years ago

Hi, I was curious about the suggested workflow for WHAMG on ~1K genomes. My assumption is that this is what you would want to do:

1) Run WHAMG individually on all genomes -> many VCFs
2) Filtering? (not sure if you'd suggest filtering at this stage of the process on each individual VCF)
3) Run mergeSVcallers on these VCFs to create a set of positions
4) Genotype the putative SVs at these positions with something (e.g. SVTyper)
5) Merge genotyped variants into one VCF

Does this sound reasonable? If this is the proposed approach, it might be helpful to add a little more detail in the wiki!

Thanks for a nice tool!

zeeev commented 4 years ago

Hi @djakubosky,

Thanks for reaching out. These are really good questions, especially for large cohorts. I've made some bullet points that might help guide you. If you find this helpful or come up with your own tricks, I'd like to add them to a wiki.

djakubosky commented 4 years ago

Hi @zeeev, thanks for all the info on this; these answers are very helpful. I think I follow almost everything you are saying here. I still have one question: when I generate separate VCFs for each individual (as I won't be able to run them jointly with so many samples), will mergeSVcallers allow me to merge them within and between samples to arrive at a single consensus set of variant positions? Is there some basic filtering to do BEFORE running the merge? Ideally, of course, it would be good to genotype fewer sites, as SVTyper is kinda slow sometimes.

One more thing: I will be able to assess the quality of variants somewhat indirectly using their reproducibility in pairs of twins in my cohort. Happy to share these results with you to give you something slightly more "formal" to illustrate these FN/FP rates. See manuscript here if interested.
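For readers wondering how twin reproducibility might be turned into a number, here is a minimal sketch. It is an assumption for illustration, not the method used in the manuscript: it takes each twin's calls as a set of site identifiers (the `chrom:pos:type` strings and the function name `twin_concordance` are hypothetical) and reports the Jaccard index, i.e. shared sites divided by all sites seen in either twin.

```python
from typing import Set


def twin_concordance(calls_a: Set[str], calls_b: Set[str]) -> float:
    """Jaccard index of two call sets: shared sites / sites seen in either twin.

    The site identifiers are hypothetical; any hashable key that identifies
    a merged SV site would work the same way.
    """
    union = calls_a | calls_b
    if not union:
        # Neither twin has any calls, so there is nothing to compare.
        return 0.0
    return len(calls_a & calls_b) / len(union)


# Toy data, made up for illustration only.
twin1 = {"chr1:1000:DEL", "chr1:5000:DUP", "chr2:300:INV"}
twin2 = {"chr1:5000:DUP", "chr2:300:INV", "chr3:42:DEL"}
print(twin_concordance(twin1, twin2))
```

Sites called in only one twin are a mix of false positives in that twin and false negatives in the other, so this kind of score bounds the error rates rather than separating them.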

zeeev commented 4 years ago

@djakubosky,

Yes, mergeSV can be used to merge within samples, and then between samples.

You mentioned that you have families in your cohort? I'd joint call closely related individuals (up to 3/4).

--Zev

P.S. nice paper!
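The family-based joint calling suggested above could be planned with a small helper like the one below. This is a sketch under stated assumptions, not part of wham: it reads "up to 3/4" as "at most three or four samples per joint call", and the function name `joint_call_batches`, the `max_batch` parameter, and the sample/family labels are all hypothetical. It only groups sample names; how each batch is then passed to WHAMG is left out.

```python
from collections import defaultdict
from typing import Dict, List


def joint_call_batches(sample_to_family: Dict[str, str],
                       max_batch: int = 4) -> List[List[str]]:
    """Group samples by family, splitting any family larger than max_batch.

    max_batch=4 is an assumption based on reading "up to 3/4" as a
    per-call sample limit.
    """
    families = defaultdict(list)
    for sample, family in sorted(sample_to_family.items()):
        families[family].append(sample)

    batches = []
    for family in sorted(families):
        members = families[family]
        # Chunk large families so no joint call exceeds max_batch samples.
        for start in range(0, len(members), max_batch):
            batches.append(members[start:start + max_batch])
    return batches


# Hypothetical cohort: one twin pair and one unrelated individual.
cohort = {"s1": "famA", "s2": "famA", "s3": "famB"}
print(joint_call_batches(cohort))
```

Unrelated individuals end up in single-sample batches, which matches the per-sample calling already planned for the rest of the cohort.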

djakubosky commented 4 years ago

Thanks for the info! Is there a limit on how many VCFs can be merged at once, given memory constraints?
