zhanxw / rvtests

Rare variant test software for next generation sequencing data

unbelievable slow speed for --siteFile #26

Open jielab opened 7 years ago

jielab commented 7 years ago

Please see my comment on #25: it took 43356 seconds to run a regression on 93 SNPs when I used --siteFile. But when I first used bcftools to extract those 93 SNPs into a new VCF, which takes a minute, the same analysis took only 109 seconds.

So I think there is something VERY WRONG with the --siteFile option. I just want to point this out so that others don't run into the same issue.

best regards, Jie

zhanxw commented 7 years ago

I'm working on that. That does seem unreasonably slow. I will follow up on this.

Xiaowei


zhanxw commented 7 years ago

Can you please remind me of the number of sites specified by the --siteFile option? I just want to confirm it with you, as I suspect that a very large number of sites slows down the analysis.

Thanks.

jielab commented 7 years ago

93 sites


zhanxw commented 7 years ago

@jiehuang001 I have optimized the --siteFile option to improve its speed. However, you may want to consider using --rangeFile instead.

Since you have only a small number of variants (93) to analyze, I would recommend using --rangeFile. This option lets RVTESTS use the VCF index file, so that it reads in only these variants and analyzes them.

When you have lots of variants, --siteFile is more appropriate: RVTESTS will read in every variant, but only analyze the variants specified in --siteFile.
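For anyone who already has a site list and wants to try --rangeFile, the conversion can be sketched as below. This is a minimal sketch, not part of RVTESTS: it assumes the site file holds one whitespace-separated `chr pos` pair per line and that the range file expects one `chr:begin-end` entry per line, so please check both formats against the RVTESTS documentation. The function names and the file names `sites.txt` and `ranges.txt` are hypothetical.

```python
def sites_to_ranges(site_lines):
    """Turn 'chr pos' lines into single-base 'chr:pos-pos' range strings.

    Assumption: both formats are as described in the lead-in; they are
    not taken from this thread.
    """
    ranges = []
    for line in site_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])  # int() rejects malformed positions
        ranges.append(f"{chrom}:{pos}-{pos}")
    return ranges


def convert_file(site_path, range_path):
    """Read a site file and write the matching range file; return the count."""
    with open(site_path) as src:
        ranges = sites_to_ranges(src)
    with open(range_path, "w") as dst:
        for entry in ranges:
            dst.write(entry + "\n")
    return len(ranges)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    count = convert_file("sites.txt", "ranges.txt")
    print(f"wrote {count} ranges")
```

The resulting file would then be passed to --rangeFile in place of --siteFile. Because --rangeFile relies on the VCF index, the input VCF needs to be indexed for this to help.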