niuhuifei / popoolation

Automatically exported from code.google.com/p/popoolation

assemblies with high coverage run for a very long time #10

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have used Popoolation in the past and cited it in a publication, and I love 
the software. I am sequencing viral genomes and am interested in very low 
frequency viral variants in the population. As such, we tend to get very high 
coverage on our genomes (1000-10,000x on a single gene).

What steps will reproduce the problem?
1.  When I use the variance-at-position script to calculate pi or D on my 
samples without enabling the corrections, the script runs beautifully.
2. However, when I run the script and enable the corrections, it will run for 3 
days on our server and still not finish. I suspect that this is due to the high 
coverage of my assemblies. However, I would really like to be able to set the 
minimum SNP count to 3 to remove potential sequencing error.

I am wondering if there is any way to enable the corrections without having the 
program run for so long. 

What version of the product are you using? On what operating system?
I am using Popoolation version 1.2.2 on Mac OS X version 10.6.8. 

Original issue reported on code.google.com by lhmon...@gmail.com on 30 Sep 2013 at 3:07

GoogleCodeExporter commented 9 years ago
You are perfectly right: the correction factors take excruciatingly long to 
calculate when the coverage is >500. The good news is that once they have been 
calculated for all the different coverages, the script will run as fast as 
before (it stores the correction factor for every coverage internally).
Maybe you can subsample to a fixed coverage, e.g. 500?
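
To illustrate what subsampling to a fixed coverage means, here is a minimal Python sketch. It is not PoPoolation's own code; the function name `subsample_counts` and the per-allele count dictionary are assumptions made up for this example. It draws a fixed number of reads without replacement from the allele counts at one position, so every position ends up with the same coverage and only one correction factor is needed.

```python
import random


def subsample_counts(counts, target, rng=None):
    """Draw `target` reads without replacement from per-allele counts.

    `counts` maps allele -> number of reads (hypothetical input format).
    Positions at or below the target coverage are returned unchanged.
    """
    rng = rng or random.Random()
    total = sum(counts.values())
    if total <= target:
        return dict(counts)

    # Expand the counts into one entry per read, then sample reads.
    pool = [allele for allele, n in counts.items() for _ in range(n)]
    picked = rng.sample(pool, target)

    # Tally the sampled reads back into per-allele counts.
    out = {allele: 0 for allele in counts}
    for allele in picked:
        out[allele] += 1
    return out


# Example: one position at 10,000x coverage, reduced to 500x.
site = {"A": 9800, "C": 150, "G": 50, "T": 0}
sub = subsample_counts(site, 500, random.Random(42))
print(sub, sum(sub.values()))
```

One thing to keep in mind: at a coverage of 500 with a minimum count of 3, an allele needs a frequency of at least 3/500 = 0.6% in the subsample to pass the filter, so the target coverage limits how rare a variant can be and still be counted.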

cheers ro

Original comment by RoKof...@gmail.com on 5 Mar 2014 at 10:00