Closed RichardCorbett closed 1 year ago
Hi Richard,
Thanks for trying out simuG. And yes, 8 days is way too long! I think the problem is most likely that -cnv_max_size is set too large (300 Mb in your current setting). Because simuG samples CNV sizes from a uniform distribution, it is highly likely that a sampled CNV is too large to fit into any human chromosome. In that case, simuG keeps trying to find a chromosomal location for the sampled CNV but never succeeds. To prevent this, I should probably implement an internal bound check in simuG in the future. For now, please reduce -cnv_max_size to a more realistic value (e.g. 1-10 Mb), and make sure it is smaller than the largest chromosome of your input genome (~249 Mb for the human genome). Let me know how it works.
Best, Jia-Xing
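For illustration, a bound check like the one mentioned above could look roughly like the following Python sketch. This is not simuG's actual code (simuG is a Perl script): the function name `place_cnv`, the `max_attempts` parameter, and the toy chromosome sizes are all hypothetical. The idea is only to cap the number of placement attempts so that an unplaceable CNV produces a clear failure instead of an endless loop.

```python
import random

def place_cnv(chrom_sizes, min_size, max_size, max_attempts=1000, rng=None):
    """Try to place one CNV on a random chromosome (hypothetical helper).

    Returns (chrom, start, end) with 1-based inclusive coordinates,
    or None if no placement was found within max_attempts.
    """
    rng = rng or random.Random()
    longest = max(chrom_sizes.values())
    if min_size > longest:
        # No sampled size could ever fit, so fail immediately.
        raise ValueError("minimum CNV size exceeds the longest chromosome")
    for _ in range(max_attempts):
        size = rng.randint(min_size, max_size)  # uniform size sampling
        candidates = [c for c, n in chrom_sizes.items() if n >= size]
        if not candidates:
            continue  # this size fits nowhere; sample again
        chrom = rng.choice(candidates)
        start = rng.randint(1, chrom_sizes[chrom] - size + 1)
        return chrom, start, start + size - 1
    return None  # give up instead of looping forever

# Toy chromosome sizes, for demonstration only.
toy_sizes = {"chrA": 1000, "chrB": 500}
placement = place_cnv(toy_sizes, 50, 300, rng=random.Random(0))
```

A caller would treat a `None` result as a signal to stop and report that the requested size range cannot be satisfied.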
Thank you.
I tried this and it worked in a few seconds:
simuG.pl -r hg38_no_alt.fa -cnv_count 50 -cnv_min_size 50 -cnv_max_size 50000000
as did this:
simuG.pl -r hg38_no_alt.fa -cnv_count 50 -cnv_min_size 500 -cnv_max_size 50000000
This command, however, runs for a week and doesn't complete:
simuG.pl -r hg38_no_alt.fa -cnv_count 50 -cnv_min_size 500 -cnv_max_size 100000000
Hi Richard,
Thanks for the testing and the confirmation!
Your last run with -cnv_count 50 and -cnv_max_size 100 Mb cannot complete for the same reason: there is not enough genomic space to place more simulated events. See here for the sizes of the human chromosomes:
As you can see, only 16 chromosomes can hold a CNV longer than 100 Mb. Because simuG does not allow overlapping events by design, it is likely that simuG cannot find enough genomic space to place all of the CNVs when you ask for 50 in total. So if you really want to simulate very large CNVs (e.g. >50 Mb), I would recommend reducing -cnv_count as a workaround.
Best, Jia-Xing
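The space argument above can be checked with some quick arithmetic. The Python sketch below is only an illustration: the chromosome lengths are approximate hg38 values rounded to the nearest Mb (an assumption, not taken from simuG or from the linked table), and the helper names are hypothetical. It counts the chromosomes that can hold a CNV of a given size and estimates the expected total length of 50 CNVs drawn uniformly between the minimum and maximum size.

```python
# Approximate hg38 chromosome lengths in Mb (rounded; assumed values).
APPROX_HG38_MB = {
    "chr1": 249, "chr2": 242, "chr3": 198, "chr4": 190, "chr5": 182,
    "chr6": 171, "chr7": 159, "chr8": 145, "chr9": 138, "chr10": 134,
    "chr11": 135, "chr12": 133, "chr13": 114, "chr14": 107, "chr15": 102,
    "chr16": 90, "chr17": 83, "chr18": 80, "chr19": 59, "chr20": 64,
    "chr21": 47, "chr22": 51, "chrX": 156, "chrY": 57,
}

def chroms_that_fit(sizes_mb, cnv_mb):
    """Chromosomes long enough to hold a CNV of cnv_mb megabases."""
    return [c for c, n in sizes_mb.items() if n >= cnv_mb]

def expected_total_mb(count, min_mb, max_mb):
    """Expected summed length of `count` CNVs with uniformly sampled sizes."""
    return count * (min_mb + max_mb) / 2

n_fit = len(chroms_that_fit(APPROX_HG38_MB, 100))       # chromosomes >= 100 Mb
genome_mb = sum(APPROX_HG38_MB.values())                # total genome length
total_100 = expected_total_mb(50, 0.0005, 100)          # -cnv_max_size 100 Mb
total_50 = expected_total_mb(50, 0.0005, 50)            # -cnv_max_size 50 Mb
```

With these rounded sizes, 16 chromosomes reach 100 Mb, and 50 CNVs with a 100 Mb maximum are expected to span roughly 2,500 Mb of a genome of about 3,086 Mb. Since events cannot overlap and each must fit within a single chromosome, the run has very little room left. With a 50 Mb maximum the expected total drops to roughly 1,250 Mb, which is consistent with that run finishing quickly.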
Hi there,
I am giving your tool a try as it looks very simple to run and it seems to do exactly what I want for simulating CNV changes in germline nanopore reads.
I am running with this command:
But this has been running for 8 days. Do you have any tips to make this run faster?