Closed mdem7705 closed 5 years ago
I would recommend to test the program on one contig/chromosome to see if runs to completion.
Hey there,
I tried it with my first contig, about 6 Mbp long. I allowed it 3 hours to run, and it never completed. The only meaningful error message I have again is killed due to excessive walltime, similar to my job that ran for 90+ hours. The fragment file is also not empty. Please let me know if I can provide anything to help!
The '--ea' option may be causing the long runtime. Please run without this option.
Just tried it. It still runs for over 3h on a single contig.
If you can share the input data for a single contig, I can take a look. My email is vibansal AT ucsd.edu
I have sent you a link to a dropbox folder with the requested data.
I have looked at the data and the output files. I was also able to successfully run HapCUT2 on a small subset of the data. It appears that your data has a very high density of heterozygous variants. HapCUT2 will run on such datasets but it will be slow since the complexity of HapCUT2 scales quadratically with the number of variants per read. You can reduce the number of iterations (--converge option) to 3 or 4. In my estimate, it should finish within 6 hours on the single contig.
Thanks for sharing your data. We will think about implementing a solution to speed up hapCUT2 on datasets with a very high density of variants.
Another question considering that I have 130+ contigs to get through - if I split the genome up by contig, run hapCUT2 separately on each one, and then combine the results at the end, will that affect the overall result?
No. It is recommended to run the program independently on each contig/chromosome.
Greetings hapCUT2 team,
I am having an issue running hapCUT2 where the program runs forever and never completes - I don't ever see an error message besides running out of walltime (I have had this running for longer than 4 straight days).
My reference genome and bam file are both from pacbio reads. My reference genome has been haplophased using haploMerger2, and my pacbio reads bam file has been aligned to my reference with minimap2 and sorted with samtools. My vcf file is generated from illumina reads on the same organism with variants called using gatk's haplotypeCaller.
Here is the code I am trying to run:
I suspect there might be some incompatibility with my files causing this program to run forever and never complete, but I don't get an error message so I don't know how to diagnose this. Any help would be appreciated!
Cheers!