running extractHAIRS with very large maxIS

A-J-F-Mackintosh commented 1 year ago

Hi,

I am running hapcut2 1.3.3 with HiC data and would like to analyse the entire genome in one step.

I realise that hapcut2 is designed to be run on a single sequence, but I am working on genomes with complex structural variants where HiC reads sometimes span two chromosomes because of a haplotype-specific chromosome fusion (for an example, see Figure 2 from this paper https://academic.oup.com/g3journal/article/12/6/jkac069/6554998).

I have concatenated the genome into one sequence with the aim of generating phase blocks that are typically within a single chromosome, but that sometimes span multiple chromosomes when there has been a rearrangement not captured by the reference.
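
For readers following along, here is a minimal sketch of the coordinate bookkeeping that such a concatenation implies. The sequence names and lengths are hypothetical placeholders, and the helper names (`concat_offsets`, `to_concat_pos`) are illustrative only; they are not part of hapcut2.

```python
from itertools import accumulate


def concat_offsets(lengths):
    """Return the start offset of each sequence within the concatenated sequence.

    `lengths` maps sequence name -> length, in concatenation order.
    """
    names = list(lengths)
    # Running totals of the lengths; drop the last one (the total genome length).
    starts = [0, *accumulate(lengths[name] for name in names)][:-1]
    return dict(zip(names, starts))


def to_concat_pos(offsets, chrom, pos):
    """Translate a position on `chrom` into a position on the concatenated sequence."""
    return offsets[chrom] + pos


# Hypothetical lengths, for illustration only.
lengths = {"chr1": 1000, "chr2": 800, "chr3": 500}
offsets = concat_offsets(lengths)
```

Variant and alignment positions would both need to be shifted with the same offsets so that the BAM and VCF stay consistent with each other.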

To do this, I would ideally run extractHAIRS with maxIS = 400 Mb (the length of the genome). However, extractHAIRS becomes very slow when maxIS is set so high. The buffer is constantly being cleaned and the total run time will likely be in days rather than hours. Is there any way to increase the buffer size and speed up the analysis?

Best wishes,

Alex

vibansal commented 1 year ago

There is a command-line argument "--maxfragments" (default value = 500000) that can be used to increase the buffer size. If that doesn't work, let me know.
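
As a rough sketch of how the suggestion above might be applied, the snippet below assembles an extractHAIRS command line with both a large maxIS and a raised --maxfragments. Only `--maxfragments` and its default come from this thread; the file names and the value 5000000 are hypothetical, and the remaining flags (`--hic`, `--bam`, `--VCF`, `--out`, `--maxIS`) are assumed to match the usage message of your installed version, so check them before running.

```python
import shlex


def extracthairs_cmd(bam, vcf, out, max_is, max_fragments):
    """Build an extractHAIRS argument list (flag names other than --maxfragments are assumptions)."""
    return [
        "extractHAIRS",
        "--hic", "1",                          # assumed Hi-C switch
        "--bam", bam,
        "--VCF", vcf,
        "--out", out,
        "--maxIS", str(max_is),                # maximum insert size, in bp
        "--maxfragments", str(max_fragments),  # fragment buffer size (default 500000)
    ]


cmd = extracthairs_cmd(
    bam="reads.bam",          # hypothetical input files
    vcf="variants.vcf",
    out="fragments.txt",
    max_is=400_000_000,       # 400 Mb, the genome length mentioned above
    max_fragments=5_000_000,  # hypothetical value, 10x the default
)
print(shlex.join(cmd))  # inspect the command, then run it with subprocess.run(cmd, check=True)
```

A larger buffer presumably trades memory for fewer buffer clean-ups, so the value may need tuning to the machine and dataset.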

A-J-F-Mackintosh commented 1 year ago

Many thanks, this worked perfectly.

Alex

vibansal / HapCUT2

running extractHAIRS with very large maxIS #133