vibansal / HapCUT2

software tools for haplotype assembly from sequence data
BSD 2-Clause "Simplified" License

multiple sequencing platform #88

Closed wyim-pgl closed 4 years ago

wyim-pgl commented 4 years ago

Hi! We have Hi-C, Illumina, and PacBio data for our genome. Do we need to run extractHAIRS independently for each platform? Do the VCF and BAM files need to be generated separately for each platform, or should we merge the data and run everything once? Thanks,

vibansal commented 4 years ago
  1. Generate a single VCF file, preferably using Illumina data.

  2. Run extractHAIRS on each of the BAM files independently and merge the output files.

  3. Run HapCUT2 with the merged output file from (2) and the VCF file from (1).
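
As a rough illustration of the merge in step (2), here is a minimal Python sketch. It assumes that "merge" means plain concatenation of the per-platform fragment files, which is how a later reply in this thread describes it. The function name `merge_fragment_files` and the file names in the usage note are hypothetical and not part of HapCUT2.

```python
from typing import Iterable


def merge_fragment_files(inputs: Iterable[str], output: str) -> int:
    """Concatenate fragment files into one file and return the number of lines written.

    Assumption: merging is a plain line-by-line concatenation; nothing is
    re-sorted, deduplicated, or validated here.
    """
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # skip blank lines
                    # Guard against a missing newline at the end of a file,
                    # which would otherwise glue two fragments together.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

For example, `merge_fragment_files(["hic.frag", "illumina.frag", "pacbio.frag"], "merged.frag")` would produce the single fragment file passed to HapCUT2 in step (3). A shell `cat` achieves the same thing as long as every input ends with a newline.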

melop commented 3 years ago

Hello, we ran into a similar issue. The documentation recommends the --hic 1 option when running extractHAIRS on Hi-C data. But when I concatenated the fragment files and ran HapCUT2 with --hic 1 turned on as well, the program complained that some lines lacked the extra information required for Hi-C. How should I proceed with combining multiple technologies? Thank you!

vibansal commented 3 years ago

Please see reply to #116, copied below:

You should use the "--nf 1" option when running extractHAIRS for stLFR and nanopore data. This outputs the fragments in the Hi-C fragment file format. The output files can then be concatenated and processed with HapCUT2 (which also needs the --nf 1 option). There is a Snakemake file (recipes/HiC_Longread/Snakefile) with commands illustrating how to combine datasets for phasing.
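
To make the flag placement concrete, here is a hedged Python sketch that only builds and prints the command lines; it does not run anything. Several things are assumptions rather than statements from this thread: the `--bam`, `--VCF`, `--out`, `--fragments` and `--output` option names and the `HAPCUT2` binary name are my reading of the HapCUT2 README and should be checked against `--help`; all file names are hypothetical; and applying `--nf 1` to the Illumina and PacBio runs extends the advice given above for stLFR and nanopore. Long-read data may need further technology-specific options that are not shown here, and the thread does not say whether `--hic 1` should also be passed to HapCUT2 on the combined file, so the Snakefile remains the reference.

```python
import shlex
from typing import List, Sequence


def extract_hairs_cmd(bam: str, vcf: str, out: str, extra: Sequence[str] = ()) -> List[str]:
    """Build an extractHAIRS command line (option names assumed from the README)."""
    return ["extractHAIRS", "--bam", bam, "--VCF", vcf, "--out", out, *extra]


def hapcut2_cmd(fragments: str, vcf: str, output: str, extra: Sequence[str] = ()) -> List[str]:
    """Build a HAPCUT2 command line (option names assumed from the README)."""
    return ["HAPCUT2", "--fragments", fragments, "--VCF", vcf, "--output", output, *extra]


VCF = "variants.vcf"  # hypothetical: the single VCF shared by all runs

commands = [
    # Hi-C reads: --hic 1, as the documentation recommends.
    extract_hairs_cmd("hic.bam", VCF, "hic.frag", ["--hic", "1"]),
    # Other platforms: --nf 1 so their fragments use the Hi-C fragment format.
    extract_hairs_cmd("pacbio.bam", VCF, "pacbio.frag", ["--nf", "1"]),
    extract_hairs_cmd("illumina.bam", VCF, "illumina.frag", ["--nf", "1"]),
    # Concatenate the three .frag files into combined.frag before this step.
    hapcut2_cmd("combined.frag", VCF, "haplotypes.out", ["--nf", "1"]),
]

for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to review the flags for each platform before anything is run.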