sc-zhang / ALLHiC_components

BSD 3-Clause "New" or "Revised" License

Introduction

A set of components that speed up the original ALLHiC and reduce its resource usage.

Dependencies

Installation

git clone https://github.com/sc-zhang/ALLHiC_components.git
cd ALLHiC_components
chmod +x bin/*.*

# install ALLHiC_prune
cd src/
make && make install

Usage

ALLHiC_prune prunes signals between allelic chromosomes. It is a rewrite of the original tool that runs faster and uses less memory.

************************************************************************
    Usage: ./ALLHiC_prune -i Allele.ctg.table -b sorted.bam
      -h : help and usage.
      -i : Allele.ctg.table
      -b : sorted.bam
************************************************************************
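
To make the idea of pruning concrete, here is a minimal Python sketch of removing links between allelic contigs. It is not the ALLHiC_prune implementation, which reads a sorted BAM and may apply further filtering. It assumes, for illustration only, that each row of Allele.ctg.table holds a chromosome, a position, and then the names of contigs that are allelic to each other, and that Hi-C links are already summarized as hypothetical (contig, contig, count) tuples. The function names `load_allelic_pairs` and `prune_links` are made up for this example.

```python
from itertools import combinations


def load_allelic_pairs(lines):
    """Collect unordered pairs of contigs that share a row of the allele table.

    Assumed layout (not confirmed by this README): chromosome, position,
    then one or more contig names, separated by whitespace.
    """
    pairs = set()
    for line in lines:
        fields = line.split()
        contigs = sorted(set(fields[2:]))  # columns 3+ assumed to be contig names
        for a, b in combinations(contigs, 2):
            pairs.add((a, b))  # stored in sorted order
    return pairs


def prune_links(links, allelic_pairs):
    """Drop links whose two contigs are allelic to each other."""
    kept = []
    for ctg1, ctg2, count in links:
        key = tuple(sorted((ctg1, ctg2)))
        if key not in allelic_pairs:
            kept.append((ctg1, ctg2, count))
    return kept


# Toy data, invented for illustration
table = ["Chr01\t1000\tctgA\tctgB", "Chr01\t5000\tctgC\tctgD\tctgE"]
links = [("ctgA", "ctgB", 12), ("ctgA", "ctgC", 7), ("ctgE", "ctgC", 3)]

pairs = load_allelic_pairs(table)
print(prune_links(links, pairs))  # [('ctgA', 'ctgC', 7)]
```

In this toy input, the ctgA/ctgB and ctgE/ctgC links are dropped because each pair appears together in a table row, while the ctgA/ctgC link is kept.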

partition_gmap.py splits the bam file and the contig-level fasta by chromosome, using the allele table.

usage: partition_gmap.py [-h] -r REF -g ALLELETABLE [-b BAM] [-d WORKDIR]
                         [-t THREAD]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     reference contig level assembly
  -g ALLELETABLE, --alleletable ALLELETABLE
                        Allele.gene.table
  -b BAM, --bam BAM     bam file, default: prunning.bam
  -d WORKDIR, --workdir WORKDIR
                        work directory, default: wrk_dir
  -t THREAD, --thread THREAD
                        threads, default: 10

ALLHiC_partition.py is an experimental script for clustering contigs into haplotypes.

usage: ALLHiC_partition.py [-h] -r REF -b BAM -d BED -a ANCHORS -p POLY
                           [-e EXCLUDE] [-o OUT]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     Contig level assembly fasta
  -b BAM, --bam BAM     Prunned bam file
  -d BED, --bed BED     dup.bed
  -a ANCHORS, --anchors ANCHORS
                        anchors file with dup.mono.anchors
  -p POLY, --poly POLY  Ploid count of polyploid
  -e EXCLUDE, --exclude EXCLUDE
                        A list file contains exclude contigs for partition,
                        default=""
  -o OUT, --out OUT     Output directory, default=workdir

ALLHiC_rescue.py is a new version of the rescue step that uses jcvi to prevent collinear contigs from being rescued into the same group.

usage: ALLHiC_rescue.py [-h] -r REF -b BAM -c CLUSTER -n COUNTS -g GFF3 -j
                        JCVI [-e EXCLUDE] [-w WORKDIR]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     Contig level assembly fasta
  -b BAM, --bam BAM     Unprunned bam
  -c CLUSTER, --cluster CLUSTER
                        Cluster file of contigs
  -n COUNTS, --counts COUNTS
                        count REs file
  -g GFF3, --gff3 GFF3  Gff3 file generated by gmap cds to contigs
  -j JCVI, --jcvi JCVI  CDS file for jcvi, bed file with same prefix must
                        exist in the same position
  -e EXCLUDE, --exclude EXCLUDE
                        cluster which need no rescue, default="", split by
                        comma
  -w WORKDIR, --workdir WORKDIR
                        Work directory, default=wrkdir

ALLHiC_plot.py plots a heatmap of the Hi-C signal. Compared with the original version, it uses less memory and makes it easier to plot heatmaps at other resolutions.

# Notice: bam file must be indexed
usage: ALLHiC_plot.py [-h] -b BAM -l LIST [-a AGP] [-5 H5] [-m MIN_SIZE] [-s SIZE] [-c CMAP] [-o OUTDIR] [--line | --block] [--linecolor LINECOLOR] [-t THREAD]

options:
  -h, --help            show this help message and exit
  -b BAM, --bam BAM     Input bam file
  -l LIST, --list LIST  Chromosome list, contain: ID Length
  -a AGP, --agp AGP     Input AGP file, if bam file is a contig-level mapping, agp file is required
  -5 H5, --h5 H5        h5 file of hic signal, optional, if not exist, it will be generate after reading hic signals, or it will be loaded for drawing other resolution of heatmap
  -m MIN_SIZE, --min_size MIN_SIZE
                        Minium bin size of heatmap, default=50k
  -s SIZE, --size SIZE  Bin size of heatmap, can be a list separated by comma, default=500k, notice: it must be n times of min_size (n is integer) or we will adjust it to nearest one
  -c CMAP, --cmap CMAP  CMAP for drawing heatmap, default="YlOrRd"
  -o OUTDIR, --outdir OUTDIR
                        Output directory, default=workdir
  --line                Draw dash line for each chromosome
  --block               Draw dash block for each chromosome
  --linecolor LINECOLOR
                        Color of dash line or dash block, default="grey"
  -t THREAD, --thread THREAD
                        Threads for reading bam, default=1
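
The `-s` option says each bin size must be an integer multiple of `min_size` and is otherwise adjusted to the nearest one. The Python sketch below shows one way such an adjustment could work. It is an illustration, not the script's actual code. The suffix handling (`k`, `m`, `g`), the round-half-up tie-breaking, and the helper names `parse_size`, `snap_to_multiple`, and `resolve_bin_sizes` are all assumptions.

```python
def parse_size(text):
    """Convert a size string such as '500k' or '1m' to a number of bases.

    The accepted suffixes are an assumption made for this sketch.
    """
    units = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(round(float(text[:-1]) * units[text[-1]]))
    return int(text)


def snap_to_multiple(size, min_size):
    """Round size to the nearest positive multiple of min_size (ties round up)."""
    n = max(1, (size + min_size // 2) // min_size)
    return n * min_size


def resolve_bin_sizes(size_arg, min_size_arg="50k"):
    """Turn a comma-separated -s value into bin sizes compatible with min_size."""
    min_size = parse_size(min_size_arg)
    return [snap_to_multiple(parse_size(s), min_size) for s in size_arg.split(",")]


print(resolve_bin_sizes("500k,120k,1m"))  # [500000, 100000, 1000000]
```

Here 120k is not a multiple of the default 50k minimum, so it snaps to 100k, while 500k and 1m pass through unchanged. Keeping every resolution a multiple of the minimum bin size is presumably what allows coarser heatmaps to be built from the signal stored in the h5 file without rereading the bam.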

Other scripts are under development and are not recommended for use.