vice87 / gam-ngs

Genomic Assemblies Merger for NGS
GNU General Public License v3.0
26 stars 10 forks source link

Build instructions

Packages Required

Note: it is advised to have installed the latest version of the previous packages.

How to build the application

$ cd gam-ngs
$ mkdir build
$ cd build
$ cmake ..
$ make

GAM-NGS's executables are put in the "bin" sub-folder.

Usage

Prerequisites

GAM-NGS needs in input, for each assembly and for each read library, a file that lists BAM files of aligned libraries. This file must be formatted as shown in the following example:

$ cat assembly.PE.bams.txt
/path/to/bam/file/master-pe-lib1.bam
<min-insert-size> <max-insert-size>
/path/to/bam/file/master-pe-lib2.bam
<min-insert-size> <max-insert-size>

where each bam's path is followed by a line that specifies the minimum and maximum insert size that should be used by GAM-NGS to compute insert size mean and standard deviation. Moreover, all provided BAM files have to be coordinate-sorted (command samtools sort \<in.bam> \<out.prefix>), along with the corresponding index file (command: samtools index \<in.sorted.bam>).

Blocks' construction

$ gam-create --master-bam <master.PE.bams.txt> --slave-bam <slave.PE.bams.txt> --min-block-size <min-reads> --output <output.prefix>

where \<min-reads> is the number of reads required to build a block (region with the same reads aligned in master/slave assemblies).

The previous command will create the following files:

Merging

$gam-merge --master-bam <master.PE.bams.txt> --slave-bam <slave.PE.bams.txt> --blocks-file <blocks-file> --master-fasta <master.fasta> --slave-fasta <slave.fasta> --min-block-size <min-block-size> --threads <threads> --output <output.prefix> 2> merge.err

where:

The previous command will create the following files:

Example

Prerequisites

The following packages are needed to run the example pipeline:

Download the data set

From the example directory execute the following commands:

$ chmod +x ./download_dataset.sh
$ ./download_dataset.sh

These previous commands will download Staphylococcus data set from GAGE. In particular, after the download, the following files will be used in GAM-NGS's usage example:

Run the example pipeline

From the example directory execute the following commands (after downloading the data set):

$ chmod +x ./gam-ngs_pipeline.sh
$ ./gam-ngs_pipeline.sh

These will create alignment files (BAM) in ./Alignments sub-folder. Then the merging with GAM-NGS of Allpaths-LG and MSR-CA assemblies will be performed in ./gam-ngs_merge sub-folder.

Custom Sparsehash/Boost libraries

If you want to compile GAM-NGS with specific (local) installations of sparsehash/boost libraries, you can use the following cmake command:

$ cmake -DBOOST_ROOT=/path/to/boost_1_xx_0 -DBoost_NO_BOOST_CMAKE=TRUE -DSPARSEHASH_ROOT=~/path/to/sparsehash

Bug reporting

If gdb package is available in your system and you found a bug in GAM-NGS (e.g., segmentation fault), I will appreciate if you could run the tool in "debug mode" and send me the log file. This would help me a lot in finding the source of the problem.

The steps are the following ones:

  1. Edit CMakeLists.txt file at line 17, replacing "Release" with "Debug" so that the line becomes:

    set(CMAKE_BUILD_TYPE Debug)
  2. Re-compile the tool with the usual commands (removing any previous file/folder in the build directory)

    $ cmake ..
    $ make
  3. Run gdb with the executable (gam-create or gam-merge) in which the problem is present:

    $ gdb /path/to/gam-create-or-merge
  4. Run the executable specifying the arguments that caused the problem (possibly using a single thread):

    (gdb) set logging on
    (gdb) run --master-bam <master-pe-list.txt> --slave-bam <slave-pe-list.txt> ...
  5. After the problem has occurred, please run the following command and exit gdb:

    (gdb) backtrace
    (gdb) quit
  6. A file named gdb.txt should have been created in the current directory. Provide that file to me (possibly along with standard output and standard error).

Thank you!