DSRC - DNA Sequence Reads Compressor
http://sun.aei.polsl.pl/dsrc/

DSRC

DSRC is a toolkit for efficient, high-performance compression of sequencing reads stored in FASTQ format.

Building

Build prerequisites

Linux

DSRC binaries and the C++ library can be compiled in two ways, depending on which multithreading library is selected; a separate makefile is provided for each. The first uses the boost::threads library, which must be present on the build system. The second uses the g++ compiler with C++11 support (version >= 4.8).

By default, binaries and libraries are compiled using g++; compiling with Clang or Intel icpc should also succeed.

Mac OSX

On Mac OSX the Clang compiler is used with C++11 support, so make sure Clang version >= 3.3 is installed.

Windows

To compile DSRC under Windows, Microsoft Visual Studio 2010 or 2012 is required. DSRC binaries and the C++ library can be compiled in two ways, depending on which multithreading library is selected; a separate VS solution file is provided for each. With VS2010, the boost::threads library provides multithreading support, so make sure boost::threads is installed and the boost library paths are properly configured in Visual Studio. With VS2012, the C++11 standard library implementation provides threading support.

Compiling DSRC using MinGW-32-x64 with the provided makefiles should also work.

Python library

To build the DSRC Python library, the development version of the boost::python library and the boost::build tool bjam need to be present on the system. Next, the local boost installation directory needs to be specified in the Jamroot configuration file in the py directory:

# To compile DSRC Python module please specify your boost installation directory below
#
use-project boost 
    : /absolute/path/to/boost/directory/ ;

The Python library is built with the default compilation toolset available on the build platform (selected automatically by bjam). To specify a different one, append

<toolset>name

to the compilation flags, as explained in the Jamroot file:

# Specify toolset according to your platform manually in case of compilation problems in form: '<toolset>gcc'
# Available toolsets:
#   - Windows: msvc-*
#   - Linux: gcc, clang
#   - Mac OSX: darwin, gcc
    : <variant>release <address-model>64 <link>shared <runtime-link>shared <debug-symbols>off <inlining>full <optimization>speed <warnings>on <cxxflags>"-O2 -m64 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DUSE_BOOST_THREAD" ;

Building on Linux

Binary

To compile DSRC using boost::threads with static linking, in the main directory type:

make bin

To compile DSRC using g++ >= 4.8 with c++11 standard and dynamic linking:

make -f Makefile.c++11 bin

The resulting dsrc binary will be placed in bin subdirectory.

C++ library

To compile C++ DSRC library using boost::threads:

make lib

To compile the DSRC C++ library using g++ >= 4.8 with C++11:

make -f Makefile.c++11 lib

The resulting libdsrc.a library will be placed in lib subdirectory.

Python library

To compile DSRC Python library:

make pylib

The resulting pydsrc.so library will be available in py subdirectory.

Building on Mac OSX

Binary

To compile DSRC binary, in the main directory type:

make -f Makefile.osx bin

The resulting dsrc binary will be placed in bin subdirectory.

C++ library

To compile DSRC C++ library:

make -f Makefile.osx lib

The resulting libdsrc.a library will be placed in lib subdirectory.

Python library

To compile DSRC Python library:

make -f Makefile.osx pylib

The resulting pydsrc.so library will be available in py subdirectory.
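The make invocations listed above for Linux and Mac OSX can be assembled programmatically, for example in a build script. The following is a minimal sketch, not part of DSRC: the function names `parse_version` and `make_command` are hypothetical, and preferring `Makefile.c++11` whenever g++ is at least 4.8 is a policy of this sketch (the default makefile with boost::threads remains a valid choice). Since only `make pylib` is documented for the Linux Python library, the sketch never combines `pylib` with `Makefile.c++11`.

```python
import re


def parse_version(text):
    """Turn a version string such as '4.8.5' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def make_command(target, platform, gxx_version=None):
    """Return the make invocation (as an argv list) for a DSRC build target.

    target:      'bin', 'lib' or 'pylib' (the targets named in this README)
    platform:    'linux' or 'osx' (labels chosen for this sketch)
    gxx_version: optional g++ version string, e.g. the output of
                 `g++ -dumpversion`; used only on Linux
    """
    if target not in ("bin", "lib", "pylib"):
        raise ValueError("unknown target: %r" % target)
    if platform == "osx":
        return ["make", "-f", "Makefile.osx", target]
    if platform != "linux":
        raise ValueError("unsupported platform: %r" % platform)
    # Assumption of this sketch: use the C++11 makefile when g++ >= 4.8,
    # otherwise fall back to the default makefile (boost::threads).
    if (target != "pylib" and gxx_version is not None
            and parse_version(gxx_version) >= (4, 8)):
        return ["make", "-f", "Makefile.c++11", target]
    return ["make", target]
```

The returned list can be passed directly to `subprocess.check_call` from the main DSRC directory.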

Building on Windows 64-bit

Binary

To compile DSRC using Visual Studio 2010 with boost::threads for multithreading support, use the dsrc20-vs2k10.sln solution file. To compile DSRC using Visual Studio 2012 with C++11 threads, use dsrc20-vs2k12.sln instead.

To compile DSRC executable, select Release|x64 configuration and build.

The resulting dsrc.exe executable will be placed in bin subdirectory.

C++ library

To compile DSRC library, select Release Lib|x64 configuration and build.

The resulting dsrc.lib library will be placed in lib subdirectory.

Python library

To compile the DSRC Python library, in the py subdirectory type:

bjam

The resulting pydsrc.pyd library will be available in py subdirectory.

Usage

DSRC can be run from the command prompt:

dsrc <c|d> [options] <input_file_name> <output_file_name>

in one of two modes: compression (c) or decompression (d).

Available options

Compression options

Automated compression modes

Options for both compression and decompression

Usage examples

Compress SRR001471.fastq file saving DSRC archive to SRR001471.dsrc:

dsrc c SRR001471.fastq SRR001471.dsrc

Compress file in the fast mode with CRC32 checking and using 4 threads:

dsrc c -m0 -c -t4 SRR001471.fastq SRR001471.dsrc

Compress file using DNA and Quality compression level 2 and using 512 MB buffer:

dsrc c -d2 -q2 -b512 SRR001471.fastq SRR001471.dsrc

Compress file in the best mode with lossy Quality mode, preserving only fields 1–4 of the record IDs:

dsrc c -m2 -l -f1,2,3,4 SRR001471.fastq SRR001471.dsrc

Compress in the best mode reading raw FASTQ data from stdin:

cat SRR001471.fastq | dsrc c -m2 -s SRR001471.dsrc
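The same stdin mode can be driven from a script, for instance to compress a gzipped FASTQ file without first writing the uncompressed data to disk. The sketch below assumes the input is a gzip file and that `dsrc` is on the PATH; the function name `compress_gz_fastq` and its `command` parameter (which lets a different program be substituted, e.g. for testing) are hypothetical and not part of DSRC.

```python
import gzip
import shutil
import subprocess


def compress_gz_fastq(gz_path, archive_path, command=None):
    """Stream the decompressed contents of gz_path into dsrc via stdin.

    Returns the exit code of the child process.
    """
    if command is None:
        # Same invocation as the README example: best mode, read from stdin.
        command = ["dsrc", "c", "-m2", "-s", archive_path]
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    try:
        with gzip.open(gz_path, "rb") as src:
            # Copy in chunks so the whole file is never held in memory.
            shutil.copyfileobj(src, proc.stdin)
    finally:
        # Closing stdin signals end of input to the child process.
        proc.stdin.close()
    return proc.wait()
```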

Decompress SRR001471.dsrc archive saving output FASTQ file to SRR001471.out.fastq:

dsrc d SRR001471.dsrc SRR001471.out.fastq
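A compress/decompress round trip is a simple way to check an installation. The sketch below runs the two basic commands shown above and compares the restored file with the original byte for byte. It assumes the default mode is lossless (which the existence of a separate lossy option suggests) and that the input is in a form DSRC restores exactly; inputs with unusual line endings or formatting may differ after the round trip. The function name `round_trip_ok` and its `run` parameter are hypothetical.

```python
import filecmp
import os
import subprocess
import tempfile


def round_trip_ok(fastq_path, run=subprocess.check_call):
    """Compress and decompress fastq_path with dsrc; True if bytes match.

    `run` is called with an argv list; it defaults to
    subprocess.check_call, which raises if dsrc exits with an error.
    """
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "reads.dsrc")
        restored = os.path.join(tmp, "reads.out.fastq")
        run(["dsrc", "c", fastq_path, archive])
        run(["dsrc", "d", archive, restored])
        # shallow=False forces a comparison of file contents.
        return filecmp.cmp(fastq_path, restored, shallow=False)
```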

Decompress archive using 4 threads and streaming raw FASTQ data to stdout:

dsrc d -t4 -s SRR001471.dsrc > SRR001471.out.fastq
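For scripted use, the command lines in the examples above can be built from keyword arguments. This is a minimal sketch that uses only the flags appearing in those examples (`-m`, `-d`, `-q`, `-l`, `-f`, `-c`, `-t`, `-b`, `-s`); the function name `dsrc_command` and its parameter names are hypothetical. It does not check which options are valid in which mode, and it assumes, as in the examples, that exactly one file name is given when `-s` is used.

```python
def dsrc_command(mode, files, preset=None, dna=None, quality=None,
                 lossy=False, fields=None, crc32=False, threads=None,
                 buffer_mb=None, stdio=False):
    """Build a dsrc argv list from the flags shown in the usage examples."""
    if mode not in ("c", "d"):
        raise ValueError("mode must be 'c' or 'd'")
    files = list(files)
    expected = 1 if stdio else 2
    if len(files) != expected:
        raise ValueError("expected %d file name(s)" % expected)

    args = ["dsrc", mode]
    if preset is not None:
        args.append("-m%d" % preset)       # automated compression mode
    if dna is not None:
        args.append("-d%d" % dna)          # DNA compression level
    if quality is not None:
        args.append("-q%d" % quality)      # Quality compression level
    if lossy:
        args.append("-l")                  # lossy Quality mode
    if fields:
        args.append("-f" + ",".join(str(i) for i in fields))
    if crc32:
        args.append("-c")                  # CRC32 checking
    if threads is not None:
        args.append("-t%d" % threads)
    if buffer_mb is not None:
        args.append("-b%d" % buffer_mb)    # buffer size in MB
    if stdio:
        args.append("-s")                  # read stdin / write stdout
    return args + files
```

For example, `dsrc_command("d", ["SRR001471.dsrc"], threads=4, stdio=True)` yields the argument list for the last example above (without the shell redirection).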

Citing

Roguski, L., Deorowicz, S. (2014) DSRC 2: Industry-oriented compression of FASTQ files, Bioinformatics, 30(15):2213–2215.

Deorowicz, S., Grabowski, Sz. (2011) Compression of DNA sequences in FASTQ format, Bioinformatics, 27(6):860–862.