PyPore

PyPore is a Python toolbox for fast and accurate quality control, conversion and alignment of nanopore sequencing data in its raw format (Fast5). We developed PyPore as a command-line tool composed of three modules (seqstats, fastqgen and alignment), each with its own set of options. PyPore also provides interactive result summaries, built on the plotly library, that let the user zoom and pan to inspect the information for a specific experimental point.

Requirements

HDF5
python 2.7
- biopython
- numpy
- h5py
- plotly
- python_dateutil
- ntpath
- pysam

For Unix/OS X users only

The Windows distribution ships with precompiled versions of samtools and minimap2.
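
The Python packages listed above can be checked before installation with a short script. This is only a sketch and not part of PyPore: the helper name `missing_modules` is hypothetical, and the import names are an assumption based on how these packages are normally imported (biopython as `Bio`, python_dateutil as `dateutil`). It uses only `importlib.import_module`, so it should run under Python 2.7 as well as Python 3.

```python
import importlib

# Import names for the packages in the Requirements list.
# biopython is imported as "Bio" and python_dateutil as "dateutil".
REQUIRED_MODULES = ["Bio", "numpy", "h5py", "plotly", "dateutil", "ntpath", "pysam"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required Python modules are importable.")
```

An empty result only shows that the modules import; it does not check their versions or the HDF5 library itself.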

Installation

Dependencies

Before proceeding with PyPore installation, check for HDF5 dependencies.

To check whether the HDF5 library is already installed, type:
```
h5cc -showconfig
```
If you are on an OS X system equipped with the Homebrew package manager, check the list of installed packages by typing:
```
brew list
```
If HDF5 is missing, install it through the Homebrew Science "tap":
```
brew tap homebrew/science
brew install hdf5
```
Alternatively, if you use a Python distribution such as Anaconda or Miniconda, HDF5 can be installed (on any OS) from the command line:
```
conda install -c anaconda hdf5
```
For Linux or other Unix distributions, the HDF5 library can be found in the libhdf5-dev package. Make sure that you have the development headers, as they are usually not installed by default.

For Windows users, the HDF5 library installer can be downloaded from here.

PyPore

Clone the PyPore repository:

PyPore
```
git clone --single-branch -b master https://github.com/rsemeraro/PyPore
```
PyPore with test data (170 MB)
```
git clone --single-branch -b Benchmark https://github.com/rsemeraro/PyPore.git
```

Install as root:
```
cd PyPore
python setup.py install
```

Usage

PyPore consists of the following three modules:

seqstats

seqstats provides an interface for exploring the information in a dataset of Fast5 files (single-read or multi-read) and, optionally, for converting and gathering them into FastQ data. The basic syntax, for a set of single-read Fast5 files, is:
```
pypore seqstats -i Files/Folder -l sample_label
```
Alternatively, by setting the --multi_read_fast5/-m argument, it is possible to run seqstats on a multi-read Fast5 dataset:
```
pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --multi_read_fast5 yes
```
To use seqstats with Albacore outputs (FastQ files and summary file), an Albacore summary file is required (--albacore_summary/-a). In Albacore mode, the seqstats input (-i) becomes the Albacore FastQ directory.
```
pypore seqstats -i FastQFiles/Folder -l sample_label -a /path/to/summary_file.txt --threads_number 8
```
The --fastq/-fq option activates FastQ generation, and the --threads_number/-n option uses multiple processors to speed up the analysis.
```
pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --fastq yes
```
To use seqstats with the test_data, go to the PyPore folder and type:
```
pypore seqstats -i test_folder/test_dataset -l my_test -fq yes -n 3
```
To see all options, type:
```
pypore seqstats -h
```
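When seqstats is run from a script, the options described above can be assembled programmatically. The following is a minimal sketch, not part of PyPore: the function `build_seqstats_cmd` is hypothetical, and it uses only the flags shown in this section. Whether every combination of options is accepted by seqstats is not verified here.

```python
def build_seqstats_cmd(input_path, label, threads=None, multi_read=False,
                       fastq=False, albacore_summary=None):
    """Assemble a `pypore seqstats` argument list from the documented options."""
    cmd = ["pypore", "seqstats", "-i", input_path, "-l", label]
    if albacore_summary is not None:
        # Albacore mode: input_path is then the Albacore FastQ directory.
        cmd += ["-a", albacore_summary]
    if threads is not None:
        cmd += ["--threads_number", str(threads)]
    if multi_read:
        cmd += ["--multi_read_fast5", "yes"]
    if fastq:
        cmd += ["--fastq", "yes"]
    return cmd


# Rebuild the FastQ-generation example shown above.
cmd = build_seqstats_cmd("Files/Folder", "sample_label", threads=8, fastq=True)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.check_call(cmd)` to launch the analysis, which avoids shell quoting problems with paths that contain spaces.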
Interactive Summaries

The outputs generated by seqstats are _sequencing_summary.html_ and _pore_activity_map.html_.

fastqgen

fastqgen is a faster alternative to seqstats for FastQ generation, allowing the user to convert data without the overhead of multiple parsing passes. The basic syntax is:
```
pypore fastqgen -i Files/Folder -l sample_label
```
With the --threads_number/-n option, it is possible to use multiple processors to speed up conversion.
```
pypore fastqgen -i Files/Folder -l sample_label -n 8
```
To see all options, type:
```
pypore fastqgen -h
```
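A converted FastQ file can be sanity-checked independently of PyPore. The sketch below is not PyPore code: the function `fastq_summary` is hypothetical, and it assumes four-line FastQ records with Phred+33 quality encoding. It reports the number of reads, the total number of bases and the arithmetic mean of the per-base Phred scores.

```python
import io


def fastq_summary(handle):
    """Return (read_count, total_bases, mean_phred) for a FastQ text stream.

    Assumes four-line records and Phred+33 quality encoding.
    """
    reads = 0
    bases = 0
    qual_sum = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("Malformed FastQ record near read %d" % (reads + 1))
        reads += 1
        bases += len(seq)
        # Phred+33: quality score = ASCII code of the character minus 33.
        qual_sum += sum(ord(c) - 33 for c in qual)
    mean_q = float(qual_sum) / bases if bases else 0.0
    return reads, bases, mean_q


# In-memory example with a single four-base read of quality 'I' (Phred 40).
example = io.StringIO(u"@read1\nACGT\n+\nIIII\n")
print(fastq_summary(example))
```

For a file on disk, pass an open text handle instead, for example `fastq_summary(io.open("reads.fastq"))`, where `reads.fastq` is a placeholder file name.
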
alignment

The last module is an alignment module based on three state-of-the-art long-read aligners, which can also generate an interactive summary of the results. The basic syntax is:
```
pypore alignment -i input_1.fastq input_2.fastq -r reference.fasta -l sample_label
```
As input you can pass one or more FastQ files. Optionally, you can obtain an HTML summary file with the --alignment_stats/-s argument, and/or customize the list of aligners, composed of minimap2 (m), bwa (b) and ngmlr (n), by removing some of them or changing their execution order with --aligner/-a.
```
pypore alignment -i input_1.fastq -r reference.fasta -l sample_label -a b m n -s yes
```
To see all options, type:
```
pypore alignment -h
```
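The aligner codes and options described above can be validated before launching a run from a script. This is a sketch under stated assumptions rather than PyPore code: `build_alignment_cmd` and `ALIGNER_CODES` are hypothetical names, and only the flags and one-letter codes documented in this section are used.

```python
# One-letter aligner codes as documented for --aligner/-a.
ALIGNER_CODES = {"m": "minimap2", "b": "bwa", "n": "ngmlr"}


def build_alignment_cmd(fastqs, reference, label, aligners=None, stats=False):
    """Assemble a `pypore alignment` argument list from the documented options."""
    fastqs = list(fastqs)
    if not fastqs:
        raise ValueError("At least one FastQ file is required")
    cmd = ["pypore", "alignment", "-i"] + fastqs + ["-r", reference, "-l", label]
    if aligners:
        unknown = [a for a in aligners if a not in ALIGNER_CODES]
        if unknown:
            raise ValueError("Unknown aligner code(s): " + ", ".join(unknown))
        # The order of the codes is kept, since it sets the execution order.
        cmd += ["-a"] + list(aligners)
    if stats:
        cmd += ["-s", "yes"]
    return cmd


# Rebuild the second alignment example shown above.
cmd = build_alignment_cmd(["input_1.fastq"], "reference.fasta", "sample_label",
                          aligners=["b", "m", "n"], stats=True)
print(" ".join(cmd))
```

Rejecting unknown codes up front gives an immediate error in the calling script instead of a failure after the run has started.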
Interactive Summary

_alignment_stats.html_

Contacts

This program has been developed by Roberto Semeraro, Department of Experimental and Clinical Medicine, University of Florence.
