PyPore is a Python toolbox for fast and accurate quality control, conversion and alignment of nanopore sequencing data in their raw format (Fast5). We developed PyPore as a command-line tool composed of three modules (seqstats, fastqgen and alignment), each provided with a set of specific options. PyPore also provides interactive result summaries, based on the plotly library, that let the user zoom and pan to get information about a specific experimental point.
Before proceeding with PyPore installation, check for HDF5 dependencies.
h5cc -showconfig
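The same check can be scripted. The following is a minimal sketch and not part of PyPore: it assumes that the h5cc wrapper used above is on the PATH whenever HDF5 is installed, and that Python 3.7 or later is available. The function name hdf5_config is hypothetical.

```python
import shutil
import subprocess


def hdf5_config(tool="h5cc"):
    """Return the output of `<tool> -showconfig`, or None if the tool is not on the PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    # check=False: we only want the text, not an exception on a non-zero exit code
    result = subprocess.run(
        [path, "-showconfig"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    config = hdf5_config()
    if config is None:
        print("h5cc not found: install HDF5 (with development headers) first")
    else:
        print(config)
```

A return value of None means HDF5 (or at least its compiler wrapper) still has to be installed by one of the methods described in this section.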
If you are on an OS X system equipped with the Homebrew package manager, check the list of installed packages by typing:
brew list
If HDF5 is missing, install it through the Homebrew Science "tap":
brew tap homebrew/science
brew install hdf5
Alternatively, install HDF5 with conda:
conda install -c anaconda hdf5
On Debian-based Linux distributions, install the libhdf5-dev package. Make sure that you have the development headers, as they are usually not installed by default.
git clone --single-branch -b master https://github.com/rsemeraro/PyPore
git clone --single-branch -b Benchmark https://github.com/rsemeraro/PyPore.git
cd PyPore
python setup.py install
PyPore consists of the following three modules:
seqstats provides an interface to explore the information in a dataset of Fast5 files (single-read or multi-read Fast5) and, optionally, to convert and gather them into FastQ data. The basic syntax, for a set of single-read Fast5 files, is:
pypore seqstats -i Files/Folder -l sample_label
Alternatively, by setting the --multi_read_fast5/-m argument, it is possible to run seqstats on a multi-read Fast5 dataset:
pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --multi_read_fast5 yes
To use seqstats with Albacore outputs (FastQ files and summary file), an Albacore summary file is required (--albacore_summary/-a). In Albacore mode, the seqstats input (-i) becomes the Albacore FastQ directory.
pypore seqstats -i FastQFiles/Folder -l sample_label -a /path/to/summary_file.txt --threads_number 8
By means of the --fastq/-fq and --threads_number/-n options, it is possible to activate FastQ generation and to use multiple processors to speed up the analysis.
pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --fastq yes
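When seqstats is called from a script, the options described above can be assembled programmatically. The sketch below is illustrative and not part of PyPore: the function seqstats_command and its parameter names are hypothetical, and it uses only the flags shown in this section (-i, -l, --albacore_summary, --threads_number, --multi_read_fast5, --fastq). Whether every combination of these flags is accepted by PyPore is not checked here.

```python
import shlex


def seqstats_command(input_path, label, threads=None, fastq=False,
                     multi_read=False, albacore_summary=None):
    """Build the argument list for a `pypore seqstats` call."""
    cmd = ["pypore", "seqstats", "-i", str(input_path), "-l", label]
    if albacore_summary is not None:
        # Albacore mode: -i is then expected to be the Albacore FastQ directory
        cmd += ["--albacore_summary", str(albacore_summary)]
    if threads is not None:
        cmd += ["--threads_number", str(threads)]
    if multi_read:
        cmd += ["--multi_read_fast5", "yes"]
    if fastq:
        cmd += ["--fastq", "yes"]
    return cmd


if __name__ == "__main__":
    cmd = seqstats_command("Files/Folder", "sample_label", threads=8, fastq=True)
    # Print the command as it would be typed in a shell
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it (requires PyPore to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later passed to subprocess.run.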
To run seqstats on the test data, go to the PyPore folder and type:
pypore seqstats -i test_folder/test_dataset -l my_test -fq yes -n 3
To see all options, type:
pypore seqstats -h
Interactive Summaries
Outputs generated by seqstats are:
_sequencing_summary.html_
_pore_activity_map.html_
fastqgen is a faster alternative to seqstats for FastQ generation: it lets the user convert data without spending time on multiple parsing passes. The basic syntax is:
pypore fastqgen -i Files/Folder -l sample_label
By means of the --threads_number/-n option, it is possible to use multiple processors to speed up the conversion.
pypore fastqgen -i Files/Folder -l sample_label -n 8
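For several samples stored in separate folders, one fastqgen call per folder can be generated with a small helper. This is a sketch under stated assumptions, not a PyPore feature: the function fastqgen_commands is hypothetical, it assumes one sub-folder per sample under a common root, it uses the folder name as the sample label, and it looks for .fast5 files recursively.

```python
from pathlib import Path


def fastqgen_commands(root, threads=None):
    """Yield one `pypore fastqgen` argument list per sample folder under `root`.

    A sample folder is any direct sub-folder that contains at least one
    .fast5 file (searched recursively); the folder name is used as label.
    """
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if not any(folder.rglob("*.fast5")):
            continue  # skip folders without Fast5 files
        cmd = ["pypore", "fastqgen", "-i", str(folder), "-l", folder.name]
        if threads is not None:
            cmd += ["-n", str(threads)]
        yield cmd


if __name__ == "__main__":
    for cmd in fastqgen_commands(".", threads=8):
        print(" ".join(cmd))
        # To actually run it (requires PyPore to be installed):
        # import subprocess; subprocess.run(cmd, check=True)
```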
To see all options, type:
pypore fastqgen -h
The last feature of our tool consists of an alignment module based on three state-of-the-art long-read aligners and able to generate an interactive summary of the results. The basic syntax is:
pypore alignment -i input_1.fastq input_2.fastq -r reference.fasta -l sample_label
As input you can pass one or more FastQ files. Optionally, you can obtain an HTML summary file with the --alignment_stats/-s argument, and/or customize the list of aligners, composed of minimap2 (m), bwa (b) and ngmlr (n), by removing some of them or changing their execution order with --aligner/-a.
pypore alignment -i input_1.fastq -r reference.fasta -l sample_label -a b m n -s yes
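The aligner codes and their order can also be validated before the command is launched. The following sketch is illustrative only: the function alignment_command and the ALIGNERS table are hypothetical names, and the code relies solely on the flags and the m/b/n codes described above.

```python
# One-letter codes accepted by --aligner/-a, as listed in this section
ALIGNERS = {"m": "minimap2", "b": "bwa", "n": "ngmlr"}


def alignment_command(fastqs, reference, label, aligners=None, stats=False):
    """Build the argument list for a `pypore alignment` call.

    `aligners` is an ordered sequence of codes (for example "bmn" or
    ["b", "m", "n"]); the order given is the order passed to -a.
    """
    fastqs = [str(f) for f in fastqs]
    if not fastqs:
        raise ValueError("at least one FastQ file is required")
    cmd = ["pypore", "alignment", "-i", *fastqs, "-r", str(reference), "-l", label]
    if aligners is not None:
        aligners = list(aligners)
        unknown = [a for a in aligners if a not in ALIGNERS]
        if unknown:
            raise ValueError("unknown aligner code(s): " + ", ".join(unknown))
        cmd += ["-a", *aligners]
    if stats:
        cmd += ["-s", "yes"]
    return cmd


if __name__ == "__main__":
    cmd = alignment_command(["input_1.fastq"], "reference.fasta", "sample_label",
                            aligners="bmn", stats=True)
    print(" ".join(cmd))
```

Rejecting unknown codes up front avoids starting a long alignment run with a mistyped aligner list.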
To see all options, type:
pypore alignment -h
Interactive Summary
_alignment_stats.html_
This program has been developed by Roberto Semeraro, Department of Experimental and Clinical Medicine, University of Florence.