Compare multiple runs of long-read sequencing data and alignments. NanoComp creates violin plots or box plots of read length, quality and percent identity, as well as dynamic, overlaid read length histograms and a cumulative yield plot.
As of version 1.1.0, NanoComp also creates a static PNG image for each dynamic HTML plot, because the latter can become large and slow to load for big datasets. This requires that you install orca. Without orca the script still works, but no static copies of the dynamic plots are created.
pip install NanoComp
This script is written for Python3.
NanoComp [-h] [-v] [-t THREADS] [-o OUTDIR] [-p PREFIX] [--verbose]
[--raw] [--readtype {1D,2D,1D2}] [--barcoded]
[--split_runs TSV_FILE]
[-f {png,jpg,jpeg,webp,svg,pdf,eps,json}]
[-n names [names ...]] [-c colors [colors ...]]
[--plot {violin,box,ridge,false}] [--title TITLE]
(--fastq files [files ...] | --fasta files [files ...] | --summary files [files ...] | --bam files [files ...])
General options:
-h, --help Show the help and exit.
-v, --version Print version and exit.
-t, --threads THREADS
Set the maximum number of threads the script may use.
-o, --outdir OUTDIR Specify directory in which output has to be created.
-p, --prefix PREFIX Specify an optional prefix to be used for the output files.
--verbose Write log messages also to terminal.
--raw Store the extracted data in a tab-separated file.
Options for filtering or transforming input prior to plotting:
--readtype {1D,2D,1D2}
Which read type to extract information about from the summary. Options are 1D, 2D
and 1D2.
--barcoded Barcoded experiment in summary format, splitting per barcode.
--split_runs TSV_FILE
TSV file used to split the summary on run IDs and to name the resulting datasets.
Mandatory header fields are 'NAME' and 'RUN_ID'.
Options for customizing the plots created:
-f, --format {'png'(default),'jpg','jpeg','webp','svg','pdf','eps','json'}
Specify the output format of the plots. JSON output allows for customisation by the end-user after plotting the figures (https://plotly.com/python-api-reference/generated/plotly.io.read_json.html).
-n, --names names Specify the names to be used for the datasets.
-c, --colors colors Specify the colors to be used for the datasets.
--plot {violin,box,ridge,false}
Which plot type to use: 'box', 'violin' (default), 'ridge' (joyplot) or 'false' (no plots)
--title TITLE Add a title to all plots; requires quoting if the title contains spaces.
Input data sources, one of these is required:
--fastq files [files ...]
Data is in (compressed) fastq format.
--fasta files [files ...]
Data is in (compressed) fasta format.
--summary files [files ...]
Data is in (compressed) summary files generated by albacore or guppy.
--bam files [files ...]
Data is in sorted bam files.
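The --split_runs option listed above expects a TSV file whose header contains the fields 'NAME' and 'RUN_ID'. A minimal Python sketch of writing such a file could look like the following. The function name, the file name runs.tsv, the dataset names, the run IDs and the column order are all assumptions made for illustration; only the two header fields come from the help text.

```python
import csv


def write_split_runs_tsv(path, runs):
    """Write a TSV with the mandatory 'NAME' and 'RUN_ID' header fields.

    `runs` maps a dataset name to a run ID. Both are hypothetical
    placeholders here; real run IDs come from your own summary files.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header fields required by --split_runs (column order is an assumption).
        writer.writerow(["NAME", "RUN_ID"])
        for name, run_id in runs.items():
            writer.writerow([name, run_id])


if __name__ == "__main__":
    # Placeholder values, not real run identifiers.
    write_split_runs_tsv(
        "runs.tsv",
        {"sample_A": "run_id_1", "sample_B": "run_id_2"},
    )
```

The resulting file would then be given to --split_runs together with a --summary input.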
NanoComp --bam alignment1.bam alignment2.bam alignment3.bam --outdir compare-runs
NanoComp --fastq reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz reads4.fastq.gz --names run1 run2 run3 run4
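When comparisons are launched from a pipeline rather than typed by hand, the same command lines can be assembled in Python. The sketch below is one possible way to do that, using only the --fastq, --names and --outdir options from the help text. The helper build_nanocomp_command is hypothetical and not part of NanoComp, and the check that each input file has exactly one name is an assumption about sensible usage rather than documented behaviour.

```python
import shlex


def build_nanocomp_command(fastq_files, names=None, outdir=None):
    """Assemble a NanoComp argument list (hypothetical helper, not part of NanoComp)."""
    # Assumption: one dataset name per input file.
    if names is not None and len(names) != len(fastq_files):
        raise ValueError("need exactly one name per input file")
    cmd = ["NanoComp", "--fastq", *fastq_files]
    if names:
        cmd += ["--names", *names]
    if outdir:
        cmd += ["--outdir", outdir]
    return cmd


if __name__ == "__main__":
    cmd = build_nanocomp_command(
        ["reads1.fastq.gz", "reads2.fastq.gz"],
        names=["run1", "run2"],
        outdir="compare-runs",
    )
    # Print the shell-quoted command for inspection.
    print(shlex.join(cmd))
    # To execute it (requires NanoComp to be installed and on PATH):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.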
I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day, or rarely within a few days.
If you use this tool, please consider citing our publication.