This PR merges dev into master to prepare for release v0.11.0-alpha.0.
Changelog
0.11.0-alpha.0
This version is a major release that breaks backwards compatibility with previous versions of pandora.
It improves pandora runtime performance by 15x and reduces RAM usage by 20x.
Changed
The pandora index changed from a set of files in a directory structure to a single, compressible and indexable zip
file (pandora indexes now have the suffix .panidx.zip). This is the only file produced by the pandora index command,
and it is a required argument to all the other pandora commands. The index is self-contained: it encodes all the
information and metadata about itself (e.g. which PRGs were used to create it, window and kmer size, etc). It provides
the infrastructure for upcoming features and simplifies working with large reference pangenome collections of a few
million PRGs. This new index breaks backwards compatibility with previous pandora versions. The structure of this zip
archive is as follows:
_prg_names: the names of the PRGs used as input to create this index;
_prg_max_path_lengths: the length of the longest path through each PRG;
_prg_lengths: the length of the string representation of each PRG;
_minhash: the minimizer hash data structure;
_metadata: metadata about the index;
*.gfa: the GFA files describing the minimizing kmer graph for each PRG;
*.fa: the string representation of each PRG;
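Because the index is a regular zip archive, its layout can be checked with any zip tool. The following is a minimal sketch, assuming only the entry names listed above. The function name `summarise_index` and the file name `example.panidx.zip` are hypothetical, and since the encoding of each entry's contents is not described here, the sketch only inspects entry names.

```python
import zipfile
from collections import Counter


def summarise_index(path):
    """Count the entries of a .panidx.zip archive by kind.

    Only entry names are inspected; nothing is assumed about how the
    contents of each entry are encoded.
    """
    counts = Counter()
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(".gfa"):
                counts["gfa"] += 1   # minimizing kmer graph of a PRG
            elif name.endswith(".fa"):
                counts["fa"] += 1    # string representation of a PRG
            else:
                counts[name] += 1    # e.g. _prg_names, _minhash, _metadata
    return dict(counts)
```

Calling `summarise_index("example.panidx.zip")` on an index would return a dictionary with one key per special entry plus the number of `.gfa` and `.fa` entries.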
Minimum C++ standard upgraded from C++11 to C++14;
We now test whether the genotype confidence of a variant is greater than or equal to the threshold provided by
--gt-conf. Previously we only tested whether it was strictly greater than the threshold;
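The change to the --gt-conf comparison only affects variants whose confidence equals the threshold exactly. A minimal sketch of the difference, in which the function names are hypothetical and not taken from the pandora code base:

```python
def passes_gt_conf(genotype_confidence, threshold):
    """Inclusive test (>=), as described for 0.11.0-alpha.0."""
    return genotype_confidence >= threshold


def passes_gt_conf_old(genotype_confidence, threshold):
    """Strict test (>), as described for earlier versions."""
    return genotype_confidence > threshold
```

With a threshold of 5.0, a variant with confidence exactly 5.0 passes the new test but not the old one; every other value is treated identically by both.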
Removed
Removed CLI parameters -w, -k and --clean from the following pandora subcommands: compare, discover, map,
seq2path;
Removed merge_index subcommand;
Removed gene-DBG and noise-filtering modules;
Fixed
Fixed a major bug in finding the longest path through PRGs;
Several refactorings to the pandora index implementation;
Optimisation of the pandora index data structure;
Added
A memory-efficient way to load PRGs when indexing and mapping: instead of loading all PRGs at once, each PRG is loaded
on demand when it is needed (also known as lazy loading). This is particularly useful when working with very large
PanRGs;
Random multimapping of reads if they map equally well to several graphs, reducing mapping bias. Added parameter
--rng-seed to pandora map/compare/discover commands to make multimapping deterministic, if required;
A new parameter to control auto-updating of the error rate and kmer model (see the --auto-update-params parameter in
pandora map/compare/discover commands);
Three new parameters to control when a gene should be filtered out due to too low or too high coverage (see
--min-abs-gene-coverage, --min-rel-gene-coverage and --max-rel-gene-coverage parameters in
pandora map/compare/discover commands);
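The seeded random multimapping listed above can be illustrated with a short sketch. It is not pandora's implementation: the function `pick_graph`, the graph names, and the idea of a single numeric score per graph are assumptions made for illustration. It only shows how a fixed seed, in the role of --rng-seed, makes a random choice between equally good graphs reproducible.

```python
import random


def pick_graph(scores, rng):
    """Pick one graph among those tied for the best score.

    `scores` maps a graph name to a mapping score (higher is better);
    this scoring scheme is assumed for illustration only.
    """
    best = max(scores.values())
    # Sort the tied names so the result depends only on the RNG state,
    # not on dictionary insertion order.
    tied = sorted(name for name, score in scores.items() if score == best)
    return rng.choice(tied)


rng = random.Random(42)  # fixed seed, playing the role of --rng-seed
choice = pick_graph({"geneA": 10, "geneB": 10, "geneC": 7}, rng)
```

Here `choice` is either `geneA` or `geneB`, and rerunning with the same seed yields the same pick. Without a fixed seed, ties are still broken at random but runs are not reproducible.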