tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

Don't require entire BioPerl installation #491

Open alanorth opened 4 years ago

alanorth commented 4 years ago

The BioPerl distribution is famously finicky to install. Notably, many modules can fail to install because a single test out of a few hundred (or thousand) fails, in which case it is very common to use --force to proceed with the installation anyway. The BioPerl project itself recommends that users specify exactly which modules they need rather than saying "install BioPerl":

[...] the BioPerl distribution only includes a subset of the project modules. Because of this, the meaning of "installing BioPerl" is rarely clear. Instead of "install BioPerl", the aim must be "install module X".

As of version 1.14.6, Prokka seems to need only the following BioPerl modules:

$ grep -rh -oE "use Bio::.*$" bin/* binaries/* | sort -u | awk '{print $2}'
Bio::AlignIO;
Bio::Root::Version;
Bio::SearchIO;
Bio::Seq;
Bio::SeqFeature::Generic;
Bio::SeqIO;
Bio::Tools::CodonTable;
Bio::Tools::GFF;
Bio::Tools::GuessSeqFormat;
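
The module names above still carry the trailing semicolon from the source lines, so they cannot be pasted straight into an install command. A minimal sketch of a variant that prints clean names follows. The function name `list_bio_modules` is hypothetical and not part of Prokka, and the stricter pattern (anchored at the start of the line, so commented-out `use` statements are skipped) is an assumption about what is wanted rather than what the original command did.

```shell
# Hypothetical helper (not part of Prokka): print the distinct Bio:: modules
# loaded by "use" statements in the given files or directories.
list_bio_modules() {
  # -r recurse, -h omit file names, -o print only the match,
  # -I skip binary files, -E extended regular expressions.
  # The character class stops before the ';', so names come out clean.
  grep -rhoIE '^[[:space:]]*use[[:space:]]+Bio::[A-Za-z0-9_:]+' "$@" \
    | awk '{print $2}' \
    | sort -u
}

# Usage from the top of a prokka checkout:
#   list_bio_modules bin binaries
```

The output is one module name per line, which can then be passed to an installer, for example with `xargs`.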

Prokka could probably simplify its installation steps by specifying exactly these modules. I'm installing Prokka in a cluster setup with applications on network-based shared storage and I'm doing the following:

$ git clone https://github.com/tseemann/prokka.git -b v1.14.6
$ cd prokka
$ module load perl/5.28.2
$ mkdir perl5lib
$ cpanm -l perl5lib Time::Piece XML::Simple Digest::MD5 Module::Build
$ export PERL5LIB=perl5lib/lib/perl5
$ cpanm -l perl5lib Bio::AlignIO Bio::Root::Version Bio::SearchIO Bio::Seq Bio::SeqFeature::Generic Bio::SeqIO Bio::Tools::CodonTable Bio::Tools::GFF Bio::Tools::GuessSeqFormat --force
$ ./bin/prokka -v
prokka 1.14.6

Incidentally, I still had to use --force to get these to install in my environment. Hmmm.
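
Since --force installs modules even when their tests fail, it may be worth confirming that each one at least loads. The sketch below is one way to do that with `perl -M<module> -e 1`. The function name `check_perl_modules` is hypothetical, and a successful load only shows that the module compiles, not that the failing tests were harmless.

```shell
# Hypothetical helper (not part of Prokka): try to load each named Perl module.
# Prints one status line per module and returns non-zero if any fail to load.
check_perl_modules() {
  local rc=0 module
  for module in "$@"; do
    # "perl -MFoo -e 1" loads Foo and then runs a no-op program.
    if perl -M"$module" -e 1 >/dev/null 2>&1; then
      printf 'ok      %s\n' "$module"
    else
      printf 'FAILED  %s\n' "$module"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, assuming PERL5LIB already points at the local library as above:
#   check_perl_modules Bio::AlignIO Bio::Root::Version Bio::SearchIO Bio::Seq \
#     Bio::SeqFeature::Generic Bio::SeqIO Bio::Tools::CodonTable \
#     Bio::Tools::GFF Bio::Tools::GuessSeqFormat
```

Note that the PERL5LIB value in the transcript is a relative path, so a check like this only sees the local modules when run from the prokka directory.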