ndaniel / fusioncatcher

Finder of Somatic Fusion Genes in RNA-seq data
GNU General Public License v3.0

confused by dependencies documentation vs configuration.cfg #163

Closed EricDeveaud closed 3 years ago

EricDeveaud commented 4 years ago

Hello,

DEPENDENCIES documents some dependencies for fusioncatcher, but etc/configuration.cfg contains entries for tools that are not documented.

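e.g., documented neither in DEPENDENCIES nor in tools/install_tools.sh: oases, samtools, lzo (AFAIK lzo does not provide binaries, just libraries), lzop, parallel, pxz (is it https://github.com/jnovy/pxz?)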

Are those tools required for fusioncatcher?

Tools documented only in tools/install_tools.sh: liftover, pigz, picard, fatotwobit

Furthermore, tools/install_tools.sh installs fatotwobit into tools/fatotwobit, not into tools/blat as stated in configuration.cfg.

regards

Eric

ndaniel commented 4 years ago

Hi @EricDeveaud

oases, samtools, lzo (AFAIK lzo does not provide binaries, just libraries), lzop, parallel, pxz (is it https://github.com/jnovy/pxz?)

These are optional (e.g. parallel), and some of them are even obsolete (e.g. samtools, oases), so they can be safely ignored. (The lzo libraries are needed by lzop, as far as I understand.)

tools documented only in tools/install_tools.sh: liftover, pigz, picard, fatotwobit

These are required. tools/install.sh supersedes the documentation.

ndaniel commented 4 years ago

Furthermore, tools/install_tools.sh installs fatotwobit into tools/fatotwobit, not into tools/blat as stated in configuration.cfg.

That is true; it is a bug.

For now, this bug has been fixed only on GitHub.

EricDeveaud commented 4 years ago

Thanks for the clarification.

But I'm still a little bit confused. As far as I understand, one can install fusioncatcher either with the bootstrap.py script, which installs the required tools and dependencies, or with the install_tools.sh script, which installs the required tools.

But when comparing what the two scripts (bootstrap.py and install_tools.sh) do, I can see differences. Here is a summary of those differences:

tool               bootstrap.py    install_tools.sh
bowtie/1.2.3           OK                 OK
bowtie2/2.3.5.1        OK                 OK
sra/sdk/2.9.6          OK                 OK
pigz/2.4               NO                 OK   <== pigz installed via install_tools.sh but not bootstrap.py
liftOver               OK                 OK
blat                   OK                 OK
faToTwoBit             OK                 OK
seqtk/1.2-r101c        OK                 OK
STAR/2.7.b2            OK                 OK
BBMap/38.44            OK                 OK
picard/2.21.2          OK                 OK
velvet/1.2.10          NO                 NO   <== optional so no problem
### I understand that install_tools.sh is not meant for installing Python module dependencies,
### but I put them here for comparison
biopython/1.74         OK                 NO
xlrd/1.2.0             OK                 NO
openpyxl/2.6.1         OK                 NO
numpy                  NO                 NO   <== required

This is based on checking the wget commands issued by both scripts.
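A comparison like this can be roughly automated. The following is only a minimal sketch, assuming that each download appears as a URL on the same line as the word wget; the function names and the sample strings are made up for illustration and are not part of fusioncatcher.

```python
import re

# Captures the first URL that follows "wget" on the same line.
# Assumption: the URL sits on the same line as the wget call.
WGET_URL = re.compile(r"\bwget\b[^\n]*?((?:https?|ftp)://[^\s'\"\\)]+)")


def wget_urls(script_text):
    """Return the set of URLs found on lines that invoke wget."""
    return {match.group(1) for match in WGET_URL.finditer(script_text)}


def compare_downloads(text_a, text_b):
    """Return (only in A, only in B, in both) as sorted lists of URLs."""
    urls_a, urls_b = wget_urls(text_a), wget_urls(text_b)
    return sorted(urls_a - urls_b), sorted(urls_b - urls_a), sorted(urls_a & urls_b)


if __name__ == "__main__":
    # Invented stand-ins for the contents of bootstrap.py and install_tools.sh.
    python_side = 'os.system("wget https://example.org/bowtie.zip -O bowtie.zip")\n'
    shell_side = (
        "wget https://example.org/bowtie.zip\n"
        "wget --no-check-certificate https://example.org/pigz.tar.gz\n"
    )
    only_a, only_b, both = compare_downloads(python_side, shell_side)
    print("only in first script: ", only_a)
    print("only in second script:", only_b)
    print("in both scripts:      ", both)
```

In practice one would read the two real scripts from disk and pass their text to compare_downloads. Downloads built from variables or split across lines would be missed, so the result is a starting point for a manual check, not a replacement for it.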

Then, taking a look at configuration.cfg,

I can see some missing elements and some extraneous ones.

Can you provide some clarification, please?

best regards

Eric

ndaniel commented 4 years ago

Hi @EricDeveaud

First, thanks for pointing these out! They are useful for improving FusionCatcher!

As far as I understand, one can install fusioncatcher either with the bootstrap.py script, which installs the required tools and dependencies, or with the install_tools.sh script, which installs the required tools.

The primary method to install FusionCatcher is bootstrap.py. It works best because it has been used the most, so a lot of bugs have been ironed out. It is also the best-supported installation method.

The GitHub-based method, which uses install_tools.sh, is very new (i.e. a few months old) and therefore still has quite a few bugs. I meant it more for developers who want to use GitHub.

ndaniel commented 4 years ago

At first glance your table looks correct.

bwa => documented but not installed
velvet => documented but not installed
samtools => documented but not installed
numpy => documented but not installed
lzo => documented but not installed
lzop => documented but not installed
pxz => documented but not installed
parallel => documented but not installed

These are not needed:

I should remove them from bootstrap.py.

Regarding numpy, I am not sure whether it is needed or not. The only thing FusionCatcher needs is BioPython for Python 2.7. So if one can install BioPython without installing NumPy, then it should be OK. I just do not know whether BioPython requires numpy or not.

EricDeveaud commented 4 years ago

numpy is required by biopython; see the requires entry in biopython's setup.py: https://github.com/biopython/biopython/blob/master/setup.py#L111

Maybe you want to use version 1.76 instead of 1.74, as it is the last one to support Python 2.7.

EricDeveaud commented 4 years ago

Regarding the tools that are not needed:

Are they optional, i.e. if they are present in the cfg or in PATH, are they used by fusioncatcher, or are they not used at all?

ndaniel commented 4 years ago

@EricDeveaud

Then numpy is required. The versions of numpy and biopython do not matter for FusionCatcher, so both 1.76 and 1.74 are fine.

The tools that are not needed can be removed, along with the lines in the cfg files that refer to them.
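For anyone maintaining their own copy of the cfg file, the pruning could look roughly like the sketch below. It assumes the file is INI-style and that tool locations live in a section named paths with one key per tool; both are assumptions made for illustration, and the helper name and sample content are invented.

```python
import configparser
import io


def drop_unneeded(cfg_text, unneeded, section="paths"):
    """Remove the keys listed in `unneeded` from one section of an INI-style text.

    Returns the rewritten text and the list of keys that were removed.
    The section name "paths" is an assumption, not a confirmed layout.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(cfg_text)

    removed = []
    if parser.has_section(section):
        for key in list(parser[section]):
            if key in unneeded:
                parser.remove_option(section, key)
                removed.append(key)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue(), removed


if __name__ == "__main__":
    # Invented sample; a real configuration.cfg has more entries.
    sample = (
        "[paths]\n"
        "bowtie = /apps/bowtie/\n"
        "samtools = /apps/samtools/\n"
        "oases = /apps/oases/\n"
    )
    new_text, removed = drop_unneeded(sample, {"samtools", "oases", "pxz"})
    print("removed:", removed)
    print(new_text)
```

Note that configparser drops comments when it rewrites a file, so for a heavily commented cfg it may be preferable to use the list of removed keys as a guide and delete the lines by hand.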

EricDeveaud commented 4 years ago

OK, many thanks for your help.

Do you plan to issue a new release with these changes?

ndaniel commented 4 years ago

yes.

EricDeveaud commented 4 years ago

OK, so I will install the next version on our cluster. Let me know when it is available.

ndaniel commented 4 years ago

Which method are you using to install FusionCatcher: bootstrap.py or install_tools.sh?

EricDeveaud commented 4 years ago

by hand ;-)

I will provide the required tools via environment modules, and install fusioncatcher and its Python dependencies in a virtualenv,

and provide the databases via our biomaj (databank manager).

That's why I may seem picky about dependencies and so on.

Eric

Darkless012 commented 4 years ago

@EricDeveaud What tools are you using to install this on the cluster? EasyBuild, Spack, other? You are not the only one wrestling with tools being installed/configured by some shell script :/

How do you link other software? Do you override configuration.cfg with custom paths to the software, or did you come up with a more elegant solution?

Thanks :)

EricDeveaud commented 4 years ago

we use a build system based on makefiles ;-)

We provide tools via environment modules. I will overwrite the configuration.cfg file for all external tools. I appreciate that fusioncatcher uses the user's PATH; this makes it easy to provide the tools via module load.
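With this kind of setup, a quick check that the loaded modules really expose every tool on PATH can save a failed run. Here is a minimal sketch; the executable names in the example are guesses based on the tool names in the table earlier in this thread and may differ from what fusioncatcher actually calls.

```python
import shutil


def resolve_tools(names, path=None):
    """Look up each executable name on PATH (or on the given search path).

    Returns a dict mapping found names to their full path, plus a list
    of names that could not be found.
    """
    found, missing = {}, []
    for name in names:
        location = shutil.which(name, path=path)
        if location:
            found[name] = location
        else:
            missing.append(name)
    return found, missing


if __name__ == "__main__":
    # Assumed executable names, for illustration only.
    wanted = ["bowtie", "bowtie2", "blat", "faToTwoBit", "liftOver", "seqtk", "STAR", "pigz"]
    found, missing = resolve_tools(wanted)
    for name, location in found.items():
        print(name, "->", location)
    if missing:
        print("not on PATH:", ", ".join(missing))
```

Running it after module load shows which tools the current environment resolves and where they come from.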

ndaniel commented 3 years ago

FusionCatcher v1.33 adds a new required dependency, FASTQTK: https://github.com/ndaniel/fastqtk