sagnikbanerjee15 / Finder

A fully automated gene annotator from RNA-Seq expression data
MIT License

avoiding docker and singularity #64

Status: Open. deniskristak opened this issue 2 years ago.

deniskristak commented 2 years ago

Hi, as part of an EasyBuild project focused on installing software on HPC clusters, I need to patch Finder so that it avoids using Docker and/or Singularity altogether. I know I have to take care of the dependencies myself, which I did by carefully going through the Dockerfile. Now all that's left is to run Finder. Is there already a patch like that, by any chance? Is there an easy way to avoid Docker/Singularity? Thanks!

sagnikbanerjee15 commented 2 years ago

Hello @deniskristak,

Thank you for your interest in Finder. Unfortunately, there is no such patch to avoid using Docker or Singularity. We are moving toward implementing the entire pipeline in CWL, which will also use Docker/Singularity. Is there any reason why you would like to avoid it?

Thanks.

boegel commented 2 years ago

@sagnikbanerjee15 Docker is a bad fit for HPC clusters. Although Singularity/Apptainer is better suited for use on HPC systems, we consider it mainly a workaround for software installation and compatibility problems rather than a proper solution. We strongly prefer installing software through EasyBuild, which we use to maintain a consistent software stack that is optimized for the system hardware (mainly CPUs, but also GPUs, network, etc.) on which those installations will be used.

When using containers, you are generally trading performance for "mobility of compute", which we would very much like to avoid.

deniskristak commented 2 years ago

Thanks for the answer, @sagnikbanerjee15. The reason is that with EasyBuild we like to take care of all dependencies ourselves, which we do using modules and "generations" of software to avoid collisions and similar problems (which I suppose is the primary reason you use Docker in the first place). However, dockerized software is not the best approach on HPC clusters, because we lose the benefit of building everything from source, which greatly helps with performance optimisations.

sagnikbanerjee15 commented 2 years ago

Hello @deniskristak and @boegel,

Thank you for your comments, and sorry for the delay in getting back to you; my schedule has been tight. Each component of the pipeline runs on multiple cores, which improves performance. With Docker/Singularity, overhead is introduced only when the container is created. More important than execution time is the stepwise execution, since this pipeline has several moving parts. We are currently moving to an architecture that uses the Common Workflow Language (CWL) to execute pipelines, with each component running from a Docker image. On HPC systems, Docker allows any directory to be mounted into the container, thereby bypassing security policies imposed by system administrators, so the other option is Singularity. We will execute our pipeline in a Toil environment, which can use Singularity if Docker is not found on the system. Given the ease of development and the need for mobility, I do not anticipate moving away from Docker/Singularity or CWL.

I personally have no experience with EasyBuild. Can it be used with CWL?

Thank you.