ncbi / fcs

Foreign Contamination Screening caller scripts and documentation
Other
88 stars 12 forks source link

Container does not contain bash #4

Closed d4straub closed 1 year ago

d4straub commented 1 year ago

Hi there,

first of all thanks for making that helpful tool accessible!

I would like to use FCS in a nextflow based pipeline (nf-core/genomeassembler, in the works at nf-core) to make the genomes (also) directly ready for NCBI upload. I started with FCS-adaptor. However, bash seems to be not present in the container:

curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/0.2.2/fcs-adaptor.0.2.2.sif -Lo fcs-adaptor.sif
singularity shell fcs-adaptor.sif
bash --version

leads to /bin/sh: 1: bash: not found

Without bash, no nextflow-based pipeline will be able to use the tool. It would be great if bash could be added to the container.

deaconjs commented 1 year ago

Hello,

Thanks for your interest! We have discussed supporting Nextflow, but we are not familiar enough at this point to make it a priority. Maybe you can help with this!

The docs are clear that the bash interpreter is required for Nextflow workflow execution, but it is not immediately clear why it is required inside the images it runs. Adding the bash interpreter introduces potential security issues, so we'll need to understand the issue better to justify the risk. Thoughts?

Thank you!

d4straub commented 1 year ago

Hi,

fyi I was not attempting to execute run_fcsadaptor.sh in the nextflow pipeline (because nextflow handles the container) but rather https://github.com/ncbi/fcs/blob/main/dist/run_fcsadaptor.sh#L111-L113, more specifically /app/fcs/bin/av_screen_x -o /output-volume/ $TAX /sample-volume/$FASTA_FILENAME using the container you provide as described above.

The docs are clear that the bash interpreter is required for Nextflow workflow execution, but it is not immediately clear why it is required inside the images it runs.

I am not deep into technical details of nextflow, but, as far as I understand: nextflow creates for each process (set of commands run in a particular container) two bash scripts, .command.run and .command.sh). .command.run is than used for submitting .command.sh (the actual process commands) as job to compute resources. If bash is missing, nextflow cannot submit the bash scripts, i.e. cannot submit any code to compute resources.

In an attempt to find a better explanation, a quick google search yielded https://training.seqera.io/ & https://biocorecrg.github.io/ELIXIR_containers_nextflow/nextflow.html, where those files are described as:

.command.sh, contains the block of code indicated in the process .command.run, contains the code made by nextflow for the execution of .command.sh and contains environmental variables, eventual invocations of linux containers etc.

Does that help?

deaconjs commented 1 year ago

Hi again! We have produced a new release that includes bash in our Docker and Singularity images. Please try it out and let us know if you have any issues or further requests!

mahesh-panchal commented 1 year ago

Thank you. And just to clarify the issue a bit more, Nextflow specifically sets the container entrypoint to use /bin/bash, although we don't know why (I think it's to fast fail in case certain tools for tracing memory and process usage are not there). Is the default container entrypoint setting any variables that need to be exported to make the tool work?

d4straub commented 1 year ago

Thanks for the new release. But unfortunately I still get /bin/sh: 1: bash: not found with

curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/0.2.3/fcs-adaptor.0.2.3.sif -Lo fcs-adaptor.0.2.3.sif
singularity shell fcs-adaptor.0.2.3.sif
bash --version

(singularity version 3.7.0) Maybe this has to do with Mahesh's additional info above? Or I mixed something up?

deaconjs commented 1 year ago

Hello and sorry for the delay. There were a few issues to resolve to get the adaptor image running bash. Would you mind giving it another shot? I think it will work for you now!

d4straub commented 1 year ago

Great, bash --version works now, thanks!