sanger-tol / genomeassembly

Implementation of ToL genome assembly workflows
https://pipelines.tol.sanger.ac.uk/genomeassembly
MIT License

Broken singularity longranger image + help with organelles workflow #48

Closed malvaradol closed 1 month ago

malvaradol commented 1 month ago

Description of the bug

Hi there!

I'm currently using the pipeline to assemble some data, and I'm having issues with the polishing step. According to the error message, the longranger Singularity image seems to be broken or no longer exists. I've attached the .log file of the run.

Separately, I had to run the pipeline without the organelles workflow because I haven't figured out how to get it running, so I'll take advantage of this Longranger report to ask a few questions. Is the API key for the Singularity secret provided by the HPC admin, or is it something I can find in my HPC home directory? Do I need to run the workflow with an internet connection? I ask because everything has to run on an offline node, although pulling everything in advance with nf-core tools has worked so far. I've also attached the .log of the organelles workflow.

Thanks in advance for all the help with the bug and the questions. Looking forward to your response.

Command used and terminal output

Longranger: nextflow run 0_10_0/main.nf --input tdimi_genome.yaml -profile singularity --outdir output_tdimi --organelles_on false --hifiasm_hic_on true --polishing_on true --max_cpus 32 --max_memory 640.GB

Organelles: nextflow run 0_10_0/main.nf --input tdimi_genome.yaml -profile singularity --outdir test_data --hifiasm_hic_on true --polishing_on true --organelles_on true --max_cpus 32 --max_memory 640.GB

Relevant files

nextflow_longranger.log nextflow_organelles.log

System information

NF version: 24.04.3
Hardware: HPC
Executor: LSF
Container: Singularity
OS: CentOS
genomeassembly version: 0_10_0

muffato commented 1 month ago

Hi @malvaradol Thank you for the report. Longranger is a proprietary software product from 10X Genomics. It can be downloaded at https://support.10xgenomics.com/genome-exome/software/downloads/latest once you agree to their terms and conditions. We have a copy of it at the Sanger, on our internal Docker Registry gitlab-registry.internal.sanger.ac.uk/tol-it/software/docker-images/longranger:2.2.2-c4, but can't redistribute it due to the End User Software License Agreement. The pipeline is currently configured to use this container, and therefore fails for you because you're outside of our internal network.

We need to improve the documentation and configuration of the pipeline to make that limitation clearer and to give instructions for anyone else who wants to use longranger. I think it's something along these lines:

  1. Download longranger
  2. Build a Docker container with this Dockerfile:
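    # Base image is an assumption (none was given); any Debian/Ubuntu image that provides apt-get should work
    FROM ubuntu:22.04
    # COPY rather than ADD: ADD auto-extracts a local tarball, so the tar step below would not find the file.
    # The tarball must sit inside the Docker build context.
    COPY longranger-2.2.2.tar.gz /tmp/
    ARG DEST=/opt
    RUN apt-get update -y \
    && apt-get install -y wget \
    && mkdir -p $DEST \
    && tar -x -C $DEST -f /tmp/longranger-2.2.2.tar.gz \
    && ln -s $DEST/longranger-2.2.2/longranger /usr/local/bin/ \
    && rm /tmp/longranger-2.2.2.tar.gz \
    && apt-get purge -y --auto-remove wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*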
  3. Modify the container names at https://github.com/sanger-tol/genomeassembly/blob/0.10.0/modules/local/longranger/mkref/main.nf#L7 and https://github.com/sanger-tol/genomeassembly/blob/0.10.0/modules/local/longranger/align/main.nf#L10
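
For an offline Singularity setup, steps 2 and 3 might look roughly like the sketch below. It is untested against the pipeline. The image tag, the .sif file name and the helper function `swap_longranger_container` are hypothetical, and it assumes the two module files contain the internal registry reference quoted above verbatim. It also assumes your Nextflow Singularity setup accepts a path to a local image file as the container value.

```shell
#!/usr/bin/env bash
# Sketch only: the names below are placeholders, not pipeline defaults.

# Build the image from the Dockerfile in step 2, then convert it for Singularity
# (run these on a machine that has Docker; tag and file name are assumptions):
#   docker build -t longranger:2.2.2 .
#   singularity build longranger_2.2.2.sif docker-daemon://longranger:2.2.2

# Internal image reference quoted in this thread, with dots escaped for sed.
OLD_IMAGE_RE='gitlab-registry\.internal\.sanger\.ac\.uk/tol-it/software/docker-images/longranger:2\.2\.2-c4'

# Replace the internal image reference in one module file with your own image.
#   $1: path to a module file, e.g. modules/local/longranger/mkref/main.nf
#   $2: new container reference; assumed to contain no '#', '&' or '\' characters
swap_longranger_container() {
    local file="$1" new_image="$2"
    # '#' is the sed delimiter because image paths contain '/'.
    # A copy of the original file is kept next to it with a .bak suffix.
    sed -i.bak "s#${OLD_IMAGE_RE}#${new_image}#g" "$file"
}
```

It would then be called once per module file from step 3, for example `swap_longranger_container modules/local/longranger/mkref/main.nf /path/to/longranger_2.2.2.sif`, and the same again for `modules/local/longranger/align/main.nf`.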

Can you try that? If it works, we can update the documentation of the pipeline.