microgenlab / porefile

A Nextflow full-length 16S profiling pipeline for ONT reads
GNU General Public License v3.0

Singularity pull fails #7

Closed LunavdL closed 3 years ago

LunavdL commented 3 years ago

Hi, I am trying to run the pipeline with the following command using Singularity:

nextflow run microgenlab/porefile --fq /home/lvdrloos/Test_PoreFile/*.fastq -- isDemultiplexed No --minimap2 -profile singularity

However, I think the Singularity pull fails, as I get the following error:

Error executing process > 'Demultiplex:Concatenate'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name iferres-porefile-latest.img.pulling.1607346446317 docker://iferres/porefile:latest > /dev/null
  status : 127
  message:
    WARNING: pull for Docker Hub is not guaranteed to produce the
    WARNING: same image on repeated pull. Use Singularity Registry
    WARNING: (shub://) to pull exactly equivalent images.
    /usr/bin/env: ‘python’: No such file or directory
    ERROR: pulling container failed!

In addition (or possibly these problems are related), it cannot find the python executable. With 'which python' I do find a valid path to python (/home/lvdrloos/miniconda3/bin/python), but it is not in /usr/bin/env.

I am new to using Singularity, so probably I am doing something wrong, but I tried googling a solution with no success so far. Any help or advice would be much appreciated!

iferres commented 3 years ago

Hi @LunavdL,

There's a problem with your command, though I'm not sure it is causing the failure. Could you try running it as follows and confirm whether it still fails?

nextflow run microgenlab/porefile --fq '/home/lvdrloos/Test_PoreFile/*.fastq' --minimap2 -profile singularity

For the record, I omitted -- isDemultiplexed No because it has a space after the --, and because it doesn't need an argument: it is a logical flag. By default the pipeline concatenates the fastq files and demultiplexes them using porechop. If your data is already demultiplexed, use --isDemultiplexed with no further arguments. I also single-quoted the path to your fastq files so that nextflow, rather than the shell, handles the glob.
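To illustrate why the quotes matter, here is a small shell sketch, independent of porefile, that counts how many arguments a command receives for an unquoted versus a quoted glob. The temporary directory, the file names and the count_args helper are made up for the demonstration.

```shell
# Hypothetical demo directory with three empty fastq files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/BC01.fastq" "$demo_dir/BC02.fastq" "$demo_dir/BC03.fastq"

# Helper (illustration only): print the number of arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell expands the glob before the command starts,
# so the command sees one argument per matching file.
unquoted_count="$(count_args $demo_dir/*.fastq)"

# Quoted: the pattern is passed through as a single literal argument,
# leaving the expansion to the program that receives it.
quoted_count="$(count_args "$demo_dir"'/*.fastq')"

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s)"
```

With the unquoted form, --fq would only be followed by the first file and the remaining files would arrive as stray arguments, which is why the quoted form is the one to pass to nextflow.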

It seems to be a singularity problem (check that your singularity version is >=3.x.x) or a connection failure. But let's fix the above first and see.

There's another issue that may arise, related to NanoPlot. The conda version has a bug and I'm waiting for it to be updated to the latest version in the repo. If that happens, use --noNanoplot to skip that module.

Let me know if it worked.

LunavdL commented 3 years ago

Thank you for your fast response, @iferres !

My apologies - I did try the command with quotes around the path to the .fastq files as well, but copied the wrong code from my command line. That did not make a difference. I did, however, update singularity (it was version 2.5, it is 3.0.3 now) and that does change the error I get. I think the pull now works, but the Silva download still doesn't:

nextflow run microgenlab/porefile --fq '/home/lvdrloos/Test_PoreFile/*.fastq' --minimap2 --noNanoplot -profile singularity -r 0445ee2

executor >  local (3)
[d2/464c92] process > SetSilva:downloadFasta         [100%] 1 of 1, failed: 1 ✘
[46/9d5599] process > SetSilva:downloadMeganSynMap   [100%] 1 of 1, failed: 1 ✘
[5c/b1b089] process > Demultiplex:Concatenate        [100%] 1 of 1, failed: 1 ✘
[-        ] process > Demultiplex:Porechop           -
[-        ] process > QFilt:NanoFilt                 -
[-        ] process > QFilt:AutoMap                  -
[-        ] process > QFilt:Yacrd                    -
[-        ] process > Fastq2Fasta                    -
[-        ] process > Minimap2Workflow:MakeMinimapDB -
[-        ] process > Minimap2Workflow:Minimap2      -
[-        ] process > Minimap2Workflow:Sam2Rma       -
[-        ] process > Minimap2Workflow:Rma2Info      -
[-        ] process > MergeResults                   -
Error executing process > 'SetSilva:downloadMeganSynMap'

Caused by:
  Process `SetSilva:downloadMeganSynMap` terminated with an error exit status (255)

Command executed:

  wget https://software-ab.informatik.uni-tuebingen.de/download/megan6/SSURef_Nr99_132_tax_silva_to_NCBI_synonyms.map.gz
  gunzip *gz

Command exit status:
  255

Command output:
  (empty)

Command error:
  FATAL:   container creation failed: unabled to /home/lvdrloos/Test_PoreFile/work/46/9d5599d58698a74041e668ee8a0265 to mount list: destination /home/lvdrloos/Test_PoreFile/work/46/9d5599d58698a74041e668ee8a0265 is already in the mount point list

Work dir:
  /home/lvdrloos/Test_PoreFile/work/46/9d5599d58698a74041e668ee8a0265

I checked whether -r s138 would make a difference, but the error is mostly the same:

Error executing process > 'SetSilva:downloadSilvaTaxNcbiSp'

Caused by:
  Process `SetSilva:downloadSilvaTaxNcbiSp` terminated with an error exit status (255)

Command executed:

  wget https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/taxonomy/ncbi/tax_ncbi-species_ssu_ref_nr99_138.1.txt.gz
  gunzip *gz

Command exit status:
  255

Command output:
  (empty)

Command error:
  FATAL:   container creation failed: unabled to /home/lvdrloos/work/39/8cec50b2d69252c3f14799c28d972a to mount list: destination /home/lvdrloos/work/39/8cec50b2d69252c3f14799c28d972a is already in the mount point list

Work dir:
  /home/lvdrloos/work/39/8cec50b2d69252c3f14799c28d972a

iferres commented 3 years ago

OK, after digging into this issue, I think the problem is that you are running the pipeline in your home directory. It's hard to explain, but singularity is probably mounting your home inside the container twice: once by default (unless told otherwise, singularity automatically binds the home directory), and a second time when autoMounts binds the working directory, which is the same as your home. See https://github.com/nextflow-io/nextflow/issues/662 and https://github.com/hpcng/singularity/issues/1469. Apparently it was fixed in singularity 3.1.1.
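To check whether an installation is older than 3.1.1, something along these lines might help. It is only a sketch: the version_ge helper is made up here, it relies on sort -V (version sort), and it assumes that the output of singularity --version contains a dotted version number such as 3.0.3.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: `singularity --version` prints text containing a dotted
# version number; the first such number is taken as the version.
if command -v singularity >/dev/null 2>&1; then
    installed="$(singularity --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
else
    installed=""
fi

if [ -z "$installed" ]; then
    echo "singularity not found in PATH (or version not recognised)"
elif version_ge "$installed" "3.1.1"; then
    echo "singularity $installed: should already include the fix"
else
    echo "singularity $installed: older than 3.1.1, a workaround may be needed"
fi
```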

A solution for your current singularity version may be to create a nextflow.config file in your working directory (where you are launching nextflow) with the following content:

profiles {
   singularity {
        singularity.enabled = true
        singularity.autoMounts = false
   }
}

and launch the nextflow pipeline as before.
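If it is easier, the file can also be written from the shell. The sketch below does so with a here-document; launch_dir is a placeholder pointing at a temporary directory so that it can be tried safely, and in practice it would be the directory you launch nextflow from.

```shell
# Placeholder launch directory; replace with the directory where you
# actually run nextflow.
launch_dir="$(mktemp -d)"

# Write the workaround config. The quoted 'EOF' keeps the shell from
# interpreting anything inside the here-document.
cat > "$launch_dir/nextflow.config" <<'EOF'
profiles {
   singularity {
        singularity.enabled = true
        singularity.autoMounts = false
   }
}
EOF

# Show what was written.
cat "$launch_dir/nextflow.config"

# Then, from inside that directory, launch as before, e.g.:
#   cd "$launch_dir"
#   nextflow run microgenlab/porefile --fq '/home/lvdrloos/Test_PoreFile/*.fastq' \
#       --minimap2 -profile singularity
```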

Hope it works.

LunavdL commented 3 years ago

Thank you, @iferres, this worked! As most of my problems were caused by a wrong set-up or version of singularity, I also installed docker and that works as well.

I did a test run on a subset of .fastq files and everything seems to work (I did have to use --noNanoplot as you suggested). I tried a lot of different pipelines and packages on my own, but couldn't figure out how to go from minimap2 output to an OTU table, so I am very happy that this pipeline exists!

I have two more questions I hope you have time to answer, but perhaps it is easier if I start new threads for these, so this issue can be closed?

1) Is there a way to know which read IDs match which OTU? For instance, in the file minimap2_BC01.info I can see which OTUs are found in BC01, but I cannot find files that show which OTU corresponds to which read.

2) Is it possible to use custom reference databases as well? I work with green seaweeds and their chloroplasts are very similar to cyanobacteria. This is the main reason I am not using EPI2ME, as I noticed that the relative contribution of 'cyanobacteria' was much higher in tissue samples compared to swabs (when using EPI2ME output), and after manually blasting a few of those reads, they turned out to be chloroplasts of the host. I found that GreenGenes does distinguish between chloroplasts and cyanobacteria, and was hoping the SILVA database does this too, but so far I only see cyanobacteria in my results (no chloroplasts).

iferres commented 3 years ago

I'm glad it worked.

Your questions:

  1. Yes. You could use MEGAN's rma2info command to extract that info from each rma file in the following way:

    # Node name
    rma2info --in minimap2_BC01.rma -r2c Taxonomy --names 
    # NCBI's Taxonomy path
    rma2info --in minimap2_BC01.rma -r2c Taxonomy --names --paths
    # Taxonomy path with major ranks only
    rma2info --in minimap2_BC01.rma -r2c Taxonomy --names --paths --majorRanksOnly

    If you want to use the rma2info from inside the same singularity image used by porefile:

    # [EDITED]
    singularity exec --cleanenv path/to/porefile.img rma2info --in minimap2_BC01.rma -r2c Taxonomy --names 

    The location of the singularity image depends on your nextflow configuration. I suggest setting a NXF_SINGULARITY_CACHEDIR environment variable in your .bashrc to tell nextflow to download images to that location.

  2. It's in our plans. For now we are committed to publishing porefile as it is, but we've already discussed the need to go in that direction. It is not straightforward and we have other projects unrelated to porefile running, so it may take some time to implement.
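For the NXF_SINGULARITY_CACHEDIR suggestion in point 1, a minimal sketch of how the variable could be added to an rc file without duplicating the line on repeated runs. The set_singularity_cachedir helper and the paths are made up for illustration; the example runs on throwaway paths, so swap in "$HOME/.bashrc" and a real cache directory.

```shell
# Hypothetical helper: append the export line to an rc file only if it is
# not already present, and make sure the cache directory exists.
#   $1 = rc file (e.g. "$HOME/.bashrc")
#   $2 = directory where images should be stored
set_singularity_cachedir() {
    rc_file="$1"
    cache_dir="$2"
    line="export NXF_SINGULARITY_CACHEDIR=\"$cache_dir\""
    mkdir -p "$cache_dir"
    touch "$rc_file"
    # -x: match the whole line, -F: treat the line as a fixed string.
    grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"
}

# Example on throwaway paths; the second call adds nothing.
demo_home="$(mktemp -d)"
set_singularity_cachedir "$demo_home/bashrc" "$demo_home/singularity_cache"
set_singularity_cachedir "$demo_home/bashrc" "$demo_home/singularity_cache"
cat "$demo_home/bashrc"
```

After opening a new shell, the images pulled by nextflow should end up in that directory, which is then the place to look for the image file to pass to singularity exec.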

I'm closing this issue but feel free to ask any other question.