Closed Kincekara closed 1 year ago
Hi @Kincekara
Could you please supply the exact command you ran on the command line? For example, which profile did you use (e.g., singularity, conda, etc.)? I suspect this is due to a configuration issue with how your compute cluster interacts with the ncbi datasets command line tools.
My suggestion would be to run the testGet profile together with conda. This will be a bit slower because of the conda downloads, but if it succeeds it will tell you that the tool works and that the issue is with your setup on the compute cluster.
Again, the above is mostly speculation and providing the exact command would be helpful for solving the issue.
Hi @jennahamlin
The Conda profile works without a problem.
I used nextflow run mashwrapper -profile testGet,singularity
command when I got the error.
I traced the error back through the work files. Somehow, datasets cannot download the genome files when run under singularity. Here is the .command.log:
WARNING: While bind mounting '/mnt/dm-3/hpc-scratch/work/74/d3ba9ad79bdd3ca5d9047eedf7e5ff:/mnt/dm-3/hpc-scratch/work/74/d3ba9ad79bdd3ca5d9047eedf7e5ff': destination is already in the mount point list
false
Confirming both NCBI datasets and dataformat tools are available...
Great both tools available to access NCBI...
Beginning the process...
Checking your directory...
Good, a downloadedData.tsv summary file does not already exist. Continuing...
Good, the speciesCount.txt summary file doesn't already exist. Continuing...
allDownload directory does not exist, making it now and downloading will begin...
This is one of the species that will be downloaded to make the mash database: legionella jamestowniensis
Beginning to dowload genomes from NCBI...
Assembly level is not specified as the parameter is empty ...
Error: No assembly available
No files available. Creating a file place holder for this species: legionellajamestowniensis. Exiting.
@Kincekara Yay, glad the conda version works.
I ran into the same problem with singularity on the compute cluster I was developing on. I fixed it by creating a configuration file specific to that cluster and adding this line to it:
singularity.runOptions = '-B /etc/pki/ca-trust:/etc/pki/ca-trust'
As far as I can tell, the singularity image of ncbi datasets does not include the certificates, and that is the issue.
I had assumed that this configuration requirement would not be an issue for others, but in your case I suspect it is the same problem. So let's try that first. You should set up a config file for your compute cluster like the one I have made for CDC. See here: https://github.com/jennahamlin/mashwrapper/blob/main/conf/nfcore_custom.config
You will need to specify your cluster executor (e.g., sun grid engine etc.) and whatever your queue is called (e.g., all.q), and then you will need to include singularity.runOptions = '-B /etc/pki/ca-trust:/etc/pki/ca-trust'
just like in my conf file.
Once you have done that then the command would be:
nextflow run mashwrapper -profile testGet,singularity --custom_config_base /scicomp/home-pure/ptx4/mashwrapper/conf
where you would change the path given to --custom_config_base so that it points to the directory holding your config file. Lastly, do not end the path with a final /, otherwise the config file will not be found.
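As a rough sketch of the steps above, something like the following could create the config file and assemble the command. The directory name cluster_conf is made up for illustration, and the executor (sge) and queue (all.q) are only the examples mentioned above, so swap in whatever your cluster uses. The file is named nfcore_custom.config to match the linked example. The last line only prints the nextflow command so you can check it before running it yourself.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical directory for the cluster-specific config; change to suit your system.
CONF_DIR="${CONF_DIR:-$PWD/cluster_conf}"
mkdir -p "$CONF_DIR"

# Write a minimal config. The executor and queue values are placeholders
# (assumptions); the runOptions line is the bind mount described above.
cat > "$CONF_DIR/nfcore_custom.config" <<'EOF'
process {
    executor = 'sge'
    queue    = 'all.q'
}

singularity {
    enabled    = true
    runOptions = '-B /etc/pki/ca-trust:/etc/pki/ca-trust'
}
EOF

# Drop a trailing slash, if any, since the path must not end with /.
CONF_DIR="${CONF_DIR%/}"

# Print the command rather than running it.
echo "nextflow run mashwrapper -profile testGet,singularity --custom_config_base $CONF_DIR"
```

If /etc/pki/ca-trust does not exist on your cluster's nodes, the bind mount will need to point at wherever your system keeps its certificates instead.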
I updated my singularity config as you directed. It worked like a charm. Thank you!
@jennahamlin testGet fails at the MAKE_DATABASE step. I reproduced this error by changing the workdir. testUse runs without a problem. nextflow version 21.10.6.5660