Closed KatharinaHoff closed 1 year ago
Hi Katharina,
First of all, thank you for trying geneidx and for reporting these issues. I will try to provide solutions to them; let me know whether they work.
Regarding pulling the singularity images: I have not seen this issue before, but after checking the current singularity versions I see that it has been updated many times since version 1.0.0, so updating it should make it work. https://docs.sylabs.io/guides/3.0/user-guide/installation.html
This link details how to install the latest releases, since apt install singularity
installs only a very old version.
Regarding the issue of geneidx/Nextflow trying to write in a directory where it should not: in our implementation, all output is written inside the output folder, so no process should be writing to the input directory.
It could be that some writing happens when the pipeline mounts the directories to get the input, but I don't know exactly how Nextflow manages this. In one of the previous issues there was a similar error regarding permissions and containers, and it was solved by adding the --bind
option to the executed command. You could try it this way and let me know whether it works; in the meantime I will look for other possible solutions.
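As an alternative sketch (an assumption about your setup, not something from this thread; the path below is a placeholder), Nextflow can also pass bind mounts straight to Singularity through the singularity scope in nextflow.config:

```
// nextflow.config — sketch; replace the placeholder path with your input directory
singularity {
    enabled    = true
    runOptions = '--bind /path/to/input/data'
}
```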
Regarding this last error with the taxid: I have run the pipeline with the test dataset and this taxid, and it worked without any problems, selecting a set of proteins and using the parameters from Drosophila melanogaster, which are the closest ones.
Looking at the output you uploaded, there still seems to be an error with the mounting of the docker images; I would say this is the more likely cause. Again, trying the --bind
option might help solve it.
In case you prefer to provide the set of proteins yourself, you can add the --prot-file
option to indicate the proteins to use. The file should be gzip-compressed (.gz), and for proper naming of all the intermediate files it would ideally end in .fa.gz. The same applies to the genome fasta file.
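If you go the local-protein route, a minimal sketch of preparing such a file (the file name is a placeholder, and the tiny FASTA record is just a stand-in for a real protein set):

```shell
# Sketch: package a local protein set for the --prot-file option.
# geneidx expects a gzip-compressed FASTA, ideally named *.fa.gz.
printf '>prot1\nMSTAVLENPGLGRKLSD\n' > my_proteins.fa  # stand-in FASTA
gzip -f my_proteins.fa                                 # creates my_proteins.fa.gz
ls my_proteins.fa.gz
```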
To narrow down these container problems, I would recommend first running the pipeline with only the taxid, since a default genome and output directory are already provided:
nextflow run main.nf -profile docker --taxid 7240
If that does not work, I would be more convinced that the problem is due to the connections with the containers; if it works, add the genome and your preferred output directory and try again.
Thank you again for trying geneidx and reporting these issues, and let me know whether it works or if you find other problems.
Ferriol
The documentation link to the singularity installation is somehow broken, but I know the website; I have a more up-to-date singularity installed according to exactly these instructions on a different machine. (Hopefully unbroken link for future readers: https://docs.sylabs.io/guides/3.0/user-guide/installation.html) I moved to that machine (singularity version 3.6.3) and tried again with the --bind option. Now I get a different error message:
Call:
nextflow run main.nf -profile singularity --genome /nas-hs/projs/data/Drosophila_melanogaster/data/genome.fasta.masked.gz --taxid 7240 --outdir . --bind
Output:
Error executing process > 'matchAssessment:Index_fai (genome.fasta.clean.fa)'
Caused by:
Process `matchAssessment:Index_fai (genome.fasta.clean.fa)` terminated with an error exit status (1)
Command executed:
if [ ! -s genome.fasta.clean.fa.fai ]; then
echo "indexing genome genome.fasta.clean.fa"
samtools faidx -f genome.fasta.clean.fa
fi
Command exit status:
1
Command output:
indexing genome genome.fasta.clean.fa
Command error:
indexing genome genome.fasta.clean.fa
[faidx] Could not build fai index genome.fasta.clean.fa.fai
Work dir:
/home/katharina/git/geneidx/work/ea/39925d57993479add5f31774fb5b4b
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
So I tried to run your minimal input example (in Singularity). Call:
nextflow run main.nf -profile singularity --taxid 7240
That worked fine.
I tried it in Docker, too. Call:
sudo nextflow run main.nf -profile docker --taxid 7240
That died, again:
[sudo] password for katharina:
N E X T F L O W ~ version 22.10.7
Launching `main.nf` [distracted_miescher] DSL2 - revision: bb4f07340a
GeneidX
=============================================
output : /home/katharina/git/geneidx/output
genome : /home/katharina/git/geneidx/data/SampleGenomeSmall.fa.gz
taxon : 7240
WARN: A process with name 'getFASTA2' is defined more than once in module script: /home/katharina/git/geneidx/subworkflows/CDS_estimates.nf -- Make sure to not define the same function as process
executor > local (4)
[88/1f3652] process > UncompressFASTA (SampleGenomeSmall.fa.gz) [100%] 1 of 1, failed: 1 ✘
[- ] process > fix_chr_names -
[- ] process > compress_n_indexFASTA -
[ca/8bc788] process > prot_down_workflow:getProtFasta (7240) [100%] 1 of 1, failed: 1 ✘
[- ] process > prot_down_workflow:downloadProtFasta -
[- ] process > build_protein_DB:UncompressFASTA -
[- ] process > build_protein_DB:runDIAMOND_makedb -
[- ] process > alignGenome_Proteins:runDIAMOND_getHSPs_GFF -
[- ] process > matchAssessment:Index_fai -
[- ] process > matchAssessment:cds_workflow:mergeMatches -
[- ] process > matchAssessment:cds_workflow:filter_by_score -
[- ] process > matchAssessment:cds_workflow:getFASTA -
[- ] process > matchAssessment:cds_workflow:ORF_finder -
[- ] process > matchAssessment:cds_workflow:updateGFFcoords -
[- ] process > matchAssessment:cds_workflow:getFASTA2 -
[- ] process > matchAssessment:getCDS_matrices -
[- ] process > matchAssessment:intron_workflow:summarizeMatches -
[- ] process > matchAssessment:intron_workflow:pyComputeIntrons -
[- ] process > matchAssessment:intron_workflow:removeProtOverlappingIntrons -
[- ] process > matchAssessment:intron_workflow:getFASTA -
[- ] process > matchAssessment:getIntron_matrices -
[- ] process > matchAssessment:CombineIni -
[- ] process > matchAssessment:CombineTrans -
[e2/489dc6] process > param_selection_workflow:getParamName (7240) [100%] 1 of 1, failed: 1 ✘
[- ] process > param_selection_workflow:paramSplit -
[a8/1450ac] process > param_value_selection_workflow:getParamName (7240) [100%] 1 of 1, failed: 1 ✘
[- ] process > param_value_selection_workflow:paramSplitValues -
[- ] process > creatingParamFile_frommap -
[- ] process > geneid_WORKFLOW:Index_i -
[- ] process > geneid_WORKFLOW:runGeneid_fetching -
[- ] process > prep_concat -
[- ] process > concatenate_Outputs_once -
[- ] process > gff3addInfo:manageGff3sectionSplit -
[- ] process > gff3addInfo:gff3intersectHints -
[- ] process > gff3addInfo:processLabels -
[- ] process > gff3addInfo:manageGff3sectionMerge -
[- ] process > gff34portal -
Execution cancelled -- Finishing pending tasks before exit
Oops ...
Error executing process > 'UncompressFASTA (SampleGenomeSmall.fa.gz)'
Caused by:
Process `UncompressFASTA (SampleGenomeSmall.fa.gz)` terminated with an error exit status (127)
Command executed:
if [ ! -s SampleGenomeSmall.fa ]; then
echo "unzipping genome SampleGenomeSmall.fa.gz"
gunzip -c SampleGenomeSmall.fa.gz > SampleGenomeSmall.fa;
fi
Command exit status:
127
Command output:
(empty)
Command error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/etc/shadow" to rootfs at "/etc/shadow": mount /etc/shadow:/etc/shadow (via /proc/self/fd/6), flags: 0x5001: no such file or directory: unknown.
time="2023-03-27T09:46:53+02:00" level=error msg="error waiting for container: context canceled"
Work dir:
/home/katharina/git/geneidx/work/88/1f36528e563427e80b54fd1a2e5099
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
This is my docker version: Docker version 20.10.12, build 20.10.12-0ubuntu4
I feel your pain in making this work for other users. We have the same kind of issues with our containers... I guess, documentation is key, but one has to get there, first.
It is my hope that Geneidx is strong in single exon gene prediction. That's why I am curious to try it.
Hi Katharina,
Thank you for the quick reply and for pointing out that the link was broken, it should be fixed now.
I am glad to see that now the sample case works with singularity.
Regarding the error you are getting, I think I have an explanation.
The current implementation of this indexing step (not ideal) requires the genome fasta file provided as input to be named with the .fa.gz
termination. Since your input file is called genome.fasta.masked.gz
geneidx is not able to derive the file names properly; you could try genome.masked.fa.gz
for example.
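For example, a sketch of the rename (using a tiny gzipped stand-in in place of the real genome file):

```shell
# Sketch: give the genome a name ending in .fa.gz so the indexing step
# can derive its file names. The gzipped record below is a stand-in for
# the real genome.fasta.masked.gz.
printf '>chr1\nACGTACGT\n' | gzip -c > genome.fasta.masked.gz
cp genome.fasta.masked.gz genome.masked.fa.gz   # correctly named copy
ls genome.masked.fa.gz
```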
If you change this name and re-run, I would expect this error to disappear; otherwise let me know and I will propose other solutions, and if another error appears I can look into it. Thank you!
Regarding docker
, I am not an expert in containers, so I cannot tell what might be happening.
If using singularity is not a big problem for you, I would just use singularity, but in any case I will ask some colleagues whether they have any idea what might be happening and how it could be solved.
Thank you!
Ferriol
Thank you. I was now able to run the experiment that I had intended to run.
I am glad that you were able to run it! Any other feedback is welcome.
Thanks,
Ferriol
Hi,
I tried to run geneidx. With singularity, I failed to pull the image - but possibly that's an issue on my own server.
Here's my output of singularity --version (installed via apt-get on Ubuntu):
With docker, I execute with root permissions. The image can be pulled. The problem is that apparently the container tries to write in places where the input data sits.
Here is my command:
Here is my error message:
It is rather clear how to fix this (don't write the unpacked genome anywhere but the output directory).
Next, I copied the gzipped genome into a folder where root has write permissions and tried again. I am using the reference species Drosophila simulans with taxon id 7240; that is indeed the taxon id in NCBI Taxonomy.
Here is my new call:
It fails to find the taxon for some obscure reason. There are most definitely D. simulans proteins at NCBI for this taxon; I have the protein set on my hard drive, too. I just don't know how to start the pipeline with a local protein set. Or maybe it finds the proteins, but something goes wrong when looking for geneid parameters for this taxon?
Error messages:
Best wishes,
Katharina