Open fgypas opened 6 years ago
@mr-c I've just gone through the pull request and the logs, and it seems that the FTPFsAccess code does not upload the input files to the FTP staging directory (ftp://ftp-private.ebi.ac.uk/upload/); as a result, the job fails.
Furthermore, TESK only receives local path references to files, not the remote FTP URLs it needs to download and stage for execution. This, too, causes the job to fail.
TESK is set up to stage remote file/directory URLs to the cluster, execute and upload the outputs to a remote URL. The PoC we have uses FTP as a {de}staging area, but we hope to extend this to other remote storage services.
1. If you plan to execute cwl-tes with job parameters that reference local files, then cwl-tes and any FsAccess class need to know the remote storage location in which to stage the local files. This must be supplied as a parameter (e.g. --remote-storage-url) to cwl-tes, and the upload/download of the files must additionally be handled by cwl-tes via FTPFsAccess. TESK should receive only the URLs of the staged files and the output URLs to send the outputs to.
2. A simpler workaround (at least for this PoC) is to ask users to rewrite their CWLs solely with references to remote FTP URLs for inputs and outputs. This is already supported natively in cwl-tes and TESK. The missing piece is {de}staging of intermediate files between workflow steps/tasks, where I assume cwl-tes, via cwltool, makes FS checks on local file paths that do not exist (as they are already remote URLs). This is specifically what we want to fix!
I assume option 2 is simpler to fix, as it would only involve disabling the unnecessary local FS checks, but option 1 is the better long-term solution and can be extended to other remote storage.
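To make the idea behind option 2 concrete, here is a minimal sketch of an existence check that skips the local filesystem for locations that are already remote URLs. The function name `location_exists` and the set of remote schemes are assumptions made for illustration; this is not the cwltool or cwl-tes API, and a real fix would have to live in whichever FsAccess class cwl-tes passes to cwltool.

```python
import os
from urllib.parse import urlsplit

# Assumption: these are the schemes the TES backend can stage by itself.
REMOTE_SCHEMES = {"ftp", "http", "https"}


def location_exists(location: str) -> bool:
    """Hypothetical helper: check a file location without touching remote URLs.

    Remote URLs are assumed to exist, leaving it to the TES backend to
    report a failure if the download does not work. Local paths and
    file:// URLs are checked on the local filesystem as usual.
    """
    parts = urlsplit(location)
    if parts.scheme in REMOTE_SCHEMES:
        # Skip the local FS check: the file lives on remote storage.
        return True
    local_path = parts.path if parts.scheme == "file" else location
    return os.path.exists(local_path)
```

The trade-off in this sketch is that a mistyped remote URL is only detected once the task runs, not when the workflow is validated.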
For option 1 (i.e. the PR #25 being developed) the missing pieces are:
- cwl-tes with a --remote-storage-url
- cwl-tes to use the remote storage URL to upload/download files
- cwl-tes to rewrite local paths to remote URL paths before submitting to TES
- cwltool's FS checks for intermediate outputs

Looping in @psafont, who may be able to advise further after he gets back from annual leave.
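The upload and path-rewriting pieces for option 1 could look roughly like the sketch below. It uses only the Python standard library; the function names `to_remote_url` and `upload_file`, the flat naming scheme (file name only, so two inputs with the same name would collide), and the credential handling are all assumptions for illustration, not what PR #25 implements.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlsplit

# Assumption: schemes that count as "already remote".
REMOTE_SCHEMES = {"ftp", "http", "https"}


def to_remote_url(location: str, remote_storage_url: str) -> str:
    """Map a local path or file:// URL to a URL under remote_storage_url.

    Locations that are already remote are returned unchanged.
    """
    parts = urlsplit(location)
    if parts.scheme in REMOTE_SCHEMES:
        return location
    local_path = parts.path if parts.scheme == "file" else location
    # Hypothetical naming scheme: keep only the file name.
    return remote_storage_url.rstrip("/") + "/" + Path(local_path).name


def upload_file(local_path: str, remote_url: str, user: str, password: str) -> None:
    """Upload one local file to the FTP location given by remote_url."""
    parts = urlsplit(remote_url)
    with FTP(parts.hostname) as ftp:
        ftp.login(user, password)
        with open(local_path, "rb") as handle:
            ftp.storbinary("STOR " + parts.path, handle)
```

For example, `to_remote_url("file:///home/user/chr20.fa", "ftp://ftp-private.ebi.ac.uk/upload/")` yields `ftp://ftp-private.ebi.ac.uk/upload/chr20.fa`, which is the kind of value TESK would need in `inputs[].url`.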
@mr-c Find below the task submitted to TESK. Note the inputs[].url and outputs[].url paths. These have to be remote HTTP (input) or FTP URLs for TESK to process them.
{
  "id": "task-9bc998c3",
  "state": "SYSTEM_ERROR",
  "name": "bwa-mem-tool.cwl",
  "description": "",
  "inputs": [
    {
      "name": "reference",
      "description": "cwl_input:reference",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/chr20.fa",
      "path": "/var/lib/cwl/stg329168e2-8b26-452d-b08a-87b0e4374efe/chr20.fa",
      "type": "FILE"
    },
    {
      "name": "reads[0]",
      "description": "cwl_input:reads[0]",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/example_human_Illumina.pe_1.fastq",
      "path": "/var/lib/cwl/stg36b5d550-0a36-40b2-8347-147b962288eb/example_human_Illumina.pe_1.fastq",
      "type": "FILE"
    },
    {
      "name": "reads[1]",
      "description": "cwl_input:reads[1]",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/example_human_Illumina.pe_2.fastq",
      "path": "/var/lib/cwl/stg47227e17-6bdb-46d7-8682-db4f03e07728/example_human_Illumina.pe_2.fastq",
      "type": "FILE"
    },
    {
      "name": "args.py",
      "description": "cwl_input:args.py",
      "url": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/args.py",
      "path": "/var/lib/cwl/stg48c30ccf-3907-4e38-ab13-5c017030672b/args.py",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "name": "stdout",
      "url": "file:///tmp/tmpn0qd_ggf/output.sam",
      "path": "/var/spool/cwl/output.sam",
      "type": "FILE"
    },
    {
      "name": "workdir",
      "url": "file:///tmp/tmpn0qd_ggf/",
      "path": "/var/spool/cwl",
      "type": "DIRECTORY"
    }
  ],
  "resources": {},
  "executors": [
    {
      "image": "python:2-slim",
      "command": [
        "python",
        "/var/lib/cwl/stg48c30ccf-3907-4e38-ab13-5c017030672b/args.py",
        "bwa",
        "mem",
        "-t",
        "2",
        "-I",
        "1,2,3,4",
        "-m",
        "3",
        "/var/lib/cwl/stg329168e2-8b26-452d-b08a-87b0e4374efe/chr20.fa",
        "/var/lib/cwl/stg36b5d550-0a36-40b2-8347-147b962288eb/example_human_Illumina.pe_1.fastq",
        "/var/lib/cwl/stg47227e17-6bdb-46d7-8682-db4f03e07728/example_human_Illumina.pe_2.fastq"
      ],
      "workdir": "/var/spool/cwl",
      "stdout": "/var/spool/cwl/output.sam",
      "env": {
        "HOME": "/tmp/tmpn0qd_ggf",
        "TMPDIR": "/tmp/tmp6gk992dx"
      }
    }
  ],
  "tags": {
    "CWLDocumentId": "file:///home/mcrusoe/common-workflow-language/v1.0/v1.0/bwa-mem-tool.cwl"
  }
}
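A quick way to catch this problem before submission is to scan the task for `url` fields that are not remote. The sketch below is only an illustration: the helper name `find_non_remote_urls` and the accepted schemes are assumptions, and it treats the task as a plain dictionary in the shape shown above.

```python
from urllib.parse import urlsplit

# Assumption: the URL schemes the TES backend is able to stage.
REMOTE_SCHEMES = {"ftp", "http", "https"}


def find_non_remote_urls(task: dict) -> list:
    """Return (section, name, url) for every input/output URL that is not remote."""
    problems = []
    for section in ("inputs", "outputs"):
        for item in task.get(section, []):
            url = item.get("url", "")
            if urlsplit(url).scheme not in REMOTE_SCHEMES:
                problems.append((section, item.get("name"), url))
    return problems
```

Run against the task above, every input and output would be reported, since all of them use `file://` URLs.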
Thank you @susheel; task-87dbc5ba was submitted using ftp:// URLs but failed. Do you have server-side logs for that task?
Hi
Can you add an example of how to execute cwl-tes when the Task Execution server (funnel, TESK) is running remotely and not locally?
I tried to execute the CWL test workflow via cwl-tes on a remote TESK (https://github.com/EMBL-EBI-TSI/TESK) instance as follows:
or
but none of them works. Based on a Slack discussion, it seems that the input and output data must be HTTP or FTP URLs. Do you have a working example I can try?
Thank you in advance, Foivos