inab / WfExS-backend

Workflow Execution Service Backend
Apache License 2.0

Cannot download content from ftp #26

Closed vschnei closed 1 year ago

vschnei commented 2 years ago

Dear WfExS team, I was testing WfExS on my local WSL2/Ubuntu installation. Setting up the core and further dependencies in a conda environment worked without any trouble. However, while running the test workflow with python3 WfExS-backend.py execute -W tests/wetlab2variations_execution_nxf_secure.wfex.stage I got the following error:


[ERROR] Cannot download content from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz to 42be63ef9b0fc7d80d09513bfd3fa42b2288fd9b (while processing LicensedURI(uri='ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz', licences=('https://choosealicense.com/no-permission/',), attributions=[], secContext=None)) (temp file /tmp/wfexsivum2b3rtmpcache/wf-inputs/caching-5f6ef9b7-b9b8-4f40-b38e-9ac854ef5ec3): can only concatenate str (not "NoneType") to str
Traceback (most recent call last):
  File "WfExS-backend.py", line 445, in <module>
    main()
  File "WfExS-backend.py", line 429, in main
    wfInstance.stageWorkDir()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1027, in stageWorkDir
    self.materializeInputs()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 809, in materializeInputs
    theParams, numInputs = self.fetchInputs(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1008, in fetchInputs
    newInputsAndParams, lastInput = self.fetchInputs(inputs,
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 932, in fetchInputs
    matContent = self.wfexs.downloadContent(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/wfexs_backend.py", line 980, in downloadContent
    inputKind, cachedFilename, metadata_array, cachedLicences = self.cacheHandler.fetch(remote_file, workflowInputs_destdir, offline, ignoreCache, registerInCache, secContext)
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/cache_handler.py", line 549, in fetch
    raise CacheHandlerException(errmsg) from nested_exception
wfexs_backend.cache_handler.CacheHandlerException: Cannot download content from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz to 42be63ef9b0fc7d80d09513bfd3fa42b2288fd9b (while processing LicensedURI(uri='ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz', licences=('https://choosealicense.com/no-permission/',), attributions=[], secContext=None)) (temp file /tmp/wfexsivum2b3rtmpcache/wf-inputs/caching-5f6ef9b7-b9b8-4f40-b38e-9ac854ef5ec3): can only concatenate str (not "NoneType") to str

No VPN was active, nor anything else that could have prevented the FASTQ file from downloading.

Running wget ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz worked, though. Do you have any idea how to solve this?
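
The final clause of the error message, can only concatenate str (not "NoneType") to str, is the TypeError that Python 3 raises when the + operator joins a string with None. The sketch below reproduces that message and shows one way to guard against it. The function names and the example URI are hypothetical and are not taken from WfExS-backend; the actual cause inside cache_handler.py may well be different.

```python
from typing import Optional


def unsafe_describe(uri: str, detail: Optional[str]) -> str:
    """Hypothetical helper that concatenates without checking for None."""
    # Raises TypeError when detail is None.
    return "Fetching " + uri + ": " + detail


def safe_describe(uri: str, detail: Optional[str]) -> str:
    """Same message, but the optional part is only appended when present."""
    suffix = "" if detail is None else ": " + detail
    return "Fetching " + uri + suffix


# Reproduce the failure with a placeholder URI (not a real dataset).
try:
    unsafe_describe("ftp://example.org/file.gz", None)
except TypeError as exc:
    error_text = str(exc)
    print(error_text)

# The guarded version tolerates the missing value.
print(safe_describe("ftp://example.org/file.gz", None))
```

Because wget could fetch the same URL, a failure of this kind points at how the message or path was assembled rather than at the network.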

jmfernandez commented 2 years ago

Dear @vschnei, we have recently been redesigning the innards of WfExS-backend so that dataset license links can be associated with the internal data structures. So, could you share with us the output of the command

python WfExS-backend.py -V

please? The output should be something similar to:


WfExS-backend.py version 0.4.12-67-gcb8a2fa (cb8a2fa7a4eea5c71a6a157c3820a950b205f18a)

vschnei commented 2 years ago

It is indeed identical.

WfExS-backend.py version 0.4.12-67-gcb8a2fa (cb8a2fa7a4eea5c71a6a157c3820a950b205f18a)
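
For anyone comparing version strings across installations, the reported line can be split into its parts. This is only a sketch and it assumes the string follows the usual git describe layout (tag, number of commits since the tag, then g plus an abbreviated commit hash), with the full hash in parentheses. The names DescribedVersion and parse_version_line are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional


class DescribedVersion(NamedTuple):
    tag: str                  # nearest tag, e.g. "0.4.12"
    commits_since_tag: int    # commits on top of that tag
    short_hash: str           # abbreviated commit hash (without the "g")
    full_hash: Optional[str]  # full hash in parentheses, if present


# Assumed layout: "version <tag>-<count>-g<short> (<full>)"
VERSION_RE = re.compile(
    r"version (?P<tag>\S+?)-(?P<count>\d+)-g(?P<short>[0-9a-f]+)"
    r"(?: \((?P<full>[0-9a-f]{40})\))?"
)


def parse_version_line(line: str) -> DescribedVersion:
    """Parse a version line; raise ValueError if the layout differs."""
    match = VERSION_RE.search(line)
    if match is None:
        raise ValueError(f"Unrecognized version line: {line!r}")
    return DescribedVersion(
        match["tag"], int(match["count"]), match["short"], match["full"]
    )


reported = parse_version_line(
    "WfExS-backend.py version 0.4.12-67-gcb8a2fa "
    "(cb8a2fa7a4eea5c71a6a157c3820a950b205f18a)"
)
print(reported)
```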

jmfernandez commented 2 years ago

Hi again! We have identified the issue you are facing, and we have added a commit which should fix it.

Also, we have realized that you should be using the -L parameter to specify a configuration file, unless you have fine-tuned the automatically generated default config file, which points both the cache and the base working directories to random temporary directories under /tmp.

Lastly, even with the fix, the command line you are trying is going to fail later. As you are using https://github.com/inab/WfExS-backend/blob/main/tests/wetlab2variations_execution_nxf_secure.wfex.stage , some of the contents to be fetched from the Broad Institute FTP servers are "password" controlled, and you are not using the -Z parameter to provide the paired security context file https://github.com/inab/WfExS-backend/blob/main/tests/wetlab2variations_credentials_nxf.wfex.ctxt .

So, after you have done a git pull to fetch the fix, we suggest you try variations of the following command line:

python WfExS-backend.py -L tests/local_config_gocryptfs.yaml execute -W tests/wetlab2variations_execution_nxf_secure.wfex.stage -Z tests/wetlab2variations_execution_nxf_secure.wfex.stage

Hope both the fix and these tips help!

PS: As you are the very first user trying WfExS-backend in WSL2 (as far as we know), we are very, very interested in any feedback you can give us, in order to widen the number of platforms where the software works.

vschnei commented 2 years ago

Hey José, sorry for the late response, Covid-19 held me up a bit.

Thank you for the response, and I am happy to be the first test user.

Indeed your bugfix resolved the problem, though I have a different one right now.

python WfExS-backend.py -L tests/local_config_gocryptfs.yaml execute -W tests/wetlab2variations_execution_nxf_secure.wfex.stage 
* Command "execute".
        - Working directory will be /home/valentin/wfexs/WfExS-backend/wfexs-backend-test_WorkDir/b897e5ce-c747-4863-a8e9-9b0e73ef2c01/work
        - Instance b897e5ce-c747-4863-a8e9-9b0e73ef2c01 (nickname 'president wreck') (to be used with -J)
2022-04-12 15:27:40,933 - [INFO] downloaded RO-Crate: https://workflowhub.eu/ga4gh/trs/v2/tools/106/versions/3/NFL/files?format=zip -> /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/ro-crate-cache/64ad66e7a4eee40eb0fec98b969c6fd658d6f853.crate.zip
2022-04-12 15:27:41,463 - [INFO] materialized workflow repository (checkout 31348ed533961f84cf348bf1af660ad9de6f870c): /home/valentin/wfexs/WfExS-backend/wfexs-backend-test_WorkDir/b897e5ce-c747-4863-a8e9-9b0e73ef2c01/work/workflow
2022-04-12 15:27:44,579 - [INFO] downloading singularity container: quay.io/biocontainers/fastqc:0.11.8--2 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/quay.io-biocontainers-fastqc-0.11.8--2.img
2022-04-12 15:27:46,441 - [INFO] downloading singularity container: quay.io/biocontainers/bwa:0.7.17--h84994c4_5 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/quay.io-biocontainers-bwa-0.7.17--h84994c4_5.img
2022-04-12 15:27:46,656 - [INFO] downloading singularity container: quay.io/biocontainers/picard:2.18.25--0 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/quay.io-biocontainers-picard-2.18.25--0.img
2022-04-12 15:27:48,616 - [INFO] downloading singularity container: quay.io/biocontainers/samtools:1.3.1--5 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/quay.io-biocontainers-samtools-1.3.1--5.img
2022-04-12 15:27:48,691 - [INFO] downloading singularity container: broadinstitute/gatk3:3.6-0 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/broadinstitute-gatk3-3.6-0.img
2022-04-12 15:27:51,746 - [INFO] downloading singularity container: quay.io/biocontainers/cutadapt:1.18--py36h14c3975_1 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/quay.io-biocontainers-cutadapt-1.18--py36h14c3975_1.img
2022-04-12 15:27:52,261 - [INFO] downloading singularity container: quay.io/biocontainers/sambamba:0.6.8--h682856c_1 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/quay.io-biocontainers-sambamba-0.6.8--h682856c_1.img
2022-04-12 15:27:52,714 - [INFO] downloading singularity container: alpine:3.9 => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/containers/SingularityContainerFactory/NextflowWorkflowEngine/alpine-3.9.img
2022-04-12 15:27:52,740 - [INFO] downloading workflow input: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz
2022-04-12 15:27:52,740 - [INFO] downloaded workflow input: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/wf-inputs/sha256~U67mfAzaA8NGxbnEPgdUdjL8gF3XJWKk7vy5PTSzx1M=
2022-04-12 15:27:52,740 - [INFO] downloaded workflow input chain: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/wf-inputs/sha256~U67mfAzaA8NGxbnEPgdUdjL8gF3XJWKk7vy5PTSzx1M=
2022-04-12 15:28:09,316 - [INFO] downloading workflow input: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R2_001.fastq.gz
2022-04-12 15:28:09,316 - [INFO] downloaded workflow input: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R2_001.fastq.gz => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/wf-inputs/sha256~EtUNxcBkEaFukWiRLFWHZgm196dZzkHVOixRM4nI0hQ=
2022-04-12 15:28:09,316 - [INFO] downloaded workflow input chain: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R2_001.fastq.gz => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/wf-inputs/sha256~EtUNxcBkEaFukWiRLFWHZgm196dZzkHVOixRM4nI0hQ=
2022-04-12 15:28:11,928 - [INFO] downloading workflow input: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
2022-04-12 15:28:11,928 - [INFO] downloaded workflow input: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/wf-inputs/sha256~6RV-GaleAd_EcIC1tqpVnIYd6QuZNMLqfEnNXsSeAoU=
2022-04-12 15:28:11,929 - [INFO] downloaded workflow input chain: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz => /home/valentin/wfexs/WfExS-backend/wfexs-backend-test/wf-inputs/sha256~6RV-GaleAd_EcIC1tqpVnIYd6QuZNMLqfEnNXsSeAoU=
Traceback (most recent call last):
  File "WfExS-backend.py", line 487, in <module>
    main()
  File "WfExS-backend.py", line 471, in main
    wfInstance.stageWorkDir()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1142, in stageWorkDir
    self.materializeInputs()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 866, in materializeInputs
    theParams, numInputs = self.fetchInputs(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1123, in fetchInputs
    newInputsAndParams, lastInput = self.fetchInputs(inputs,
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1077, in fetchInputs
    remote_pairs, lastInput = self._fetchRemoteFiles(remote_files, contextName, offline, storeDir, cacheable, inputDestDir, globExplode, lastInput)
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 930, in _fetchRemoteFiles
    alt_remote_files = [ self.buildLicensedURI(remote_file, contextName=contextName) for remote_file in remote_files ]
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 930, in <listcomp>
    alt_remote_files = [ self.buildLicensedURI(remote_file, contextName=contextName) for remote_file in remote_files ]
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 917, in buildLicensedURI
    raise WFException(
wfexs_backend.workflow.WFException: No security context public_broad is available, needed by ftp://ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz

As you might have noticed, I did not use your suggested command. The -Z parameter was not working.

I have tried different WORKFLOWCONFIGFILENAMEs as well but ended up at the same point with the identical WFException.
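
The WFException above says that an input refers to a named security context (public_broad) that was never loaded. The sketch below illustrates that kind of lookup and failure. It is not the code of buildLicensedURI: MissingContextError, resolve_context and the credential field names are hypothetical stand-ins, and the credential values are placeholders.

```python
from typing import Mapping, Optional


class MissingContextError(Exception):
    """Hypothetical stand-in for the WFException seen in the traceback."""


def resolve_context(
    uri: str,
    context_name: Optional[str],
    contexts: Mapping[str, dict],
) -> Optional[dict]:
    """Return the credentials for context_name, or None if none is needed."""
    if context_name is None:
        # Public input: no security context required.
        return None
    if context_name not in contexts:
        raise MissingContextError(
            f"No security context {context_name} is available, needed by {uri}"
        )
    return contexts[context_name]


uri = "ftp://ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz"

# Without a security context file, the table of contexts is empty.
try:
    resolve_context(uri, "public_broad", {})
except MissingContextError as exc:
    message = str(exc)
    print(message)

# With a context loaded (placeholder values), the lookup succeeds.
loaded = {"public_broad": {"username": "<user>", "password": "<secret>"}}
creds = resolve_context(uri, "public_broad", loaded)
```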

jmfernandez commented 2 years ago

No worries @vschnei! I also caught it 3 weeks ago, even though I had already had three vaccine shots.

Could you try the following command, please? It provides the required security context file:

python WfExS-backend.py -L tests/local_config_gocryptfs.yaml execute -W tests/wetlab2variations_execution_nxf_secure.wfex.stage  -Z tests/wetlab2variations_credentials_nxf.wfex.ctxt
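
If the command needs to be scripted, for instance to try several configuration files, it can be assembled as an argument list. This is a sketch only: the -L, -W and -Z options and the file paths come from this thread, while the helper build_execute_command is invented for the illustration and assumes the script is started from the repository root.

```python
import shlex
import sys
from typing import List, Optional


def build_execute_command(
    config: str, stage: str, credentials: Optional[str] = None
) -> List[str]:
    """Assemble the argument list for an 'execute' run."""
    cmd = [sys.executable, "WfExS-backend.py", "-L", config, "execute", "-W", stage]
    if credentials is not None:
        # -Z passes the security context file paired with the stage file.
        cmd += ["-Z", credentials]
    return cmd


command = build_execute_command(
    "tests/local_config_gocryptfs.yaml",
    "tests/wetlab2variations_execution_nxf_secure.wfex.stage",
    "tests/wetlab2variations_credentials_nxf.wfex.ctxt",
)

# Show the command with shell quoting; nothing is executed here.
print(" ".join(shlex.quote(arg) for arg in command))
```

The list can then be handed to subprocess.run(command, check=True) from within the WfExS-backend checkout.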