kmavrommatis opened this issue 4 years ago
@mr-c wrote the FTP driver. He might have a better idea where that happens.
@kmavrommatis have you pushed this new S3 branch to GitHub yet?
The default output path is set via a command-line argument: https://github.com/ohsu-comp-bio/cwl-tes/blob/master/cwl_tes/main.py#L406
Maybe that is the issue?
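For reference, that option is an ordinary argparse argument; a minimal sketch of the pattern (illustrative only, not the exact code at that line):

```python
import argparse
import os

# Illustrative sketch of how the output directory option is typically declared;
# see cwl_tes/main.py#L406 for the actual definition.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--outdir",
    default=os.path.abspath("."),  # fall back to the current working directory
    help="Directory (or URI) where the final outputs are written",
)

args = parser.parse_args(["--outdir", "s3://mytes-test/"])
print(args.outdir)  # argparse passes the s3:// value through untouched
```

So the s3:// value itself survives argument parsing; any mangling would have to happen downstream, once the value is treated as a local path.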
Thanks for the suggestions.
@kellrott: I have not pushed it yet; I am still testing it.
@adamstruck: When I set the output directory to an S3 location, the object location comes out mangled. For example, setting --outdir s3://mytes-test/, the output is:
"whirpoolOutput": {
"location": "file:///opt/cwl-tes/s3%3A/mytes-test/funnel_1/whirpool",
"basename": "whirpool",
"class": "File",
"size": 200,
"path": "/opt/cwl-tes/s3:/mytes-test/funnel_1/whirpool"
},
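For what it's worth, the mangled location is reproducible with plain Python: if the s3:// value is treated as a relative local path, it gets joined onto the working directory and the colon is percent-encoded when the path is turned into a file:// URI. A minimal sketch of what appears to be happening (not the actual cwl-tes/cwltool code):

```python
import os
from urllib.parse import quote

outdir = "s3://mytes-test/"
cwd = "/opt/cwl-tes"

# Treating the S3 URI as a plain relative path joins it onto the working
# directory; normpath then collapses "//" into "/".
local_path = os.path.normpath(os.path.join(cwd, outdir))
print(local_path)                      # /opt/cwl-tes/s3:/mytes-test

# Percent-encoding the ":" while building a file:// URI gives the observed location.
print("file://" + quote(local_path))   # file:///opt/cwl-tes/s3%3A/mytes-test
```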
I will keep looking.
Update: The problem seems to come from the way cwltool handles files and file URIs. I have made a pull request against cwltool to fix this. As soon as this is in place, I will make a pull request for cwl-tes with the additional S3 support.
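Conceptually, the fix boils down to not rewriting values that already carry a URI scheme. A hypothetical helper to illustrate the idea (not the actual cwltool patch):

```python
from pathlib import Path
from urllib.parse import urlsplit


def to_uri(path_or_uri: str) -> str:
    # Hypothetical helper illustrating the idea: values that already carry a
    # scheme (file://, s3://, ftp://, ...) are returned unchanged; only bare
    # local paths are converted into file:// URIs.
    if urlsplit(path_or_uri).scheme:
        return path_or_uri
    return Path(path_or_uri).absolute().as_uri()


print(to_uri("s3://mytes-test/out"))   # s3://mytes-test/out (left alone)
print(to_uri("results/whirpool"))      # file:///.../results/whirpool
```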
It looks like @jvkersch has been working on an S3 implementation in parallel, in #38. Is there any way to combine efforts?
I would be happy to combine efforts, if possible. @kmavrommatis, probably the first step is to compare your approach with mine and identify commonalities and differences. If you push up your PR, we can take turns reviewing each other's code.
While working on #38 I found that there were a couple of places in the main code (i.e. outside the backend class) that needed touching up to deal with S3. I did not make any changes to cwltool, but I noticed that it is sometimes particular about what it accepts.
Absolutely. I will make a pull request in the next couple of days.
@jvkersch, @kellrott Apologies for the long delay. I have made a PR (https://github.com/ohsu-comp-bio/cwl-tes/pull/41) with the initial working code for using cwl-tes in an AWS environment. This is still a work in progress and more a baseline for discussion than a final solution.
I have created a draft cwl-tes version that accepts s3:// URIs as inputs and writes its outputs to S3 as well. It is based on the existing tool, with an S3-oriented StdFsAccess subclass (following #6).
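For reference, a minimal sketch of the shape of that class (the S3FsAccess name and the boto3 details here are illustrative, not the exact code in my branch):

```python
from io import BytesIO
from urllib.parse import urlparse

import boto3
from botocore.exceptions import ClientError
from cwltool.stdfsaccess import StdFsAccess


class S3FsAccess(StdFsAccess):
    """Illustrative StdFsAccess subclass that understands s3:// URIs;
    anything else falls back to the local-filesystem behaviour of the parent."""

    def __init__(self, basedir):
        super().__init__(basedir)
        self._s3 = boto3.client("s3")

    @staticmethod
    def _split(uri):
        parsed = urlparse(uri)
        return parsed.netloc, parsed.path.lstrip("/")

    def exists(self, fn):
        if not fn.startswith("s3://"):
            return super().exists(fn)
        bucket, key = self._split(fn)
        try:
            self._s3.head_object(Bucket=bucket, Key=key)
            return True
        except ClientError:
            return False

    def size(self, fn):
        if not fn.startswith("s3://"):
            return super().size(fn)
        bucket, key = self._split(fn)
        return self._s3.head_object(Bucket=bucket, Key=key)["ContentLength"]

    def open(self, fn, mode):
        if not fn.startswith("s3://"):
            return super().open(fn, mode)
        bucket, key = self._split(fn)
        body = self._s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        return BytesIO(body)  # read-only access is enough for staging inputs
```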
However, although the process proceeds and ends with success, the final output of the tool references local files (e.g. file:// locations), whereas the location should be an S3 URI and the path should presumably be either absent or empty.
I have not been able to find where this rewriting happens. Does anyone have a suggestion on how to address this problem? Thanks in advance for any help.
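In case it helps with debugging, this is the kind of (hypothetical) helper I use to spot which File entries in the output object have been rewritten to local locations:

```python
def find_local_file_locations(obj, path=""):
    """Recursively walk a CWL output object (nested dicts/lists) and yield
    (json_path, location) pairs for File entries whose location is not an
    s3:// URI. Hypothetical debugging helper, not part of cwl-tes."""
    if isinstance(obj, dict):
        if obj.get("class") == "File" and not obj.get("location", "").startswith("s3://"):
            yield path, obj.get("location")
        for key, value in obj.items():
            yield from find_local_file_locations(value, f"{path}/{key}")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_local_file_locations(item, f"{path}[{i}]")


# Example using the output shown earlier in this thread.
outputs = {
    "whirpoolOutput": {
        "class": "File",
        "location": "file:///opt/cwl-tes/s3%3A/mytes-test/funnel_1/whirpool",
    }
}
print(list(find_local_file_locations(outputs)))
# [('/whirpoolOutput', 'file:///opt/cwl-tes/s3%3A/mytes-test/funnel_1/whirpool')]
```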