Open seandavi opened 7 years ago
Great that you're testing this! In theory Dockerflow can handle it, but I don't have a test case on it.
You should be able to define the outputFile's local path explicitly:
.outputFile("indexVersion","index/versionInfo.json","/mnt/data/index/versionInfo.json")
Does that work?
FYI that the meaningless number in the path is a workaround to avoid naming collisions and avoid non-existing directories by keeping everything in a flat directory on local disk. We have some ideas of how to fix that in the Pipelines API (or in Dockerflow), but haven't gotten around to it. One idea is to put files like this:
/mnt/data/gs/bucket/path/inputFile /mnt/data/gs/bucket/path/outputFile
Or we could split into separate inputs and outputs:
/mnt/data/inputs/gs/bucket/path/inputFile /mnt/data/outputs/gs/bucket/path/outputFile
Either way, we'd need to run "mkdir -p" so that tools can write to the folder if they're not checking to make sure it exists. Do you have a preference for either of these approaches?
I think that the former is a little less complicated and perhaps less magical. I think that is also the way that Cromwell is handling this, also. There is no need for you to do the same thing, but following the precedent may not be a bad thing.
I'll try to test this for the time being, but I like you idea of a more permanent solution; if you think you'll get to it soon, let me know and I will avoid hacks for the time being.
Thanks for the feedback. We're unlikely to have it fixed for you very soon -- I'd suggest using the explicit path for now. Let me know if it doesn't work and i can turn around a quick fix.
On Wed, Nov 2, 2016, 1:49 PM Sean Davis notifications@github.com wrote:
I think that the former is a little less complicated and perhaps less magical. I think that is also the way that Cromwell is handling this, also. There is no need for you to do the same thing, but following the precedent may not be a bad thing.
I'll try to test this for the time being, but I like you idea of a more permanent solution; if you think you'll get to it soon, let me know and I will avoid hacks for the time being.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/googlegenomics/dockerflow/issues/16#issuecomment-257994654, or mute the thread https://github.com/notifications/unsubscribe-auth/AAiXqWVPs0acgfstxVN_qMQMIcO3powzks5q6PdzgaJpZM4KnI2x .
Thanks, @jbingham. Sounds like a plan.
@Sean, did you ever get this working? Need anything from me?
On Wed, Nov 2, 2016 at 2:23 PM Sean Davis notifications@github.com wrote:
Thanks, @jbingham https://github.com/jbingham. Sounds like a plan.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/googlegenomics/dockerflow/issues/16#issuecomment-258003443, or mute the thread https://github.com/notifications/unsubscribe-auth/AAiXqawsBD2aL3WavxnoxWp68QEUXr5Gks5q6P9agaJpZM4KnI2x .
@Sean, any updates on this? I'm looking into adding recursive directory copy, and I could change this behavior at the same time.
PR for recursive folder copy is in now. The change to folder paths isn't yet.
@seandavi FYI!
I am working with a software package that takes as a parameter the name of an output directory. Then, a bunch of files are written to that output directory. I cannot see a way to specify those files as outputs since dockerflow appears to ignore the prefix when specifying the outputFile. Here is what the output directory structure looks like.
Here is the code I am trying to use (with just one of the files above as an example):
And here is the error I am getting. Note that the
gsutil cp
fails because theindex/
in the path appears to be ignored.I am likely just misunderstanding some pieces here, but I thought I would just go ahead and ask.