riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Parent directory creation with `.dump()` #101

Closed pfackeldey closed 4 years ago

pfackeldey commented 4 years ago

Currently, you always need to create the parent directory before you can dump data to the task output, e.g.:

import law

class FooBar(law.Task):

    def output(self):
        return law.LocalFileTarget("path/to/data.json")

    def run(self):
        # the parent directory has to be created manually first
        self.output().parent.touch()
        self.output().dump(["foo", "bar"])

It would be very convenient if one did not have to call self.output().parent.touch() first, at least when using the .dump(...) method.

What do you think about this @riga ?

riga commented 4 years ago

Yeah, I'm sometimes annoyed by this as well ;) For local targets, this is no problem at all, but I was hesitant initially to do a "mkdir_rec" every time a remote target "copy" operation happens because this is an additional network request. However, effectively one does the "touch()" anyway, so there might not even be a point behind the argument. I think we can change it, but keep an option to disable automatic directory creation again.

Edit: Ok, turns out this is actually already the way it's done for remote targets, but not for local ones. Also, RemoteFileTarget.open does not seem to be using this ...

pfackeldey commented 4 years ago

Alright, I thought initially only about local file targets anyway. :)

riga commented 4 years ago

I just made the behavior of creating parent directories consistent on master. Would you mind giving it a spin? There is now an option "create_file_dir", which is True by default.
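For readers who want to see the idea in isolation, here is a minimal stand-alone sketch of the behavior described above. It is not law's actual implementation and does not use law at all: `dump_json` and `demo` are hypothetical helpers, and only the option name `create_file_dir` and its default of True are taken from this thread.

```python
import json
import tempfile
from pathlib import Path


def dump_json(path, obj, create_file_dir=True):
    """Write obj as JSON to path (hypothetical helper, not part of law)."""
    path = Path(path)
    if create_file_dir:
        # create all missing parent directories before writing,
        # so the caller does not have to prepare them manually
        path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(obj, f)
    return path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # default: missing parent directories are created automatically
        target = Path(tmp) / "path" / "to" / "data.json"
        dump_json(target, ["foo", "bar"])
        loaded = json.loads(target.read_text())

        # disabled: writing into a missing directory fails as before
        missing = Path(tmp) / "other" / "data.json"
        try:
            dump_json(missing, [], create_file_dir=False)
            raised = False
        except FileNotFoundError:
            raised = True
    return loaded, raised
```

Defaulting the flag to True gives the convenient behavior asked for in this issue, while passing False keeps the old, explicit behavior for anyone who wants to avoid the extra directory operation.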

pfackeldey commented 4 years ago

works as expected! 🎉 👍

riga commented 4 years ago

Fixed in https://github.com/riga/law/commit/dcefd94565773485f4ee9c760937203be40c69e9.