riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License
96 stars 39 forks source link

Can't use `pathlib`'s `Path` for targets #167

Closed StFroese closed 11 months ago

StFroese commented 11 months ago

I was trying to use pathtlib's Path for my targets since I like the simplicity of

path = Path('foo')
subfolder = path / "bar"

This stops working when the output method is called, for instance:

def output(self):
    self.local_target(f"mc_dataset_{self.task_id}")

def local_target(self, path, *args):
    """Path to the local target."""
    return LocalFileTarget(self.build_path() / path, *args)

This is the case because the remove_scheme method in the get_path method inside LocalFileTarget expects a string

https://github.com/riga/law/blob/ce82c5f566b853a7ec24881663669f8344445846/law/target/file.py#L505-L508

@riga Do you want this to be an option for the user to use pathlib for example or should the user fix this themself by converting to a str?

Quick fix would be something like this: Before: https://github.com/riga/law/blob/ce82c5f566b853a7ec24881663669f8344445846/law/target/file.py#L481-L486

After

 def get_path(target): 
     if isinstance(target, FileSystemTarget): 
         path = getattr(target, "abspath", no_value) 
         if path != no_value: 
             return path
     if not isinstance(target, str):
         try:
             target = str(target)
         except:
             raise TypeError(f"Expected type `str` as target/path, got {type(target)}")
     return target 
pfackeldey commented 11 months ago

You have to convert it to a string, which is (reasonably) simple with a pathlib.Path object: https://git.rwth-aachen.de/3pia/cms_analyses/common/-/blob/master/tasks/base.py#L103-108

But since pathlib is part of the python (>=3.4) std-lib, it might be reasonable to support it.

StFroese commented 11 months ago

Yes, converting it is not difficult at all :)

At some point one may also want to refactor the whole code to use pathlib.Path instead of os stuff. Would clean up the code maybe but not sure if it's worth it

riga commented 11 months ago

Thanks for opening this!

For the internal path handling it might be better to keep normal strings (some paths are passed down to luigi which doesn't use pathlib, some paths to the gfal2 bindings for remote target access), but I think the law target API should 100% support Path's being passed to all methods.

StFroese commented 11 months ago

@riga thx :)