nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License
225 stars 186 forks source link

use `pathlib.Path` instead of `os.path` #1527

Open fabianegli opened 2 years ago

fabianegli commented 2 years ago

Description of feature

This repo uses many different ways to handle paths. Plain strings, os and pathlib. Currently, most path handling is done with os.path. The reason to propose the pathlib.Path option is that it also has the convenient .is_dir, is_file and .glob functions as well as a nice path composition API and even operators.

p.absolute(        p.glob(            p.is_reserved(     p.name             p.rename(          p.suffix          
p.anchor           p.group(           p.is_socket(       p.open(            p.replace(         p.suffixes        
p.as_posix(        p.home(            p.is_symlink(      p.owner(           p.resolve(         p.symlink_to(     
p.as_uri(          p.is_absolute(     p.iterdir(         p.parent           p.rglob(           p.touch(          
p.chmod(           p.is_block_device( p.joinpath(        p.parents          p.rmdir(           p.unlink(         
p.cwd(             p.is_char_device(  p.lchmod(          p.parts            p.root             p.with_name(      
p.drive            p.is_dir(          p.lstat(           p.read_bytes(      p.samefile(        p.with_suffix(    
p.exists(          p.is_fifo(         p.match(           p.read_text(       p.stat(            p.write_bytes(    
p.expanduser(      p.is_file(         p.mkdir(           p.relative_to(     p.stem             p.write_text(  

Do the gatekeepers have any thoughts about this? Or reasons not to make more use of pathlib?

As an example: https://github.com/nf-core/tools/blob/e0d00d3e09a996f59eb5ce6ebc4795452a5b1d39/nf_core/modules/nfcore_module.py#L32-L35

If module_dir would be a pathlib.Path Line 32 to 35 could be a single line:

        self.module_name = module_dir.name

This would also solve an unexpected behaviour if the pipeline resides in a nested folder structure, with multiple non-consecutive levels named "modules", self.module_name becomes a path. (There are of course solutions to this issue without pathlib.Path, but I think it is one nice example how pathlib.Path can help reduce complexity.)

>>> a_path = "modules/path/to/pipeline/modules/name"
>>> a_path.split("modules" + os.sep)[1]
'path/to/pipeline/'
>>> Path(a_path).name
'name'
ewels commented 2 years ago

Do the gatekeepers have any thoughts about this? Or reasons not to make more use of pathlib?

Happy to move to Pathlib 👍🏻 Reason we're using os.path is simply that I was not aware of Pathlib until pretty recently. My fingers are used to the os lib but from what I've seen pathlib looks nicer so happy to change. Presumably progressively, as os operators are numerous here to say the least.

That snippet you linked to as the example is particularly nasty! 😅