rstojnic / lazydata

Lazydata: Scalable data dependencies for Python projects
Apache License 2.0

Implementing multiple backends by re-using snakemake.remote or pyfilesystem2 #17

Open Avsecz opened 5 years ago

Avsecz commented 5 years ago

Would it be possible to wrap the classes implementing snakemake.remote.AbstractRemoteObject in a lazydata.remote.RemoteStorage class?

This would make it possible to implement, in one go, all the remote storage providers that snakemake supports (see https://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html).

Pyfilesystem2

Another alternative would be to write a wrapper around pyfilesystem2: https://github.com/PyFilesystem/pyfilesystem2. Its index of filesystems (https://www.pyfilesystem.org/page/index-of-filesystems/) groups the supported filesystems into builtin, official (filesystems in the PyFilesystem organisation on GitHub) and third-party ones.

rstojnic commented 5 years ago

Good idea, this would be great!

Avsecz commented 5 years ago

Can you explain how to implement something like this? E.g. where to put the class, what to name it, which methods to implement, and what the potential caveats are?

rstojnic commented 5 years ago

Some thoughts:

1) I would probably keep the interface of the remote.RemoteStorage class unchanged.
2) I would create a new class remote.SnakeMakeRemoteStorage that inherits remote.RemoteStorage and takes at least two parameters: a snakemake backend name + any other necessary parameters (e.g. the access keys). I probably wouldn't want to reimplement the S3 and other existing backends.
3) In remote.RemoteStorage.get_from_url() and remote.RemoteStorage.get_from_config() make sure the remote storage backend is correctly parsed from the lazydata.yml config file and the correct child class SnakeMakeRemoteStorage instantiated.
4) In cli.commands.config I would allow for configuration of any necessary additional access keys.
5) In cli.commands.add_remote I would allow for any of the snakemake backends to be specified.

And that should be it. Some unit tests would be welcome as well :)
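A rough sketch of how steps 2 and 3 could fit together, in case it helps. Everything here is an assumption made for illustration: the `RemoteStorage` base class is a simplified stand-in (the real lazydata interface may have different methods), the `snakemake+<backend>://` URL scheme and the `PROVIDERS` registry are made up, and `InMemoryProvider` is a fake that takes the place of a real snakemake remote provider so the example runs without snakemake installed.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class RemoteStorage(ABC):
    """Simplified stand-in for lazydata's remote.RemoteStorage (assumed interface)."""

    @staticmethod
    def get_from_url(remote_url, **credentials):
        # Step 3: choose the child class from the URL scheme.
        # The "snakemake+<backend>" scheme is a hypothetical convention.
        scheme = urlparse(remote_url).scheme
        prefix = "snakemake+"
        if scheme.startswith(prefix):
            backend = scheme[len(prefix):]
            return SnakeMakeRemoteStorage(backend, remote_url, **credentials)
        raise ValueError(f"Unsupported remote URL: {remote_url}")

    @abstractmethod
    def upload(self, local_path, remote_key):
        ...

    @abstractmethod
    def download(self, remote_key, local_path):
        ...


class InMemoryProvider:
    """Fake provider used only in this sketch; a real one would come from snakemake.remote."""

    def __init__(self, **credentials):
        self.credentials = credentials
        self.objects = {}  # remote key -> file contents as bytes


# Hypothetical registry mapping a backend name to a provider factory.
PROVIDERS = {"memory": InMemoryProvider}


class SnakeMakeRemoteStorage(RemoteStorage):
    """Step 2: one child class that delegates to whichever backend is named."""

    def __init__(self, backend, remote_url, **credentials):
        if backend not in PROVIDERS:
            raise ValueError(f"Unknown snakemake backend: {backend}")
        self.backend = backend
        self.remote_url = remote_url
        # Extra parameters such as access keys are passed through to the provider.
        self.provider = PROVIDERS[backend](**credentials)

    def upload(self, local_path, remote_key):
        with open(local_path, "rb") as fh:
            self.provider.objects[remote_key] = fh.read()

    def download(self, remote_key, local_path):
        with open(local_path, "wb") as fh:
            fh.write(self.provider.objects[remote_key])
```

With something like this, `RemoteStorage.get_from_url("snakemake+memory://bucket/data", access_key="...")` would return a `SnakeMakeRemoteStorage` instance, and the existing backends would stay untouched. Supporting a real snakemake backend would then mostly mean adding an entry to the registry and mapping `upload`/`download` onto the corresponding remote object calls.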