kevin-hanselman / dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.
https://kevin-hanselman.github.io/dud/
BSD 3-Clause "New" or "Revised" License
183 stars, 8 forks

`fsspec` filesystem support #211

Open: PythonFZ opened 3 months ago

PythonFZ commented 3 months ago

I just found your project, which offers workflow and data versioning like DVC but with more freedom for the user and better performance, and I'd like to use it in my own code. I'm the developer of the ZnTrack package (https://github.com/zincware/ZnTrack), which uses DVC and provides a high-level Python API for constructing workflows (and a few more things). We use this tool for our research and have found that it often reaches the limits of what DVC is capable of: DVC doesn't cope well with hundreds of stages (see https://github.com/IPSProjects/BMIM-BF4/blob/production/dvc.yaml).

As far as I can see, dud ships with most of the functionality I would need to add it as an alternative backend to ZnTrack, and I hope to see better performance with it. One key component I'm using is the DVCFileSystem, which, as far as I can see, is not available for dud.

Describe the solution you'd like

Provide a read-only fsspec (https://github.com/fsspec/filesystem_spec) interface for dud.
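To make the request a bit more concrete, here is a rough sketch of the read-only surface such an interface would need. It is only an illustration under stated assumptions, not dud's API: `ReadOnlyTrackedFS` and its `index` argument are hypothetical names, and how dud would map a tracked path to its committed content is left open. The class mirrors the method names fsspec uses (`ls`, `info`, `open`, `cat_file`) without importing fsspec; a real implementation would presumably subclass `fsspec.AbstractFileSystem` instead.

```python
import os


class ReadOnlyTrackedFS:
    """Read-only view over tracked files (hypothetical; not part of dud).

    `index` maps a tracked path (e.g. "data/a.txt") to the on-disk location
    of its committed content. Producing this mapping from dud's metadata is
    an open question and is assumed to happen elsewhere.
    """

    def __init__(self, index):
        self._index = {self._strip(p): real for p, real in index.items()}

    @staticmethod
    def _strip(path):
        # Treat "/data/a.txt" and "data/a.txt" as the same tracked path.
        return path.strip("/")

    def info(self, path):
        path = self._strip(path)
        if path in self._index:
            size = os.path.getsize(self._index[path])
            return {"name": path, "type": "file", "size": size}
        prefix = path + "/" if path else ""
        if any(p.startswith(prefix) for p in self._index):
            return {"name": path, "type": "directory", "size": 0}
        raise FileNotFoundError(path)

    def ls(self, path="", detail=True):
        path = self._strip(path)
        if self.info(path)["type"] == "file":
            names = [path]
        else:
            prefix = path + "/" if path else ""
            # Keep only the first path component below `path`.
            names = sorted(
                {prefix + p[len(prefix):].split("/", 1)[0]
                 for p in self._index if p.startswith(prefix)}
            )
        return [self.info(n) for n in names] if detail else names

    def open(self, path, mode="rb"):
        if mode != "rb":
            raise PermissionError("read-only filesystem")
        path = self._strip(path)
        if path not in self._index:
            raise FileNotFoundError(path)
        return open(self._index[path], "rb")

    def cat_file(self, path):
        with self.open(path) as f:
            return f.read()
```

With something like this, downstream code could list and read versioned files (`fs.ls("data", detail=False)`, `fs.cat_file("data/a.txt")`) without checking them out into the workspace, which is the main thing I use the DVCFileSystem for.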

kevin-hanselman commented 3 months ago

Hi, @PythonFZ. I like this idea! I am not very familiar with the nuts and bolts of fsspec, so it would take some time to study it and develop an interface. If you have experience with fsspec, I'd love to work with you on this.

PythonFZ commented 3 months ago

That sounds great. I have some limited experience with fsspec but none with Go. @NiklasKappel, are you also interested?

NiklasKappel commented 3 months ago

I'm interested, but unfortunately I know neither Go nor fsspec. I will take a look at both if I find some time, but probably not before the end of October.