Closed guybuk closed 1 week ago
Looking at it from the user's POV, I can think of three types:
A user that knows what they're doing, but wants to do their own thing:
A user that knows what they're doing, and also likes the stock implementations (e.g. albumentations, holoviews, pytorch):
bridge-ds[vision/text/...]
A user that doesn't know what they're doing
A developer for bridge-ds
Considering our current situation, this breaks down as follows:
core - pandas, numpy
vision - skimage, holoviews, panel, hvplot, pycocotools, albumentations, tabulate (NOTE: pycocotools is excluded and will be downloaded for tutorials only)
dev - pytest, pytest-mock, pre-commit, testbook
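As a rough sketch, the split above could be written as the kind of mapping that setuptools accepts through `install_requires` and `extras_require`. The package lists are taken from the comment above. The variable names and the `all` aggregate are assumptions for illustration, not the project's actual configuration.

```python
# Hypothetical sketch of the dependency split described above.
# Package lists mirror the comment; the variable names are illustrative.

CORE_REQUIRES = ["pandas", "numpy"]

EXTRAS_REQUIRE = {
    # pycocotools is intentionally left out: it is only needed for tutorials.
    "vision": [
        "scikit-image",  # PyPI name of the package imported as `skimage`
        "holoviews",
        "panel",
        "hvplot",
        "albumentations",
        "tabulate",
    ],
    "dev": ["pytest", "pytest-mock", "pre-commit", "testbook"],
}

# Assumed convenience extra that pulls in every optional group.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)

# These would then be handed to setuptools, for example:
# setup(name="bridge-ds", install_requires=CORE_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

With a layout like this, `pip install bridge-ds` would pull in only pandas and numpy, while `pip install bridge-ds[vision]` would add the stock vision stack.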
Closed with #8
Currently we only have pip install gataset and pip install gataset[dev]. The base dependencies are very large, including torch, torchvision, datasets, holoviews, panel, imageio, and more.
In practice, our goal is to provide a "core" dataset library that depends only on basic Python libraries like numpy and pandas, plus additional "modal" extras such as gataset[vision], gataset[text], gataset[audio], etc.
The question remains, then: what should be included in such an extra?
For example, let's look at vision:
Should we include albumentations, holoviews, and panel just because our favorite downstream implementations of rendering engines and transforms use them? What if someone wants to implement transforms using torchvision?
Should we include imageio? What if someone wants to implement their own image reading?
Should we include pycocotools or torchvision, which mainly exist for dataset providers? Many users won't even use the providers, since they'll implement their own. And if we add torchvision, this forces us to depend on torch, and the problem gets even worse.
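One common way to ease this tension, sketched here only as an assumption and not as something the project does, is to import heavy libraries lazily inside the stock implementations that need them. The core package then stays importable without them, and users who write their own transforms or readers never pay for the extra dependencies. `require_optional` is a hypothetical helper name; it is not part of gataset.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or explain which extra provides it.

    Hypothetical helper: `extra` is the name of the pip extra
    (e.g. "vision") that is assumed to ship `module_name`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it directly or via: pip install gataset[{extra}]"
        ) from err
```

A stock transform could then call `require_optional("albumentations", "vision")` at the point of use, so the import error only appears for users who actually reach for that implementation.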