telegraphic / hickle

a HDF5-based python pickle replacement
http://telegraphic.github.io/hickle/
Other
485 stars 70 forks source link

HEP004: Improve support for discipline-specific classes by extras_require in setup.py #148

Closed telegraphic closed 3 years ago

telegraphic commented 3 years ago

Abstract

To minimize dependencies while providing discipline-specific support, we could add extras_requires to setup.py (see setuptools documentation). For example:

pip install hickle
pip install hickle[astronomy]

Would install astronomy-specific dependencies like astropy and pint.

We already have 'lazy import' support, added by @1313e in this commit. This is important as users shouldn't have to load astropy if they don't want to do astronomy!

Motivation

In order to load data from a hickle file, the Python environment needs to have all the dependencies installed that a given class needs. For example, to load an astropy.Quantity needs to load astropy, which is a large package that many users won't have installed. Adding an extras_requires to setup.py would allow a user to quickly install optional dependencies and allow code contributors to maintain discipline-specific support.

Specification

An extras_require is added to the setup.py. For example, here's the code from dask's setup.py:

extras_require = {
    "array": ["numpy >= 1.15.1", "toolz >= 0.8.2"],
    "bag": [
        "cloudpickle >= 0.2.2",
        "fsspec >= 0.6.0",
        "toolz >= 0.8.2",
        "partd >= 0.3.10",
    ],
    "dataframe": [
        "numpy >= 1.15.1",
        "pandas >= 0.25.0",
        "toolz >= 0.8.2",
        "partd >= 0.3.10",
        "fsspec >= 0.6.0",
    ],
    "distributed": ["distributed >= 2.0"],
    "diagnostics": ["bokeh >= 1.0.0, != 2.0.0"],
    "delayed": ["cloudpickle >= 0.2.2", "toolz >= 0.8.2"],
}
extras_require["complete"] = sorted({v for req in extras_require.values() for v in req})

packages = [
    "dask",
    "dask.array",
    "dask.bag",
    "dask.bytes",
    "dask.dataframe",
    "dask.dataframe.io",
    "dask.dataframe.tseries",
    "dask.diagnostics",
]

At the moment, we have optional dependencies astropy and scipy in the requirements_test.txt:

astropy>=1.3,<4.0
scipy>=1.0.0
pandas>=0.24.0

I propose we add these to extras_require instead (and figure out a nice way to get a requirements_test.txt equivalent)

Open Issues

None

References

[1] https://setuptools.readthedocs.io/en/latest/userguide/dependency_management.html?highlight=extras#optional-dependencies

1313e commented 3 years ago

Why would this be necessary? For dumping astropy objects, one already has it installed, so we are good there. For reading them, you need to have it installed to load the proper module in hickle. If you don't have it, hickle will just raise an error telling you that you cannot load that object. You would only have to install a single additional package.

Personally, I think adding optional dependencies to the setup.py file just makes it more confusing. Like, what is easier once you figure out you need astropy? pip install hickle[astronomy] or pip install astropy?

hernot commented 3 years ago

Hm, I'm with @1313e. We here are in medical engineering not astronomy and not using astropy here at all. So far we had no issue with astropy to be required as dependency for hickle. There are other projects and packages which pull a way larger lot of dependencies not relevant to us compared to the mostly harmless hickle. So Unless the number of domain specific package loaders populating the loaders directory explodes i would consider optional dependencies in setup.py related to them at a very low priority.

Further it is not done by making them optional, there would as @1313e indicates, also be required provision that loaders which depend upon optional dependencies are not available, are not installed or request the allowance to install the optional packages through pip. Further what if users later on install optional dependencies how can than theses loaders be activated or installed? The most reasonable option would be to collect them in domain specific loaders, which can be optionally activated as proposed for the compact_expand or the hickle-4.0 loader in the hickle-5-RC pr #149. To be honest I doubt that this is worth the efforts for now with astropy beeing the only such optional dependency package. And even if it would be two or three more it would be for me still not clearly decidable if it is.

telegraphic commented 3 years ago

Ok, seems like a quorum, I'll drop the domain-specific idea. I had this in my head before I realised @1313e had already implemented lazy importing, which is the more important part of the picture.

However, I do still think that it's more appropriate for astropy / scipy to be listed as optional dependencies in extras_requires, as opposed to being treated solely as test requirements. E.g. just do:

extras_require = {
    "astropy": ["astropy"],
    "scipy": ["scipy"]
}

I personally think this is clearer for anyone looking at setup.py. Having them listed in test requirements suggests they are part of the testing suite (e.g. codecov). @1313e I do agree it's not particularly useful for an end user to do pip install hickle[astropy] instead of pip install astropy given the dependency is a single package.

Having said all that, I don't feel too strongly -- I personally don't think either solution in setup.py is optimal for our use case. If I haven't swayed you @1313e and @hernot then let's move on!

1313e commented 3 years ago

I think documentation is a great place for mentioning optional requirements.

hernot commented 3 years ago

Me too. Lets keep this on very low priority list for the day when number of disciplines not requiring all packages for which dedicated loaders are provided by hickle raise and counts of issues related to missing, non or insufficiently unclear documented optional requirements explode. Further i do think as long as hdf5 and thus hickle does not gain major relevance beyond natural sciences and engineering i would not bother about scipy at all and rather consider it as something everybody has installed anyway like Numpy.

telegraphic commented 3 years ago

Closing for now