Closed mseitzer closed 3 months ago
Although I think what I'm suggesting does not work, as the client
group should have a minimal set of dependencies, but I intended to still get the full cluster_utils install when doing
pip install cluster_utils
It would have to be the other way round, with
pip install cluster_utils
installing only the client API, and something like
pip install cluster_utils[all]
installs the server/UI part. This does not seem like a great solution to me :/
The clean solution to that problem would probably be to offer two packages, something like cluster_utils_api
and cluster_utils
, where cluster_utils
depends on cluster_utils_api
. Not sure if people would like that, and it also requires larger changes to packaging (probably would look something like this: https://discuss.python.org/t/multiple-related-programs-one-pyproject-toml-or-multiple-projects/17427/2). What do you think, @luator?
Ah, right, I didn't realize that it wouldn't work like that :(. How about we split into cluster_utils_server
and cluster_utils_client
, then it should be pretty clear which is needed where.
I think splitting is actually a better solution compared to using optional dependencies. The latter would still install all the modules, i.e. they could be imported but would throw errors if the wrong dependency-group is installed.
I'd keep both within the same repository, so maintenance-wise it shouldn't add a lot of overhead.
Sounds good, although not sure on the naming. Might be confusing to have one component called "server" if it is the primary component users interact with. Whereas the "client" part is just used by packages, but never interacted with by the user.
Maybe we should make a poll.
True. My reasoning was that it makes the separation clearer but maybe it adds more confusion in the end.
I thought a bit more about this. In my opinion, splitting into separate packages would add too much maintenance overhead for too little practical gain. So I would suggest one of the following options:
cluster_utils
installs only the client dependencies and for running the server part, one needs to install cluster_utils[some_good_name]
. Should be okay from the user-side assuming we provide good error messages in case it's used wrongly. Would probably require some larger refactoring to really work, though (i.e. move everything that uses server-only dependencies out of shared modules). Once done, it should be easy to maintain.To get some idea of the potential gain: At least on first glance (I might have missed something), it seems the only dependency actually needed for the client API is smart-settings. So I measured the time how long it takes to install all the others (numpy, pandas, scipy, ...):
So there could indeed be quite some gain in avoiding those dependencies (of course assuming the users doesn't depend on them anyway in their own code).
Another import reason to keep the dependency set small is that it reduces the chance of version conflicts for the user.
I agree that full package separation is probably overkill, so approach 2) sounds good to me. I have not checked the set of dependencies for the client though.
Refactoring is anyways something that was on the planned: https://github.com/martius-lab/cluster_utils/issues/45. I think it should not be that much trouble.
I have not checked the set of dependencies for the client though.
I quickly checked: After restructuring done in #118, base
doesn't have any third-party dependencies and client
still only needs "smart_settings".
So everything else could go to the optional "server" group. Though if possible, we should find a more descriptive name (from the user-point-of-view) for it.
As suggested by myself here, I think it would be good to have a client API detached from the rest of the code.
Integration into projects would then just require
I would like to have this in the PyPI milestone such that we have stable dependencies/installation-types from the start.