martius-lab / cluster_utils

https://cluster-utils.readthedocs.io/stable/
Other
12 stars 1 forks source link

Separate out client API and provide client dependency group #83

Closed mseitzer closed 3 months ago

mseitzer commented 7 months ago

As suggested by myself here, I think it would be good to have a client API detached from the rest of the code.

Integration into projects would then just require

pip install cluster_utils[client]

I would like to have this in the PyPI milestone such that we have stable dependencies/installation-types from the start.

mseitzer commented 7 months ago

Although I think what I'm suggesting does not work, as the client group should have a minimal set of dependencies, but I intended to still get the full cluster_utils install when doing

pip install cluster_utils

It would have to be the other way round, with

pip install cluster_utils

installing only the client API, and something like

pip install cluster_utils[all]

installs the server/UI part. This does not seem like a great solution to me :/

The clean solution to that problem would probably be to offer two packages, something like cluster_utils_api and cluster_utils, where cluster_utils depends on cluster_utils_api. Not sure if people would like that, and it also requires larger changes to packaging (probably would look something like this: https://discuss.python.org/t/multiple-related-programs-one-pyproject-toml-or-multiple-projects/17427/2). What do you think, @luator?

luator commented 7 months ago

Ah, right, I didn't realize that it wouldn't work like that :(. How about we split into cluster_utils_server and cluster_utils_client, then it should be pretty clear which is needed where. I think splitting is actually a better solution compared to using optional dependencies. The latter would still install all the modules, i.e. they could be imported but would throw errors if the wrong dependency-group is installed.

I'd keep both within the same repository, so maintenance-wise it shouldn't add a lot of overhead.

mseitzer commented 7 months ago

Sounds good, although not sure on the naming. Might be confusing to have one component called "server" if it is the primary component users interact with. Whereas the "client" part is just used by packages, but never interacted with by the user.

Maybe we should make a poll.

luator commented 7 months ago

True. My reasoning was that it makes the separation clearer but maybe it adds more confusion in the end.

luator commented 4 months ago

I thought a bit more about this. In my opinion, splitting into separate packages would add too much maintenance overhead for too little practical gain. So I would suggest one of the following options:

  1. Keep things as they are. Easiest to use and maintain but leads to unnecessary packages installed for the user scripts.
  2. Do as suggested in https://github.com/martius-lab/cluster_utils/issues/83#issuecomment-2074247230, i.e. cluster_utils installs only the client dependencies and for running the server part, one needs to install cluster_utils[some_good_name]. Should be okay from the user-side assuming we provide good error messages in case it's used wrongly. Would probably require some larger refactoring to really work, though (i.e. move everything that uses server-only dependencies out of shared modules). Once done, it should be easy to maintain.

To get some idea of the potential gain: At least on first glance (I might have missed something), it seems the only dependency actually needed for the client API is smart-settings. So I measured the time how long it takes to install all the others (numpy, pandas, scipy, ...):

So there could indeed be quite some gain in avoiding those dependencies (of course assuming the users doesn't depend on them anyway in their own code).

mseitzer commented 4 months ago

Another import reason to keep the dependency set small is that it reduces the chance of version conflicts for the user.

I agree that full package separation is probably overkill, so approach 2) sounds good to me. I have not checked the set of dependencies for the client though.

Refactoring is anyways something that was on the planned: https://github.com/martius-lab/cluster_utils/issues/45. I think it should not be that much trouble.

luator commented 4 months ago

I have not checked the set of dependencies for the client though.

I quickly checked: After restructuring done in #118, base doesn't have any third-party dependencies and client still only needs "smart_settings". So everything else could go to the optional "server" group. Though if possible, we should find a more descriptive name (from the user-point-of-view) for it.