rapidsai / dask-cuda

Utilities for Dask and CUDA interactions
https://docs.rapids.ai/api/dask-cuda/stable/
Apache License 2.0
289 stars 92 forks source link

Optional Python dependencies #1054

Open pentschev opened 1 year ago

pentschev commented 1 year ago

In https://github.com/rapidsai/dask-cuda/pull/984#discussion_r1028548527 we have briefly discussed that cuDF not being a hard dependency of Dask-CUDA theoretically may allow older versions to be used. We do not guarantee any sort of backwards compatibility, and thus we have to rely on checking for the availability of specific features or version at runtime. However, it feels it would be appropriate to instead ensure cuDF (if installed) maintains a minimum version, what seems possible according to Setuptools Optional Dependencies docs. In that case, we would simply drop all code that exists to ensure certain functionalities are implemented at runtime.

Is the above a reasonable approach, or are there better alternatives for that?

@jakirkham @vyasr I believe you may have some insights regarding this.

jakirkham commented 1 year ago

Yep we already have some optional dependencies here

Main thing is to come up with a name for that grouping. One suggestion would be rapids. Though could use other names

Might want to include CuPy and anything else here

Edit: Other thing would be to include this in the Conda recipe under run_constrained

pentschev commented 1 year ago

Main thing is to come up with a name for that grouping. One suggestion would be rapids. Though could use other names

I was thinking of something even broader, maybe runtime, but either one is fine by me. But to clarify, we also would need to add that constraint in the the other libraries, e.g., in dask-cudf, as per what the Setuptools docs suggest, right?

Might want to include CuPy and anything else here

I don't think we have or have ever had any constraints regarding CuPy versions, so I don't see an immediate need for that.

Edit: Other thing would be to include this in the Conda recipe under run_constrained

Ah this is nice too, thanks for raising, I wasn't aware of that one.

jakirkham commented 1 year ago

Sure how about extras?

Not sure I'm understanding. We can optional requirements wherever we see fit.

Was more meaning we should list CuPy. Whether we include a lower bound (and what that might be) is a different discussion

pentschev commented 1 year ago

Sure how about extras?

Yes, extras sounds good.

Not sure I'm understanding. We can optional requirements wherever we see fit.

Was more meaning we should list CuPy. Whether we include a lower bound (and what that might be) is a different discussion

I got a bit carried away with the version specifically because I raised this discussion to "enforce" a minimum cuDF version, but you're right, we can add CuPy to extras without a version as well.

vyasr commented 1 year ago

This plan sounds reasonable. One note, conda's run_constrained is a bit stronger (and closer to what we want) than PEP517's optional dependencies. Even with this grouping nothing will prevent a user with an older version of cudf/dask-cudf from installing an incompatible dask-cuda, unless they explicitly specify the extras group pip install dask-cuda[cudf]. pip install dask-cuda would still allow for potentially invalid environments. This is different from run_constrained, which will always validate the version requirements for all installed packages.

pentschev commented 1 year ago

I guess the answer is probably "no", is it possible to implement anything that provides a hard-constraint like run_constrained for wheels as well?

vyasr commented 1 year ago

No, unfortunately not that I am aware of. pip's dependency solver is good enough now that it will correctly detect incompatible versions of required dependencies regardless of installation order (that used to not be the case), but I'm not aware of any way to do this for optional dependencies in the way that conda's run_constrained works.