rabernat opened this issue 2 years ago
Should be easy, yes, though as you said in another thread, the ML notebooks/images should ideally be ~strictly~ anchored around a framework. For now, the current frameworks are tensorflow and pytorch, and there will likely be pulling and shoving between the heavyweights if we keep adding stuff in. To keep in line with the pytorch/tensorflow separation, I think having a dedicated RAPIDS image might be more sensible. As I mentioned in the latter part of this comment, I think following the NGC containers may be the most sustainable path in the future, though to be honest/clear, I have no idea what the larger scope of these images will be... so I could be misunderstanding things!
Adding cupy to #345
There's been some interest at https://github.com/xarray-contrib/xbatcher/issues/87 in having a GPU direct storage demo that uses cupy-xarray and kvikIO (a RAPIDS AI library), and it would be good to have a working docker image with CuPy (one that can be readily deployed on a cloud provider) to showcase the functionality.
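For the sake of discussion, a separate RAPIDS-flavoured image could start from a conda environment file along these lines. This is only a sketch: the image name, channel pins, and version numbers here are assumptions, not the actual pangeo-docker-images spec.

```yaml
# environment.yml — hypothetical "rapids-notebook" image (illustrative only)
name: rapids-notebook
channels:
  - rapidsai
  - conda-forge
  - nvidia
dependencies:
  - python=3.10
  - cupy            # NumPy-compatible GPU arrays
  - kvikio          # GPU direct storage I/O (RAPIDS)
  - cuml            # GPU machine learning (RAPIDS)
  - cupy-xarray     # xarray integration for CuPy
  - xarray
  - xbatcher
```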
So, my question is: what's the conclusion of the experimental 'super-ml-notebook' docker container PRs at https://github.com/pangeo-data/pangeo-docker-images/pull/345#issuecomment-1157711813 and #369? Is the idea to consolidate everything into a single image, or to keep separate per-framework images?
Personally, I would vote for a separate docker image with CuPy, CuML, kvikIO, and other RAPIDS AI libraries, but I also acknowledge that having 3 separate ML docker images can be a maintenance burden, as mentioned in #188. Still unsure if having a 3-in-1 image is any easier :laughing:, but wanted to hear people's thoughts on what's a good medium-term (<1 year) plan.
It should be easy to support cupy in both ML notebooks, no?
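One reason supporting cupy in the existing notebooks should be straightforward is that CuPy deliberately mirrors the NumPy API, so demo code can be written once and fall back to the CPU when no GPU is present. A minimal sketch (the fallback pattern here is illustrative, not code from any of the images):

```python
# CuPy mirrors NumPy's API, so the same array code runs on GPU or CPU.
try:
    import cupy as xp  # GPU path, if CuPy and a CUDA device are available
except ImportError:
    import numpy as xp  # CPU fallback keeps the example runnable anywhere

# Identical calls work with either backend.
a = xp.arange(6).reshape(2, 3)
print(float(a.sum()))  # → 15.0
```

The same pattern is what libraries like cupy-xarray lean on: xarray operations dispatch to whichever array backend holds the data.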