opendatahub-io-contrib / workbench-images

Various custom Workbenches and Runtimes for Open Data Hub and OpenShift Data Science
MIT License

data science snippet includes packages like nvidia* that blow up image to 6GB even when no Torch or Cuda / GPU is needed #48

Open · shalberd opened this issue 9 months ago

shalberd commented 9 months ago

I noticed that since the latest additions of packages to the bundle 2-datascience snippet on October 25, the image size has increased from 1.3 GB to 6 GB when building on the regular base image.

Is that intended?

https://github.com/opendatahub-io-contrib/workbench-images/blob/main/snippets/bundles/2-datascience/py39/requirements.txt

Torch dependencies and NVIDIA packages are all pulled in during a normal image build.

I'm wondering whether, for example, the nvidia* packages could be added conditionally, only when Torch, TensorFlow, CUDA, or GPU support is needed or selected?
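A minimal sketch of what "add the nvidia* packages only when a GPU is selected" could look like at build time, assuming the bundle's requirements are available as a list of lines and that some build option decides `include_gpu`. The function names `normalize` and `filter_requirements` and the `include_gpu` switch are hypothetical; they are not part of the workbench-images build. One caveat: this only edits the list. If a package that needs the NVIDIA libraries (for example torch installed from PyPI on Linux, whose recent wheels declare nvidia-* dependencies) stays in the list, pip will resolve those nvidia-* packages again, so that package would have to be made conditional too.

```python
import re
from typing import Iterable, List, Tuple

# Matches the leading project name of a requirement line such as
# "nvidia-cublas-cu12==12.1.3.1" or "numpy>=1.26; python_version>'3.8'".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_requirements(
    lines: Iterable[str],
    include_gpu: bool,
    gpu_prefixes: Tuple[str, ...] = ("nvidia-",),
) -> List[str]:
    """Return the requirement lines to install (hypothetical helper).

    When include_gpu is False, every requirement whose normalized name
    starts with one of gpu_prefixes is dropped. Comments, blank lines and
    pip options (lines starting with '-') are passed through unchanged.
    """
    kept = []
    for line in lines:
        # Ignore trailing comments when extracting the project name.
        stripped = line.split("#", 1)[0].strip()
        match = _NAME_RE.match(stripped)
        if match is None:
            # Blank line, comment-only line, or pip option: keep as-is.
            kept.append(line)
            continue
        if not include_gpu and normalize(match.group(1)).startswith(gpu_prefixes):
            continue  # GPU-only package, skipped for CPU-only images
        kept.append(line)
    return kept
```

For example, `filter_requirements(["numpy==1.26.4", "nvidia-cublas-cu12==12.1.3.1"], include_gpu=False)` returns `['numpy==1.26.4']`, while passing `include_gpu=True` returns both lines unchanged.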

shalberd commented 9 months ago

Ah, Codeflare, OK. Maybe a separate bundle snippet, e.g. 10-codeflare, would be better for those cases than 2-datascience.

@guimou wrote:

Yeah, the issue comes from the Codeflare SDK, which depends on PyTorch, which in turn pulls in the nvidia* packages. For contrib I wanted to introduce Codeflare in all the images, as well as Elyra/kfp-tekton, but I'm not sure it's a good idea. I talked to the Codeflare team; they were supposed to review their dependencies, but I have not checked yet whether they managed to make them lighter. So I guess it comes down to another discussion: which images should include the Codeflare SDK?
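To see why a given top-level package drags the nvidia* packages into an image, one option is to search the dependency graph for the chain that leads to them. The sketch below does a breadth-first search over a hand-written graph; the graph is made up and simplified to mirror the Codeflare SDK → PyTorch → nvidia* chain described in the comment, and is not the real dependency metadata of those packages. In a real environment the edges could be read from installed metadata, for instance with `importlib.metadata.requires()`, which returns each distribution's requirement strings.

```python
from collections import deque
from typing import Dict, List, Optional


def dependency_path(
    graph: Dict[str, List[str]], root: str, target_prefix: str
) -> Optional[List[str]]:
    """Return the shortest chain of package names from root to the first
    dependency whose name starts with target_prefix, or None if none is
    reachable. Breadth-first search guarantees the chain is shortest."""
    parents: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        if pkg != root and pkg.startswith(target_prefix):
            # Walk back through the recorded parents to rebuild the chain.
            path = []
            node: Optional[str] = pkg
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for dep in graph.get(pkg, []):
            if dep not in parents:  # also protects against cycles
                parents[dep] = pkg
                queue.append(dep)
    return None


# Hypothetical, simplified graph for illustration only.
EXAMPLE_GRAPH = {
    "codeflare-sdk": ["ray", "torch"],
    "ray": ["numpy"],
    "torch": ["filelock", "nvidia-cublas-cu12"],
}
```

With this example graph, `dependency_path(EXAMPLE_GRAPH, "codeflare-sdk", "nvidia-")` returns `['codeflare-sdk', 'torch', 'nvidia-cublas-cu12']`, while starting from `"ray"` returns `None`. That kind of answer is what the "which images should include the Codeflare SDK?" decision hinges on: any image that includes the root package also inherits everything on the chain.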