sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Support optional dependencies for CUDA related packages #2173

Open matanitah opened 3 months ago

matanitah commented 3 months ago

Problem Description

"pip install sdv" pulls in many CUDA- and Torch-related dependencies that should be optional (for example, through an option like "pip install sdv[no-nvidia]"). This would make it much easier for organizations to get the Community version running in their own VPCs/images.

Expected behavior

"pip install sdv[no-nvidia]" should download a version of SDV which is CPU only and which does not use CUDA or require all of the NVIDIA libraries.

Additional context

This will make it easier for developers inside companies to demonstrate the value of SDV to their managers.

srinify commented 3 months ago

Hi there @matanitah, unfortunately pip doesn't have a way to explicitly exclude specific dependencies (only a way to explicitly include them). This means we'd have to slim down the dependencies in base SDV (the most common starting point and install path) quite a bit and move the CUDA-based packages into a set of optional dependencies. This approach has its own tradeoffs as well!
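To make the include-only behavior concrete, here is a minimal Python model of how pip extras compose. It assumes a hypothetical layout in which CUDA-related packages are moved out of the base install into a `gpu` extra; `BASE_REQUIREMENTS`, `EXTRAS`, `resolve`, and every package name are illustrative placeholders, not SDV's real dependency metadata, and transitive dependencies are ignored. It mirrors pip's habit of warning about and then ignoring an extra that a package does not declare, so an undeclared `no-nvidia` extra changes nothing.

```python
# A minimal sketch of how pip extras compose, assuming a HYPOTHETICAL layout in
# which CUDA-related packages move out of the base install into a "gpu" extra.
# All package names are illustrative placeholders, not SDV's real dependencies.

BASE_REQUIREMENTS = frozenset({"pandas", "numpy", "cpu-only-backend"})  # hypothetical

EXTRAS = {
    "gpu": frozenset({"cuda-backend", "nvidia-runtime-libs"}),  # hypothetical
}


def resolve(extras=()):
    """Return the top-level requirements requested for the given extras.

    Extras are purely additive: each one can only add requirements on top of
    the base set, never remove one. Unknown extras are ignored here, similar to
    pip, which warns that the package does not provide the extra and carries on.
    """
    requirements = set(BASE_REQUIREMENTS)
    for name in extras:
        requirements |= EXTRAS.get(name, frozenset())
    return requirements


print(sorted(resolve()))             # like `pip install sdv`
print(sorted(resolve(["gpu"])))      # like `pip install sdv[gpu]`
print(sorted(resolve(["no-nvidia"])))  # undeclared extra: same as the base set
```

Because an extra can only add to the set, a CPU-only install has to be the default and GPU support the opt-in, which is the direction discussed in this thread.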

To help us better understand, do you mind sharing more context on the barriers you're encountering when trying to "get the Community version running in [your] own VPCs/images"?

matanitah commented 3 months ago

Sure! The NVIDIA libraries increase the size of the image we have to run quite dramatically, and they're really only relevant if we decide to run our SDV code on GPUs; in our case, we have opted to go with CPU anyway. I think having an option like "pip install sdv[gpu]" would be beneficial because it would keep the image needed to run SDV Community code light for those who only want to use CPU, which makes EC2 load times faster and makes it easier for us to keep infrastructure costs low.
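To put a number on the image-size concern, one could measure how much disk space the NVIDIA wheels occupy in an existing environment. The sketch below is a diagnostic helper, not part of SDV; `installed_size_by_prefix` is a hypothetical name, and it assumes the CUDA libraries were installed as pip distributions whose names start with `nvidia-`. That prefix is a heuristic, and libraries installed outside pip (for example, a system CUDA toolkit) are not counted.

```python
# A minimal sketch for estimating the disk footprint of NVIDIA pip packages in
# the current environment. ASSUMPTION: CUDA libraries were installed as pip
# distributions whose names start with "nvidia-"; this is a heuristic only.
import os
from importlib.metadata import distributions


def installed_size_by_prefix(prefix="nvidia-"):
    """Return {distribution name: total size in bytes} for matching distributions."""
    sizes = {}
    for dist in distributions():
        name = dist.metadata["Name"] or ""
        if not name.lower().startswith(prefix):
            continue
        total = 0
        # `dist.files` is None when the installer left no RECORD file.
        for file in dist.files or []:
            path = file.locate()
            if os.path.isfile(path):
                total += os.path.getsize(path)
        sizes[name] = sizes.get(name, 0) + total
    return sizes


if __name__ == "__main__":
    sizes = installed_size_by_prefix()
    # Largest packages first, reported in MiB.
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {size / 1024**2:.1f} MiB")
    print(f"total: {sum(sizes.values()) / 1024**2:.1f} MiB")
```

Running it inside the image before and after installing SDV gives a rough sense of how much a CPU-only option could save.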

sdv-team commented 3 months ago

Hi @matanitah, it's great to see your interest in the SDV ecosystem. This comment is a reminder to consult your legal team before adopting the SDV into your project, as SDV has a source-available license.

For more information, you can read through our license FAQs (not legal advice). For any other questions, you can Contact Us. You can also inquire about a commercial license to allow additional use.

srinify commented 3 months ago

That makes sense, @matanitah, thanks for sharing more context! I'll leave this issue open as a feature request for the team :)

This is similar to this other feature request as well: https://github.com/sdv-dev/SDV/issues/1621