vamsinilla opened this issue 1 year ago.
Hi @vamsinilla, nice to meet you, and that's a very good question. The SDV package itself is only a few KB, so I expect most of the bloat comes from our external dependencies. I'm not 100% sure, but I suspect `torch` is the biggest contributor.
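To check whether `torch` really dominates the install size, a rough standard-library sketch can rank installed distributions by the total size of the files recorded for them. This is an approximation under stated assumptions: it assumes packages were installed by a tool that writes a RECORD file (pip does), it ignores files a package never recorded, and the function name `distribution_sizes` is purely illustrative.

```python
import os
from importlib import metadata


def distribution_sizes() -> list[tuple[str, int]]:
    """Return (distribution name, total bytes) pairs, largest first.

    Sizes are summed over the files each distribution lists in its RECORD,
    so they only approximate what a package adds to a deployment artifact.
    """
    sizes: dict[str, int] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or "unknown"
        if name in sizes:
            # The first match on sys.path is the copy Python imports; skip shadowed ones.
            continue
        total = 0
        for file in dist.files or []:
            try:
                total += os.path.getsize(file.locate())
            except (OSError, TypeError):
                # File is listed but missing, or not on the regular filesystem.
                continue
        sizes[name] = total
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Print the ten largest installed distributions in megabytes.
    for name, size in distribution_sizes()[:10]:
        print(f"{size / 1_000_000:10.1f} MB  {name}")
```

Running it in the same environment that produces the deployment artifact should show which dependencies are worth targeting first.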
Unfortunately, there isn't currently a good way to isolate these dependencies within the SDV. For example, if you uninstall `torch`, importing `GaussianCopulaSynthesizer` will crash even though `torch` is not a strict dependency of that synthesizer. (The import path checks that all dependencies are present, not just those needed for Gaussian Copula.)
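The thread does not show exactly how SDV's import path is wired. One common way a single import ends up needing every dependency (assumed here for illustration, not confirmed for SDV) is a package `__init__.py` that eagerly imports all of its submodules. This self-contained sketch builds a throwaway package with hypothetical names (`demo_pkg`, `CopulaModel`, `NeuralModel`, `not_installed_heavy_dep`) and shows that requesting only the lightweight class still fails.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout: __init__.py eagerly imports every submodule.
FILES = {
    "demo_pkg/__init__.py": (
        "from demo_pkg.copula import CopulaModel\n"
        "from demo_pkg.neural import NeuralModel\n"
    ),
    "demo_pkg/copula.py": "class CopulaModel:\n    pass\n",
    # Stands in for a synthesizer that needs a heavy optional dependency.
    "demo_pkg/neural.py": (
        "import not_installed_heavy_dep\n\n"
        "class NeuralModel:\n    pass\n"
    ),
}


def missing_module_on_import() -> str:
    """Import only CopulaModel; return the missing module's name, or '' if it worked."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, source in FILES.items():
            path = Path(root, rel_path)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            # Only the lightweight class is requested, but __init__.py runs first.
            from demo_pkg import CopulaModel  # noqa: F401
        except ModuleNotFoundError as err:
            return err.name or ""
        finally:
            sys.path.remove(root)
            # Forget partially imported modules so the demo can be rerun.
            for mod in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[mod]
        return ""


if __name__ == "__main__":
    print(missing_module_on_import())  # not_installed_heavy_dep
```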
I propose turning this issue into a feature request for the ability to install only the libraries that specific synthesizers need. To help us prioritize, I would love to hear more about what you are working on. Why are you running into the 5 GB limit?
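One possible way to support that request (a sketch of a common technique, not SDV's current or planned design) is lazy loading with a module-level `__getattr__`, as defined in PEP 562: the package imports a submodule only when one of its names is first accessed. Using hypothetical names (`lazy_pkg`, `CopulaModel`, `NeuralModel`, `not_installed_heavy_dep`), this sketch shows the lightweight class importing cleanly while the missing dependency only surfaces when the heavy class is requested.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package whose __init__.py loads submodules lazily (PEP 562).
INIT_SOURCE = '''\
import importlib

# Public name -> submodule that defines it.
_LAZY = {"CopulaModel": "lazy_pkg.copula", "NeuralModel": "lazy_pkg.neural"}


def __getattr__(name):
    # Called only when normal attribute lookup on the package fails,
    # so each submodule (and its dependencies) is imported on first use.
    if name in _LAZY:
        return getattr(importlib.import_module(_LAZY[name]), name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

FILES = {
    "lazy_pkg/__init__.py": INIT_SOURCE,
    "lazy_pkg/copula.py": "class CopulaModel:\n    pass\n",
    # Stands in for a synthesizer that needs a heavy optional dependency.
    "lazy_pkg/neural.py": (
        "import not_installed_heavy_dep\n\n"
        "class NeuralModel:\n    pass\n"
    ),
}


def lazy_import_demo() -> tuple[str, str]:
    """Return (class imported without the heavy dependency, module missing on heavy import)."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, source in FILES.items():
            path = Path(root, rel_path)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            # Succeeds: only lazy_pkg.copula is loaded.
            from lazy_pkg import CopulaModel
            loaded = CopulaModel.__name__
            try:
                # Fails only now, when the heavy class is actually requested.
                from lazy_pkg import NeuralModel  # noqa: F401
                missing = ""
            except ModuleNotFoundError as err:
                missing = err.name or ""
        finally:
            sys.path.remove(root)
            # Forget imported modules so the demo can be rerun.
            for mod in [m for m in sys.modules if m.split(".")[0] == "lazy_pkg"]:
                del sys.modules[mod]
    return loaded, missing


if __name__ == "__main__":
    print(lazy_import_demo())  # ('CopulaModel', 'not_installed_heavy_dep')
```

Lazy loading alone does not shrink the install; it would need to be paired with making heavy dependencies optional at install time, so a deployment only pulls in what the chosen synthesizer uses.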
I tried uninstalling `torch` and then importing `GaussianCopulaSynthesizer`:

```python
# Run in a Jupyter/IPython notebook cell (%pip is a notebook magic).
%pip uninstall torch -y
from sdv.single_table import GaussianCopulaSynthesizer
```

This led to a `ModuleNotFoundError` (traceback below).
Changing this to a feature request and updating the title to clarify.
Error Description
I'm building a sample application with SDV and deploying it on an Azure free trial. The artifact grows to 5 GB once all dependencies are installed. How can I reduce the artifact size when I'm only using the Gaussian copula single-table synthesizer (`GaussianCopulaSynthesizer`)?