sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Installing SDV in Google Colab Notebook: `ERROR` in pip's dependency resolver #1360

Closed npatki closed 1 year ago

npatki commented 1 year ago

Environment Details

Error Description

When running pip install sdv in my Colab Notebook, I have now started seeing the following error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.15.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.0 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.

This affects our demo notebooks, since they are all on Google Colab.

(However, the SDV still appears to run without issues.)
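
For anyone trying to see at a glance which packages are involved, the conflict lines in pip's output follow a regular pattern and can be pulled apart with a small script. This is only a sketch: the helper name `parse_conflicts` and the regular expression are illustrative and not part of pip or the SDV, and the pattern assumes the message wording shown above.

```python
import re

# Assumed line shape (taken from the error text above):
#   "<pkg> <ver> requires <requirement>, but you have <name> <ver> which is incompatible."
CONFLICT_RE = re.compile(
    r"^(?P<package>\S+) (?P<package_version>\S+) requires (?P<requirement>.+?), "
    r"but you have (?P<installed>\S+) (?P<installed_version>\S+) "
    r"which is incompatible\.$"
)


def parse_conflicts(pip_output):
    """Return one dict per dependency-conflict line found in pip's output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


sample = """\
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
torchvision 0.15.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
"""

for conflict in parse_conflicts(sample):
    print(
        f"{conflict['package']} wants {conflict['requirement']}, "
        f"found {conflict['installed']} {conflict['installed_version']}"
    )
```

In the output above, every conflict points at the same cause: the packages pre-installed on Colab pin `torch==2.0.0`, while installing the SDV left torch at 1.13.1.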

Root Cause

It appears that Google Colab made some updates as to which libraries are pre-installed.

See Google Colab release notes

Can we update the versions of torch?

Workaround

In the meantime, use the following steps to run the SDV on a Colab Notebook.

  1. Install the SDV with `pip install sdv` and ignore any errors that get printed.
  2. Restart the kernel. In Colab, click Runtime in the top menu bar and then Restart runtime (see screenshot below).
  3. Continue with your code: import the SDV and use its features.
image

To see this in action, see any of the demo notebooks
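
The restart in step 2 matters because Python keeps already-imported modules in memory, so a package that pip has replaced on disk is not picked up until the runtime restarts. The sketch below is one way to check for that situation. It is an illustration only: `stale_packages` is a hypothetical helper, not part of the SDV or Colab, and it assumes the import name matches the distribution name and that the module exposes `__version__`.

```python
import sys
from importlib import metadata


def stale_packages(names, loaded=None, installed_version=metadata.version):
    """Map each package whose in-memory version differs from the version
    installed on disk to a (loaded_version, installed_version) pair."""
    loaded = sys.modules if loaded is None else loaded
    stale = {}
    for name in names:
        module = loaded.get(name)
        if module is None:
            # Not imported yet, so a fresh import will read the new files.
            continue
        in_memory = getattr(module, "__version__", None)
        try:
            on_disk = installed_version(name)
        except metadata.PackageNotFoundError:
            continue
        if in_memory is not None and in_memory != on_disk:
            stale[name] = (in_memory, on_disk)
    return stale


# An empty result suggests no restart is needed for these packages.
print(stale_packages(["torch", "numpy"]))
```

If the result is non-empty right after `pip install sdv`, restarting the runtime should bring the loaded modules back in line with what is installed.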

DuelistJeff commented 1 year ago

I have actually hit an error running the SDV:

sdv_model = GaussianCopulaSynthesizer(metadata)
---> 57 sdv_model.fit(df)

ContextualVersionConflict: (torch 2.0.0+cu118 (/usr/local/lib/python3.9/dist-packages), Requirement.parse('torch<2,>=1.8.0; python_version < "3.10"'), {'deepecho', 'ctgan'})

My process created several sets of data before I hit this, so it works up to a point, but then...
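
The message reports that torch 2.0.0+cu118 was found while `ctgan` and `deepecho` declare `torch<2,>=1.8.0`. The sketch below shows why that combination fails. It is a deliberately simplified, standard-library-only comparison, not the real check (which is done by the packaging tooling and follows the full version-specifier rules); the helper names `release_tuple` and `satisfies` are made up for illustration, and pre-releases and wildcards are not handled.

```python
import operator
import re

OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def release_tuple(version):
    """Turn '2.0.0+cu118' into (2, 0, 0), dropping the local '+cu118' label."""
    public = version.split("+", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", public)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def pad(a, b):
    """Right-pad the shorter tuple with zeros so that (2,) compares like (2, 0, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier such as '<2,>=1.8.0'."""
    installed = release_tuple(version)
    for clause in specifier.split(","):
        match = re.match(r"\s*(<=|>=|==|<|>)\s*(\S+)\s*$", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        left, right = pad(installed, release_tuple(bound))
        if not OPERATORS[op](left, right):
            return False
    return True


print(satisfies("2.0.0+cu118", "<2,>=1.8.0"))  # False: 2.0.0 is not below 2
print(satisfies("1.13.1", "<2,>=1.8.0"))       # True
```

So the conflict appears whenever the process sees Colab's pre-installed torch 2.0.0 rather than the 1.13.1 that `pip install sdv` put in place.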

npatki commented 1 year ago

Hi @DuelistJeff -- did you restart the Kernel after installing the SDV? The steps would be:

  1. Install the SDV with `pip install sdv`
  2. Restart the Kernel. In Colab, click Runtime in the top menu bar and then Restart runtime (see screenshot below)
  3. Then you can import sdv and use the features
image
DuelistJeff commented 1 year ago

Hi @npatki,

Yes, I did. As I mentioned, although it may not have been clear, it was running through several iterations of creating artificial data for me. I have a dataset with a large sample of some results and very small amounts of others, so I am using SDV to top up some of the low counts with artificial data to get better training results with my model.

It had made several thousand data samples when it suddenly crashed with the above error. Just out of curiosity, I ran the exact same code again, and this time it went through and finished all the samples I wanted, so the error must have some random factor involved. But it does seem to be related to the downgrading of Colab's torch version to install SDV.

npatki commented 1 year ago

Hi @DuelistJeff would you mind filing a new issue related to your problem? The root cause may be related to this one, but it would be nice to have a separate thread to help you.

DuelistJeff commented 1 year ago

Hi @npatki,

I am not sure it is worth it. As I said, I have managed to bypass the problem for now. I just wanted to let you know that when you said in the original post that SDV still runs without updating the PyTorch version, there is at least one case where it did not.

I would imagine that more people will have the same problem running SDV on Google Colab in the near future, until the PyTorch version in SDV is updated.

Thanks

shadab75 commented 1 year ago

@npatki Hi, I have the same problem but I could not fix it. Can you help me? I get these errors when I try to install sdv with pip in Colab:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.15.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.0 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.

npatki commented 1 year ago

Hi @shadab75, in the original comment I observed the error itself did not interfere with much of the functionality:

the SDV still appears to run without issues.

Have you tried running any of the code in the SDV? If not, I'd try this first. If you observe any crashes or errors when running any of the synthesizers, then let us know.

npatki commented 1 year ago

Note that this issue will likely be resolved by #1365. But I can keep it open in case anyone wants to discuss workarounds in the meantime.

PadariyaDebo commented 1 year ago

Hi @npatki

I am struggling with the sdv installation in Google Colab. As you suggested earlier, I restarted the runtime after installing sdv, and it showed the following error:

ModuleNotFoundError: No module named 'sdv.tabular'

Could you please advise on this?

npatki commented 1 year ago

Hi @PadariyaDebo, in your case I think the error is correct. If you are using the latest version of SDV, then there is no such module as `sdv.tabular`. We have renamed it to `sdv.single_table`. For more information about the API, see the SDV docs website.
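
To find out which module layout an installed SDV exposes, a short probe like the one below may help. It is a sketch only: `first_importable` is a hypothetical helper, not an SDV function. Note that the class names differ between layouts as well (this thread shows `GaussianCopula` being imported from `sdv.tabular` and `GaussianCopulaSynthesizer` being used with the newer version), so the old module is not a drop-in fallback.

```python
import importlib


def first_importable(candidates):
    """Return the first module in `candidates` that imports cleanly, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    return None


# Newer layout first, then the older one mentioned in the error.
module = first_importable(["sdv.single_table", "sdv.tabular"])
if module is None:
    print("SDV does not appear to be installed in this environment")
else:
    print("Installed SDV exposes", module.__name__)
```

If the probe reports `sdv.single_table`, code written against `sdv.tabular` needs to be updated to the newer API rather than just re-pointed at the new module name.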

PadariyaDebo commented 1 year ago

Hi @npatki.

Thank you for suggesting this; however, after installing the latest version of SDV I am getting the following error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.3 which is incompatible.
tensorflow 2.12.0 requires numpy<1.24,>=1.22, but you have numpy 1.24.3 which is incompatible.
torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.0 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.
torchvision 0.15.1+cu118 requires torch==2.0.0, but you have torch 1.13.1 which is incompatible.

npatki commented 1 year ago

@PadariyaDebo the topmost comment of the issue addresses this error. You can also follow the instructions in any of the demo notebooks.

PadariyaDebo commented 1 year ago

Thanks @npatki

npatki commented 1 year ago

Hi everyone, good news!

Since we've upgraded the version of torch (in #1365), I can confirm that there is no longer a pip Error when using a Colab notebook. So I'm marking this issue as fixed!

Note that Colab will still ask you to restart the Kernel, as there are still some packages that need to be overridden. However, there are no longer any dependency conflicts.

SKKhan commented 3 months ago

Hi, I installed the SDV with `!pip install sdv` on Google Colab. When I run my Python script on Google Colab, I get the following error:

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 4>()
      2 import matplotlib.pyplot as plt
      3 import seaborn as sns
----> 4 from sdv.tabular import GaussianCopula
      5 from sklearn.metrics import mean_absolute_percentage_error
      6 from scipy.stats import ks_2samp

ModuleNotFoundError: No module named 'sdv.tabular'

npatki commented 3 months ago

Hi @SKKhan, the error you are reporting is different from what this issue describes. If you continue to have problems, I would recommend starting a new issue.

Please note that there is no such module as `sdv.tabular`. Where are you finding instructions to use `from sdv.tabular import GaussianCopula`? These instructions are for a version of SDV that is over 2 years old now.

I would recommend that you follow the installation instructions here and reference the troubleshooting instructions on our website if you are having problems.