gretelai / gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.
https://gretel.ai/platform/synthetics

[BUG] `sentencepiece==0.1.97` fails to build on `win-amd64-cpython-311` #168

Open ttronas opened 4 months ago

ttronas commented 4 months ago

Are you reporting a bug or FR?

What version of synthetics are you using?

0.22.9

What would you like to see / What problem are you having?

When I try to install gretel-synthetics via pip, the installation fails with the error message shown below.
I narrowed it down and am fairly sure the problem lies in sentencepiece: the Python build I use (3.11 on Windows AMD64) might not be supported by sentencepiece==0.1.97 (a hint I found on Stack Overflow). I tried installing sentencepiece manually and, indeed, higher versions (such as 0.2.0) install without problems.
Also, if I install gretel-synthetics in a 3.10 environment, it works flawlessly.
Would it be possible to raise the sentencepiece version in your dependencies?
Or, if that isn't possible, could you give some advice on how to make it work?

Thanks a lot, I appreciate your great tool!
Jonas

Are you using GPU or a CPU?

CPU

What environment are you working in?

conda / venv

What version of python are you using?

3.11

Please provide any tracebacks or error messages you are receiving

pip install gretel-synthetics
Collecting gretel-synthetics
  Using cached gretel_synthetics-0.22.9-py3-none-any.whl.metadata (12 kB)
Collecting category-encoders==2.2.2 (from gretel-synthetics)
  Using cached category_encoders-2.2.2-py2.py3-none-any.whl.metadata (6.8 kB)
Collecting joblib==1.2.0 (from gretel-synthetics)
  Using cached joblib-1.2.0-py3-none-any.whl.metadata (5.3 kB)
Collecting numpy<1.24,>=1.18.0 (from gretel-synthetics)
  Using cached numpy-1.23.5-cp311-cp311-win_amd64.whl.metadata (2.3 kB)
Collecting packaging==21.3 (from gretel-synthetics)
  Using cached packaging-21.3-py3-none-any.whl.metadata (15 kB)
Collecting pandas<2,>=1.1.0 (from gretel-synthetics)
  Using cached pandas-1.5.3-cp311-cp311-win_amd64.whl.metadata (12 kB)
INFO: pip is looking at multiple versions of gretel-synthetics to determine which version is compatible with other requirements. This could take a while.
Collecting gretel-synthetics
  Using cached gretel_synthetics-0.22.8-py3-none-any.whl.metadata (12 kB)
  Using cached gretel_synthetics-0.22.7-py3-none-any.whl.metadata (12 kB)
  Using cached gretel_synthetics-0.22.6-py3-none-any.whl.metadata (12 kB)
  Using cached gretel_synthetics-0.22.5-py3-none-any.whl.metadata (12 kB)
  Using cached gretel_synthetics-0.22.4-py3-none-any.whl.metadata (12 kB)
  Using cached gretel_synthetics-0.22.3-py3-none-any.whl.metadata (12 kB)
  Using cached gretel_synthetics-0.22.2-py3-none-any.whl.metadata (11 kB)
Collecting protobuf<3.20,>=3.9.2 (from gretel-synthetics)
  Using cached protobuf-3.19.6-py2.py3-none-any.whl.metadata (828 bytes)
INFO: pip is still looking at multiple versions of gretel-synthetics to determine which version is compatible with other requirements. This could take a while.
Collecting gretel-synthetics
  Using cached gretel_synthetics-0.22.1-py3-none-any.whl.metadata (11 kB)
  Using cached gretel_synthetics-0.22.0-py3-none-any.whl.metadata (11 kB)
  Using cached gretel_synthetics-0.21.0-py3-none-any.whl.metadata (11 kB)
Collecting loky==2.9.0 (from gretel-synthetics)
  Using cached loky-2.9.0-py2.py3-none-any.whl.metadata (5.1 kB)
Collecting gretel-synthetics
  Using cached gretel_synthetics-0.20.0-py3-none-any.whl.metadata (11 kB)
Collecting numpy>=1.18.0 (from gretel-synthetics)
  Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
Collecting pandas>=1.1.0 (from gretel-synthetics)
  Using cached pandas-2.2.2-cp311-cp311-win_amd64.whl.metadata (19 kB)
Collecting sentencepiece==0.1.97 (from gretel-synthetics)
  Using cached sentencepiece-0.1.97.tar.gz (524 kB)
  Preparing metadata (setup.py) ... done
Collecting smart-open<6.0,>=2.1.0 (from gretel-synthetics)
  Using cached smart_open-5.2.1-py3-none-any.whl.metadata (22 kB)
Collecting tensorflow-estimator==2.8 (from gretel-synthetics)
  Using cached tensorflow_estimator-2.8.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting tensorflow-privacy==0.7.3 (from gretel-synthetics)
  Using cached tensorflow_privacy-0.7.3-py3-none-any.whl.metadata (609 bytes)
Collecting tensorflow-probability==0.16.0 (from gretel-synthetics)
  Using cached tensorflow_probability-0.16.0-py2.py3-none-any.whl.metadata (13 kB)
Collecting tqdm<5.0 (from gretel-synthetics)
  Using cached tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
Collecting scikit-learn>=0.20.0 (from category-encoders==2.2.2->gretel-synthetics)
  Using cached scikit_learn-1.4.2-cp311-cp311-win_amd64.whl.metadata (11 kB)
Collecting scipy>=1.0.0 (from category-encoders==2.2.2->gretel-synthetics)
  Using cached scipy-1.13.0-cp311-cp311-win_amd64.whl.metadata (60 kB)
Collecting statsmodels>=0.9.0 (from category-encoders==2.2.2->gretel-synthetics)
  Using cached statsmodels-0.14.2-cp311-cp311-win_amd64.whl.metadata (9.5 kB)
Collecting patsy>=0.5.1 (from category-encoders==2.2.2->gretel-synthetics)
  Using cached patsy-0.5.6-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting cloudpickle (from loky==2.9.0->gretel-synthetics)
  Using cached cloudpickle-3.0.0-py3-none-any.whl.metadata (7.0 kB)
Collecting attrs>=21.2.0 (from tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting mpmath (from tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting dm-tree~=0.1.1 (from tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached dm_tree-0.1.8-cp311-cp311-win_amd64.whl.metadata (2.0 kB)
Collecting tensorflow-datasets>=4.4.0 (from tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached tensorflow_datasets-4.9.4-py3-none-any.whl.metadata (9.2 kB)
Collecting absl-py (from tensorflow-probability==0.16.0->gretel-synthetics)
  Using cached absl_py-2.1.0-py3-none-any.whl.metadata (2.3 kB)
Collecting six>=1.10.0 (from tensorflow-probability==0.16.0->gretel-synthetics)
  Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting decorator (from tensorflow-probability==0.16.0->gretel-synthetics)
  Using cached decorator-5.1.1-py3-none-any.whl.metadata (4.0 kB)
Collecting gast>=0.3.2 (from tensorflow-probability==0.16.0->gretel-synthetics)
  Using cached gast-0.5.4-py3-none-any.whl.metadata (1.3 kB)
Collecting python-dateutil>=2.8.2 (from pandas>=1.1.0->gretel-synthetics)
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting pytz>=2020.1 (from pandas>=1.1.0->gretel-synthetics)
  Using cached pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas>=1.1.0->gretel-synthetics)
  Using cached tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting colorama (from tqdm<5.0->gretel-synthetics)
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting joblib>=1.2.0 (from scikit-learn>=0.20.0->category-encoders==2.2.2->gretel-synthetics)
  Using cached joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn>=0.20.0->category-encoders==2.2.2->gretel-synthetics)
  Using cached threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Collecting packaging>=21.3 (from statsmodels>=0.9.0->category-encoders==2.2.2->gretel-synthetics)
  Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting click (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting etils>=0.9.0 (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached etils-1.8.0-py3-none-any.whl.metadata (6.4 kB)
Collecting promise (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached promise-2.3-py3-none-any.whl
Collecting protobuf>=3.20 (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached protobuf-5.26.1-cp310-abi3-win_amd64.whl.metadata (592 bytes)
Collecting psutil (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached psutil-5.9.8-cp37-abi3-win_amd64.whl.metadata (22 kB)
Collecting requests>=2.19.0 (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting tensorflow-metadata (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached tensorflow_metadata-1.15.0-py3-none-any.whl.metadata (2.4 kB)
Collecting termcolor (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached termcolor-2.4.0-py3-none-any.whl.metadata (6.1 kB)
Collecting toml (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
Collecting wrapt (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached wrapt-1.16.0-cp311-cp311-win_amd64.whl.metadata (6.8 kB)
Collecting fsspec (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
Collecting importlib_resources (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached importlib_resources-6.4.0-py3-none-any.whl.metadata (3.9 kB)
Collecting typing_extensions (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached typing_extensions-4.11.0-py3-none-any.whl.metadata (3.0 kB)
Collecting zipp (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached zipp-3.18.1-py3-none-any.whl.metadata (3.5 kB)
Collecting charset-normalizer<4,>=2 (from requests>=2.19.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached charset_normalizer-3.3.2-cp311-cp311-win_amd64.whl.metadata (34 kB)
Collecting idna<4,>=2.5 (from requests>=2.19.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached idna-3.7-py3-none-any.whl.metadata (9.9 kB)
Collecting urllib3<3,>=1.21.1 (from requests>=2.19.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached urllib3-2.2.1-py3-none-any.whl.metadata (6.4 kB)
Collecting certifi>=2017.4.17 (from requests>=2.19.0->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached certifi-2024.2.2-py3-none-any.whl.metadata (2.2 kB)
Collecting googleapis-common-protos<2,>=1.56.4 (from tensorflow-metadata->tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached googleapis_common_protos-1.63.0-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf>=3.20 (from tensorflow-datasets>=4.4.0->tensorflow-privacy==0.7.3->gretel-synthetics)
  Using cached protobuf-4.25.3-cp310-abi3-win_amd64.whl.metadata (541 bytes)
Using cached gretel_synthetics-0.20.0-py3-none-any.whl (124 kB)
Using cached category_encoders-2.2.2-py2.py3-none-any.whl (80 kB)
Using cached loky-2.9.0-py2.py3-none-any.whl (67 kB)
Using cached tensorflow_estimator-2.8.0-py2.py3-none-any.whl (462 kB)
Using cached tensorflow_privacy-0.7.3-py3-none-any.whl (251 kB)
Using cached tensorflow_probability-0.16.0-py2.py3-none-any.whl (6.3 MB)
Using cached numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
Using cached pandas-2.2.2-cp311-cp311-win_amd64.whl (11.6 MB)
Using cached smart_open-5.2.1-py3-none-any.whl (58 kB)
Using cached tqdm-4.66.4-py3-none-any.whl (78 kB)
Using cached attrs-23.2.0-py3-none-any.whl (60 kB)
Using cached cloudpickle-3.0.0-py3-none-any.whl (20 kB)
Using cached dm_tree-0.1.8-cp311-cp311-win_amd64.whl (101 kB)
Using cached gast-0.5.4-py3-none-any.whl (19 kB)
Using cached patsy-0.5.6-py2.py3-none-any.whl (233 kB)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Using cached pytz-2024.1-py2.py3-none-any.whl (505 kB)
Using cached scikit_learn-1.4.2-cp311-cp311-win_amd64.whl (10.6 MB)
Using cached scipy-1.13.0-cp311-cp311-win_amd64.whl (46.2 MB)
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Using cached statsmodels-0.14.2-cp311-cp311-win_amd64.whl (9.9 MB)
Using cached tensorflow_datasets-4.9.4-py3-none-any.whl (5.1 MB)
Using cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)
Using cached absl_py-2.1.0-py3-none-any.whl (133 kB)
Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Using cached etils-1.8.0-py3-none-any.whl (156 kB)
Using cached joblib-1.4.2-py3-none-any.whl (301 kB)
Using cached packaging-24.0-py3-none-any.whl (53 kB)
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Using cached threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Using cached click-8.1.7-py3-none-any.whl (97 kB)
Using cached psutil-5.9.8-cp37-abi3-win_amd64.whl (255 kB)
Using cached tensorflow_metadata-1.15.0-py3-none-any.whl (28 kB)
Using cached protobuf-4.25.3-cp310-abi3-win_amd64.whl (413 kB)
Using cached termcolor-2.4.0-py3-none-any.whl (7.7 kB)
Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Using cached wrapt-1.16.0-cp311-cp311-win_amd64.whl (37 kB)
Using cached certifi-2024.2.2-py3-none-any.whl (163 kB)
Using cached charset_normalizer-3.3.2-cp311-cp311-win_amd64.whl (99 kB)
Using cached googleapis_common_protos-1.63.0-py2.py3-none-any.whl (229 kB)
Using cached idna-3.7-py3-none-any.whl (66 kB)
Using cached urllib3-2.2.1-py3-none-any.whl (121 kB)
Using cached fsspec-2024.3.1-py3-none-any.whl (171 kB)
Using cached importlib_resources-6.4.0-py3-none-any.whl (38 kB)
Using cached typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Using cached zipp-3.18.1-py3-none-any.whl (8.2 kB)
Building wheels for collected packages: sentencepiece
  Building wheel for sentencepiece (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-311
      creating build\lib.win-amd64-cpython-311\sentencepiece
      copying src\sentencepiece/__init__.py -> build\lib.win-amd64-cpython-311\sentencepiece
      copying src\sentencepiece/_version.py -> build\lib.win-amd64-cpython-311\sentencepiece
      copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-cpython-311\sentencepiece
      copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-cpython-311\sentencepiece
      running build_ext
      building 'sentencepiece._sentencepiece' extension
      creating build\temp.win-amd64-cpython-311
      creating build\temp.win-amd64-cpython-311\Release
      creating build\temp.win-amd64-cpython-311\Release\src
      creating build\temp.win-amd64-cpython-311\Release\src\sentencepiece
      "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\xx\.conda\envs\gretel\include -IC:\Users\xx\.conda\envs\gretel\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-cpython-311\Release\src/sentencepiece/sentencepiece_wrap.obj /std:c++17 /MT /I..\build\root\include
      cl : Command line warning D9025 : overriding '/MD' with '/MT'
      sentencepiece_wrap.cxx
      src/sentencepiece/sentencepiece_wrap.cxx(2822): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
      error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.39.33519\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for sentencepiece
  Running setup.py clean for sentencepiece
Failed to build sentencepiece
ERROR: Could not build wheels for sentencepiece, which is required to install pyproject.toml-based projects
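
For readers following the log above: pip downloaded `sentencepiece-0.1.97.tar.gz` rather than a `.whl`, which is what it does when no prebuilt wheel matches the interpreter's tags, and the source build then failed on the missing `sentencepiece_processor.h`. The sketch below illustrates that tag matching in plain Python. It is a simplified illustration, not pip's actual implementation. The `SUPPORTED` set is only the subset of tags visible in wheels that pip accepted in this log, and the `CANDIDATES` list is a hypothetical file listing, not a verified copy of what is published for sentencepiece 0.1.97.

```python
# Simplified illustration of wheel tag matching; pip's real logic is more involved.

def wheel_tags(filename: str) -> set[tuple[str, str, str]]:
    """Expand a wheel filename into its (python, abi, platform) tag triples."""
    if not filename.endswith(".whl"):
        return set()  # source distributions (.tar.gz) carry no compatibility tags
    parts = filename[: -len(".whl")].split("-")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Compressed tags such as "py2.py3" stand for several alternatives.
    return {
        (py, abi, plat)
        for py in python_tag.split(".")
        for abi in abi_tag.split(".")
        for plat in platform_tag.split(".")
    }


def has_compatible_wheel(filenames, supported) -> bool:
    """Return True if any file is a wheel whose tags intersect `supported`."""
    return any(wheel_tags(name) & supported for name in filenames)


# Assumption: tags read off wheels that pip accepted in the log above
# (e.g. numpy cp311-cp311-win_amd64, protobuf cp310-abi3-win_amd64).
SUPPORTED = {
    ("cp311", "cp311", "win_amd64"),
    ("cp310", "abi3", "win_amd64"),
    ("cp37", "abi3", "win_amd64"),
    ("py3", "none", "any"),
}

# Hypothetical listing for illustration only, not checked against the package index.
CANDIDATES = [
    "sentencepiece-0.1.97-cp310-cp310-win_amd64.whl",
    "sentencepiece-0.1.97.tar.gz",
]

print(has_compatible_wheel(CANDIDATES, SUPPORTED))  # prints False
```

Under these assumptions no candidate wheel matches a 3.11 interpreter, so the only remaining option is the source archive, which needs a working C++ build of the sentencepiece library. The same check returns True for `numpy-1.23.5-cp311-cp311-win_amd64.whl`, which is why numpy installed without compiling.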