neuml / txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
https://neuml.github.io/txtai
Apache License 2.0

"pip install txtai[pipeline]" fails building fasttext #756

Closed DWHowes closed 3 months ago

DWHowes commented 4 months ago

I'm building the Textractor example in VS Code, running inside a conda virtual environment. I get the following error when executing this line:

textractor = Textractor()

Traceback (most recent call last):
  File "d:/Python/NLP/txtAI/extractPDF.py", line 6, in <module>
    textractor = Textractor()
  File "C:\Users\alc77\anaconda3\envs\TextAI\lib\site-packages\txtai\pipeline\data\textractor.py", line 32, in __init__
    raise ImportError('Textractor pipeline is not available - install "pipeline" extra to enable')
ImportError: Textractor pipeline is not available - install "pipeline" extra to enable

I have both tika and java installed in the environment.

In an Anaconda terminal window (standard Windows cmd) I tried to install the pipeline extra with "pip install txtai[pipeline]". The installation fails while building the wheel for fasttext. The traceback is below; any help is greatly appreciated.

Building wheels for collected packages: fasttext
  Building wheel for fasttext (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for fasttext (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [93 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-38
      creating build\lib.win-amd64-cpython-38\fasttext
      copying python\fasttext_module\fasttext\FastText.py -> build\lib.win-amd64-cpython-38\fasttext
      copying python\fasttext_module\fasttext\__init__.py -> build\lib.win-amd64-cpython-38\fasttext
      creating build\lib.win-amd64-cpython-38\fasttext\util
      copying python\fasttext_module\fasttext\util\util.py -> build\lib.win-amd64-cpython-38\fasttext\util
      copying python\fasttext_module\fasttext\util\__init__.py -> build\lib.win-amd64-cpython-38\fasttext\util
      creating build\lib.win-amd64-cpython-38\fasttext\tests
      copying python\fasttext_module\fasttext\tests\test_configurations.py -> build\lib.win-amd64-cpython-38\fasttext\tests
      copying python\fasttext_module\fasttext\tests\test_script.py -> build\lib.win-amd64-cpython-38\fasttext\tests
      copying python\fasttext_module\fasttext\tests\__init__.py -> build\lib.win-amd64-cpython-38\fasttext\tests
      running build_ext
      building 'fasttext_pybind' extension
      creating build\temp.win-amd64-cpython-38
      creating build\temp.win-amd64-cpython-38\Release
      creating build\temp.win-amd64-cpython-38\Release\python
      creating build\temp.win-amd64-cpython-38\Release\python\fasttext_module
      creating build\temp.win-amd64-cpython-38\Release\python\fasttext_module\fasttext
      creating build\temp.win-amd64-cpython-38\Release\python\fasttext_module\fasttext\pybind
      creating build\temp.win-amd64-cpython-38\Release\src
      "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\alc77\AppData\Local\Temp\pip-build-env-zma4dqmj\overlay\Lib\site-packages\pybind11\include -IC:\Users\alc77\AppData\Local\Temp\pip-build-env-zma4dqmj\overlay\Lib\site-packages\pybind11\include -Isrc -IC:\Users\alc77\anaconda3\envs\TextAI\include -IC:\Users\alc77\anaconda3\envs\TextAI\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" /EHsc /Tppython/fasttext_module/fasttext/pybind/fasttext_pybind.cc /Fobuild\temp.win-amd64-cpython-38\Release\python/fasttext_module/fasttext/pybind/fasttext_pybind.obj /EHsc /DVERSION_INFO=\\\"0.9.3\\\"
      fasttext_pybind.cc
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\string_view(12): warning STL4038: The contents of <string_view> are available only with C++17 or later.
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(40): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(40): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(41): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(41): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(46): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(46): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(75): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(75): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(76): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(76): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(78): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(78): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(78): error C2535: 'fasttext::entry_type fasttext::Dictionary::getType(int32_t) const': member function already defined or declared
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(77): note: see declaration of 'fasttext::Dictionary::getType'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C2146: syntax error: missing ')' before identifier 'str'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C3646: 'str': unknown override specifier
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C2059: syntax error: ')'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C2143: syntax error: missing ';' before 'const'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): error C2208: 'const int': no members defined using this type
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): warning C4091: ' ': ignored on left of 'const int' when no variable is declared
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(103): error C2039: 'string_view': is not a member of 'std'
      C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\unordered_map(23): note: see declaration of 'std'
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(103): error C2061: syntax error: identifier 'string_view'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(71): error C2662: 'uint32_t fasttext::Dictionary::hash(const int)': cannot convert 'this' pointer from '_Ty2' to 'fasttext::Dictionary &'
              with
              [
                  _Ty2=const fasttext::Dictionary
              ]
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(71): note: Conversion loses qualifiers
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(91): note: see declaration of 'fasttext::Dictionary::hash'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(71): note: while trying to match the argument list '(std::string)'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(72): error C2665: 'fasttext::Dictionary::getId': no overloaded function could convert all the argument types
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(76): note: could be 'int32_t fasttext::Dictionary::getId(const int,uint32_t) const'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(72): note: 'int32_t fasttext::Dictionary::getId(const int,uint32_t) const': cannot convert argument 1 from 'std::string' to 'const int'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(72): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(72): note: while trying to match the argument list '(std::string, uint32_t)'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(73): error C2664: 'fasttext::entry_type fasttext::Dictionary::getType(int32_t) const': cannot convert argument 1 from 'std::string' to 'int32_t'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(73): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
      C:\Users\alc77\AppData\Local\Temp\pip-install-5d3wudl9\fasttext_14a087acb91b41ccbe41fc2d1df01b34\src\dictionary.h(77): note: see declaration of 'fasttext::Dictionary::getType'
      python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(73): note: while trying to match the argument list '(std::string)'
      C:\Users\alc77\AppData\Local\Temp\pip-build-env-zma4dqmj\overlay\Lib\site-packages\setuptools\dist.py:458: SetuptoolsDeprecationWarning: Invalid dash-separated options
      !!

              ********************************************************************************
              Usage of dash-separated 'description-file' will not be supported in future
              versions. Please use the underscore name 'description_file' instead.

              By 2024-Sep-26, you need to update your project and remove deprecated calls
              or your builds will no longer be supported.

              See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
              ********************************************************************************

      !!
        opt = self.warn_dash_deprecation(opt, section)
      error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.39.33519\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

davidmezzetti commented 3 months ago

What if you try installing the prior version of fasttext?

pip install fasttext==0.9.2

DWHowes commented 3 months ago

Thanks very much, that worked.

After installing the previous version of fasttext, I successfully installed the pipeline.
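
For anyone landing here later, below is a minimal sketch for confirming that the workaround took effect. It assumes the order reported in this thread (`pip install fasttext==0.9.2` first, then `pip install txtai[pipeline]`) and Python 3.8 or later. The helper names `installed_version` and `textractor_available` are made up for illustration and are not part of txtai.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def textractor_available() -> bool:
    """Return True if Textractor can be imported and constructed.

    The traceback in this thread shows the ImportError being raised from
    Textractor.__init__, so the check constructs an instance instead of
    only importing the class.
    """
    try:
        from txtai.pipeline import Textractor

        Textractor()
    except ImportError:
        return False
    return True
```

If the steps from this thread were followed, `installed_version("fasttext")` should report `0.9.2` and `textractor_available()` should return `True`. A `None` from `installed_version("fasttext")` suggests the wheel never got installed, which is what the failed 0.9.3 build above would leave behind.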