Closed. hktalent closed this issue 1 year ago.
pip install pandoc
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pandoc in /Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages (2.3)
Requirement already satisfied: plumbum in /Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages (from pandoc) (1.8.1)
Requirement already satisfied: ply in /Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages (from pandoc) (3.11)
(privateGPT) 51pwn@123-2 privateGPT $ pip install pypandoc
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pypandoc in /Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages (1.11)
I ran into the same issue. You have to install pandoc and add it to your PATH.
I'm on Windows 10. I ran the following Python script, which downloads and runs the pandoc installer. Then add "C:\Users\Username\AppData\Local\Pandoc" to your PATH. That's where mine was installed; yours might be different.
from pypandoc.pandoc_download import download_pandoc
# see the documentation how to customize the installation path
# but be aware that you then need to include it in the `PATH`
download_pandoc()
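To confirm that the PATH change took effect, a minimal sketch using only the standard library is shown here. The helper names `find_pandoc` and `pandoc_version` are made up for illustration and are not part of pypandoc; the only external assumption is that the executable is called `pandoc` and accepts `--version`.

```python
import shutil
import subprocess
from typing import Optional


def find_pandoc() -> Optional[str]:
    """Return the full path of the pandoc executable found on PATH, or None."""
    # shutil.which searches the same PATH that child processes will see.
    return shutil.which("pandoc")


def pandoc_version(executable: str) -> str:
    """Return the first line printed by `<executable> --version`."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_pandoc()
    if path is None:
        print("pandoc is not on PATH; install it or add its folder to PATH")
    else:
        print(f"found {path}: {pandoc_version(path)}")
```

Run it from the same shell (and the same conda or venv environment) that you use for ingest.py, since PATH can differ between terminals. On Windows you usually need to open a new terminal after editing PATH.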
@yousifalyousifi thanks
$ python ingest.py
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 3%|▌ | 4/131 [00:06<02:43, 1.29s/it][nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /Users/51pwn/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /Users/51pwn/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /Users/51pwn/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
Loading new documents: 3%|▌ | 4/131 [00:09<04:57, 2.34s/it]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 89, in load_single_document
return loader.load()[0]
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
elements = self._get_elements()
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/langchain/document_loaders/epub.py", line 22, in _get_elements
return partition_epub(filename=self.file_path, **self.unstructured_kwargs)
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/partition/epub.py", line 24, in partition_epub
return convert_and_partition_html(
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/partition/html.py", line 124, in convert_and_partition_html
html_text = convert_file_to_html_text(source_format=source_format, filename=filename, file=file)
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/file_utils/file_conversion.py", line 44, in convert_file_to_html_text
html_text = convert_file_to_text(
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/file_utils/file_conversion.py", line 12, in convert_file_to_text
text = pypandoc.convert_file(filename, target_format, format=source_format)
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/pypandoc/__init__.py", line 164, in convert_file
format = _identify_format_from_path(discovered_source_files[0], format)
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 167, in <module>
main()
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 157, in main
texts = process_documents()
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 119, in process_documents
documents = load_documents(source_directory, ignored_files)
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 108, in load_documents
for i, doc in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/multiprocessing/pool.py", line 873, in next
raise value
IndexError: list index out of range
Can someone help me?
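The traceback above does not say which of the 131 files triggered the IndexError, because the exception is re-raised from the worker pool. One way to narrow it down is to load the files one at a time and record every failure. This is only a sketch: `find_failing_files` is a hypothetical helper, and `load_one` stands in for whatever loader you use (for example `load_single_document` from ingest.py).

```python
from typing import Callable, Dict, Iterable


def find_failing_files(
    paths: Iterable[str],
    load_one: Callable[[str], object],
) -> Dict[str, str]:
    """Call load_one on each path and collect the errors instead of stopping.

    Returns a mapping of path -> "ExceptionName: message" for every file
    that raised; files that load cleanly are not included.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            load_one(path)
        except Exception as exc:  # broad on purpose: we only want a report
            failures[path] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Stand-in loader for demonstration only: it fails on .epub files the
    # same way the traceback above does.
    def fake_loader(path: str) -> str:
        if path.endswith(".epub"):
            raise IndexError("list index out of range")
        return "ok"

    report = find_failing_files(["a.txt", "b.epub"], fake_loader)
    for name, error in report.items():
        print(f"{name}: {error}")
```

Running it sequentially is slower than the pool, but it shows whether the error is limited to the epub files, which are the ones that go through pypandoc.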
I started it in tmux:
python privateGPT.py
Then control characters appear in the terminal, and I don't know how to get rid of them.
Interactive Python works normally when I start it on its own, so the environment should be fine.
Does it not support Chinese?
I also found its answers very confusing; it just picks the best of a bad set of results.
Is there a solution?
I got the same exception on Pop!_OS 22.04 (which is based on Ubuntu, which is based on Debian, so this should work on all systems based on those). I solved it by installing pandoc on the system:
sudo apt install pandoc
Does it not support Chinese? I also found its answers very confusing; it just picks the best of a bad set of results.
Is there a solution?
This is not a pandoc issue anymore. I would advise closing this issue and opening a new one.
I installed it yesterday and still get the error. My computer: Mac Pro M2, Python 3.10.8
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
I have already installed pandoc and pypandoc.
It works when running with the test data, but fails for an epub file in source_documents.
I've found the solution. I need to install pandoc with brew first:
brew install pandoc
More details: https://pandoc.org/installing.html
I got the same exception on Pop!_OS 22.04 (which is based on Ubuntu, which is based on Debian, so this should work on all systems based on those). I solved it by installing pandoc on the system:
sudo apt install pandoc
This is the right approach on Linux. 'pip install pandoc' isn't sufficient.
How is this issue closed if it is still an issue on Linux?
How is this issue closed if it is still an issue on Linux?
Perhaps because the errors are caused by pandoc not being installed on the system, which makes it a user error, rather than a privateGPT error.
Oops my bad, I borked the venv. Sorry.
The installation instructions should add:
python3 -m venv ./venv
source ./venv/bin/activate
before pip install -r requirements.txt. This would save a lot of hassle.
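A quick way to check that the venv is actually active before running pip is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; `in_virtualenv` is a name chosen here for illustration. It applies to environments created with `python3 -m venv`, not to conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    A conda environment is a full installation of its own, so the two
    values are equal there and this function returns False.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"inside a venv: {in_virtualenv()}")
```

If it prints False right after `source ./venv/bin/activate`, the `python` on your PATH is probably not the one from ./venv/bin.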
It runs normally for me on macOS with an Intel i7, so I'm sharing the steps I used.
However, I found that using it to build an AI search engine still falls far short of expectations.
conda remove --name privateGPT --all -y
conda create -n privateGPT -y python=3.10
conda activate privateGPT
conda init zsh
export PATH="$HOME/anaconda3/envs/privateGPT/bin:$PATH"
which pip python
python -V
cat requirements.txt|xargs -I % pip install "%" -i https://mirror.baidu.com/pypi/simple
ARCHFLAGS="-arch x86_64" pip install langchain llama-cpp-python chromadb unstructured -i https://mirror.baidu.com/pypi/simple
conda install -c conda-forge pypandoc
brew install pandoc
Problem solved with:
pip install pypandoc-binary
(no need to worry about the PATH with this command)
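Since this thread involves three similarly named PyPI packages (pandoc, pypandoc and pypandoc-binary), it can help to see which of them are installed in the active environment. The sketch below uses only `importlib.metadata` from the standard library; `installed_versions` is a hypothetical helper name. As far as I understand, only pypandoc-binary ships with a pandoc executable, which matches the "wheels with included pandoc" wording in the OSError quoted earlier.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["pandoc", "pypandoc", "pypandoc-binary"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

If pypandoc is listed but pypandoc-binary is not, pypandoc still needs a pandoc executable from the system (brew, apt, or the installer).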
pip install pypandoc-binary
Great! It works for me! Thanks
pip install pypandoc-binary
did not work for me. I ran into this problem in a Debian-based Docker container.
macOS 13.4 / Intel i7
$ python -V
Python 3.10.11
$ pip list
$ python ingest.py