Open frenchiveruti opened 11 months ago
So it's a "hack job" but it works if it helps you to solve it. I went into the ingest_folder.py and edited line 78: "path = Path(args.folder)" to "path = Path(args.folder[5:])" to strip out the arg= being added in. Then if I use the command make ingest "C:\Users\import" it works as intended. I've spent a few hours trying to figure out where the arg= is inserted and I just can't.
The arg= param comes from the Makefile. However, the problem you are probably facing as a Windows user is that you need to set the arg explicitly on the command line.
Instead of
make ingest C:\Users\yourUsers\Documents\bulkFolder\
try
make ingest arg=C:\Users\yourUsers\Documents\bulkFolder\
This should work around the trouble caused by the Windows console while still using the generic solution in the Makefile. For macOS or Linux users the param is inferred.
Give it a try and let me know if that fixes it for you @rar8022 @frenchiveruti
@FrangSierra Sorry, that did not work.
Windows user here with the same issue. I don't have much experience with 'make', but I looked at the Makefile and was able to figure out that it's just running scripts/ingest_folder.py, so I got it to ingest an entire folder by running the following (after activating the venv):
python scripts\ingest_folder.py C:\PATH\TO\PDFS --watch --log-file ingestLog.txt
Although after an unfortunate power loss and restarting the ingest, it seems to be re-ingesting the files it previously ingested. Not sure if 'doubling up' the content will impact anything or if I should just wipe what's been ingested and start from scratch. Not really related to the topic but an observation worth mentioning, I suppose.
I had the same problem on Windows. I tried all the solutions above but none of them worked.
What worked for me was to follow @rar8022's approach and also delete the last characters from args.folder.
I don't know why this happens but now it's working.
These are my modifications to the file scripts\ingest_folder.py:
path = Path(args.folder[5:-2])
if not path.exists():
    raise ValueError(f"Path {args.folder[5:-2]} does not exist")
And this is the command I used:
make ingest "C:\Users\lucabat\Documents\" --
If I added --watch to the command, I had to remove 7 more characters.
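A rough sketch of a less brittle variant of the slicing workaround above: instead of fixed offsets such as [5:] or [5:-2], strip the stray prefix and quoting explicitly. The helper name clean_folder_arg is hypothetical (it is not part of ingest_folder.py), and the set of characters it strips is an assumption based on the values reported in this thread.

```python
from pathlib import Path

# Characters assumed to be leftover shell quoting (backtick, quotes, whitespace).
_JUNK = "`\"' \t"


def clean_folder_arg(raw: str) -> Path:
    """Hypothetical helper: drop a stray 'arg=' prefix and surrounding quotes."""
    text = raw.strip(_JUNK)
    if text.startswith("arg="):
        # Remove the prefix that leaked in from the Makefile macro.
        text = text[len("arg="):]
    # Quotes may also sit between the prefix and the path itself.
    return Path(text.strip(_JUNK))
```

In ingest_folder.py this would stand in for Path(args.folder[5:]), e.g. path = clean_folder_arg(args.folder). It does not try to handle the extra trailing text that shows up when flags such as --watch end up merged into the same string.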
Well, the "not so hack" way is this one:
Ingest documents:
#Missing docx2txt
conda install -c conda-forge docx2txt
poetry run python .\scripts\ingest_folder.py "D:\IngestDataPGPT"
poetry run python -m uvicorn private_gpt.main:app --reload --port 8001
For those of you who are using the hack of stripping the leading arg= characters from the path: if you open the Makefile, at the top you will find:
# Any args passed to the make script, use with $(call args, default_value)
args = `arg="$(filter-out $@,$(MAKECMDGOALS))" && echo $${arg:-${1}}`
This is what is causing that arg to be read as part of the path and not as an argument. It seems that Windows handles this very differently from OSX. The idea of this line is to provide a generic way to pass one or more parameters to make. It is probably also related to your PowerShell version. I'm using 7.0 on Windows 10 Pro. Adding the param name made it work for me, as I shared above.
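To make the intent of that Makefile line easier to follow, here is a small Python sketch of what it is meant to compute when make runs it through a POSIX shell. The function name make_args is hypothetical, and the sketch ignores the % pattern matching that filter-out also supports.

```python
def make_args(goals, target, default=""):
    """Emulate: arg="$(filter-out $@,$(MAKECMDGOALS))" && echo ${arg:-default}"""
    # $(filter-out $@,$(MAKECMDGOALS)): drop the current target from the goals.
    remaining = [goal for goal in goals if goal != target]
    arg = " ".join(remaining)
    # ${arg:-default}: fall back to the default when arg is empty.
    return arg if arg else default


# "make ingest /home/me/docs" gives MAKECMDGOALS = ingest /home/me/docs
print(make_args(["ingest", "/home/me/docs"], "ingest"))
```

The backticks and the ${arg:-...} expansion are shell syntax. If the recipe is run by a shell that does not understand them, the text would presumably be passed to the script literally, which would be consistent with the arg= prefix people are seeing in args.folder. That part is a guess, not something confirmed in this thread.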
However, for those of you who are still having trouble: instead of doing the Path(args.folder[5:-2]) hack, you can try tweaking the args expression in the Makefile so that it works properly on Windows. You can find it here: https://github.com/imartinez/privateGPT/blob/main/Makefile
Anyway, I realized that the ingest target is not passing a default_value, and the generic expression above expects one.
ingest:
	@poetry run python scripts/ingest_folder.py $(call args)
I'm away for a couple of days, but you could try the following to see if the behaviour changes:
# Pass a default value, in this case an empty string.
ingest:
	@poetry run python scripts/ingest_folder.py $(call args, "")
Anything you discover, please share it here so we can improve the project documentation to cover all these edge cases. Have a nice weekend!
This is what finally worked for me. I didn't need to install docx2txt first, just running with poetry handled it on my Windows 11 system. THANK YOU!
poetry run python .\scripts\ingest_folder.py "D:\IngestDataPGPT"
Is this issue ever going to get resolved for Windows? It still does not work.
Traceback (most recent call last):
File "C:\Project-Alice\private-gpt\scripts\ingest_folder.py", line 98, in
Any feedback would be greatly appreciated.
Per the docs https://docs.privategpt.dev/#section/Ingesting-and-Managing-Documents:
When I run any of these variations:
Instead, I get:
No matter how I phrase the path. I think it's because the line is being converted to "`arg=(...)", meaning it's adding the arg= section for no reason. I'm running PrivateGPT on Windows 10 with Anaconda via PowerShell.