nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License
70.66k stars, 7.7k forks

Why does GPT4All try to access my Photos, Desktop, Downloads and Documents folders? #1109

Closed: cyb3rsalih closed this issue 4 months ago

cyb3rsalih commented 1 year ago

Issue you'd like to raise.

I installed gpt4all on macOS. After a successful installation I opened the app, and while it was launching it requested access to my Photos, Desktop, Downloads and Documents folders. I rejected all of them except the Documents folder. After about a minute it asked for access to the folders again; once more I granted access only to the Documents folder, and in the end the app opened.

I just can't understand why it tried to access these folders.

Suggestion:

The app should provide a brief explanation of why it needs to access these folders.

aragon5956 commented 1 year ago

Malware? Maybe to sell your data.

anijatsu commented 1 year ago

Since the Downloads section in README.md links to an external site (https://gpt4all.io/installers/gpt4all-installer-darwin.dmg), I wanted to add some details: I encountered the same issue on macOS 13.4.1 when launching GPT4All.app from the Applications directory (/Applications/GPT4All.app). However, that appears to be only an alias for /Applications/gpt4all/bin/gpt4all.app.

When I began to receive these looping prompts for various kinds of access, I force-quit the original launch of the application and launched the destination .app file directly instead. That successfully opened the application for me, and the only prompt was from LuLu (a firewall) asking about network access. I was able to download the models and submit prompts without issue, so clearly none of these privileges were required.

Since the project doesn't use GitHub releases, for reference: the .app reports version 2.4, and the .dmg image had the following hash for me:

```
sha512sum gpt4all-installer-darwin.dmg
4a8b27b0110f79571c956cc38ce20bba43f365a4652587f3737a3d5327afdabaf4646bbdc718fa8db2ec171546888a65d4a21062d49d969487597a7a87e0883e  gpt4all-installer-darwin.dmg
```
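For anyone who wants to repeat this check, here is a minimal Python sketch equivalent to the `sha512sum` call above. The helper names `sha512_of_file` and `matches` are hypothetical; the digest constant is the one posted in this comment, so it only applies to the 2.4-era installer. A match only shows that you have the same file as the commenter, not that the file is safe.

```python
import hashlib
import sys

# Digest posted in this thread for gpt4all-installer-darwin.dmg (app version 2.4).
# A newer installer will hash differently; compare against a digest you trust.
POSTED_SHA512 = (
    "4a8b27b0110f79571c956cc38ce20bba43f365a4652587f3737a3d5327afdaba"
    "f4646bbdc718fa8db2ec171546888a65d4a21062d49d969487597a7a87e0883e"
)


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-512 equals the expected hex digest (case-insensitive)."""
    return sha512_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    print("OK" if matches(sys.argv[1], POSTED_SHA512) else "MISMATCH")
```

Usage, assuming the script is saved under the hypothetical name `verify_dmg.py`: `python3 verify_dmg.py gpt4all-installer-darwin.dmg`. Reading in chunks keeps memory use flat even for a large installer image.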

aragon5956 commented 1 year ago

What do the YARA rules say?

najamelan commented 1 year ago

I ran the app on Linux. Instantly I heard my second hard disk reading like crazy. I opened the system monitor and it was the chat binary, so I immediately killed it. I don't see any excuse for it to start randomly reading my file system. I'm not starting it again unless it's in a VM/sandbox. Very dodgy.

jonascript commented 1 year ago

Experienced the same thing. Very sketchy.

jpeacock29 commented 1 year ago

+1 Same experience. Concerning!

juli303 commented 1 year ago

This issue might be related to issue #1082.

aragon5956 commented 1 year ago

It is malware. Report the repo if possible!!

KennethWussmann commented 1 year ago

I took a closer look at the source code of gpt4all to understand why the application is scanning directories upon first startup. Here's what I found:

  1. UI for Adding Folders: In the LocalDocsSettings.qml file, there is a UI component that allows users to add directories. It seems the application is designed to let users select folders that they want the application to scan.

  2. Scanning Documents: In the database.cpp file, there's a function called scanDocuments. This function appears to be responsible for scanning the directories and gathering information about the documents contained within them.

  3. Building SQLite Database: The application is building an SQLite database to store the contents of files from the scanned directories. This is evident in the database.cpp file. It looks for files with certain extensions (such as .txt, .docx) and reads their contents into the database.

From the behavior described, it sounds like the app is attempting to scan common user directories (Photos, Desktop, Downloads, Documents) to populate its database. This is likely so users can ask gpt4all questions about their local files. That's probably also the reason why it gets flagged as malicious by malware scanners.

So not malware, but it would be beneficial for the application to explain to the user why it needs to access these folders. Starting to scan all directories right after installation definitely looks suspicious 😬
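The pattern described in points 2 and 3 (walk a folder, keep files with known extensions, store their text in SQLite) can be illustrated with the Python sketch below. It is a hypothetical illustration of the technique, not a port of gpt4all's C++ `scanDocuments` in database.cpp: the function name `scan_documents`, the table layout, and the extension list are all assumptions, and it only handles plain-text formats (reading .docx would need a separate parser). The walk starts from the single folder the caller passes in; nothing outside that folder is read.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical allow-list; the real LocalDocs extension list may differ.
ALLOWED_EXTENSIONS = {".txt", ".md"}


def scan_documents(root: str, db: sqlite3.Connection) -> int:
    """Walk only the folder `root`, storing matching text files in SQLite.

    Returns the number of files indexed.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents (path TEXT PRIMARY KEY, content TEXT)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() not in ALLOWED_EXTENSIONS:
                continue  # skip files this indexer does not understand
            try:
                content = path.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file (permissions, deleted mid-scan): skip it
            db.execute(
                "INSERT OR REPLACE INTO documents (path, content) VALUES (?, ?)",
                (str(path), content),
            )
            count += 1
    db.commit()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "notes.txt").write_text("meeting notes", encoding="utf-8")
        Path(folder, "photo.jpg").write_bytes(b"\xff\xd8")
        conn = sqlite3.connect(":memory:")
        print(scan_documents(folder, conn))  # → 1 (photo.jpg is skipped)
```

Scoping the scan to an explicitly chosen root is the property the people in this thread are asking for: an indexer built this way has no reason to touch Photos, Downloads, or an unrelated second disk unless the user points it there.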

najamelan commented 1 year ago

Thanks for looking into it. However, I'm on Linux. I have a second HDD that does not have my home directory or any part of the system installed on it. It's mounted on a directory under the root. When I start the chat binary, this HDD starts reading like crazy. There is no excuse for crawling a filesystem like that. There is no reason to believe I want to use gpt4all on some random part of the filesystem.

If there is a way to scan certain folders, the app should have a button with which I can initiate that, rather than just accessing the filesystem on startup.

cosmic-snow commented 1 year ago

Alright guys, the conspiracy party is over. 😅

> So not malware, but it would be beneficial for the application to explain to the user why it needs to access these folders. Starting to scan all directories right after installation definitely looks suspicious 😬

I have to commend you for actually trying to look into the code and see what's going on. The other guys, not so much. It's really all in there. And if you people are suspicious about the pre-packaged installer, you can download Qt Creator/libraries and build it yourself. It won't really change anything but might give you some peace of mind.

The document scanning and database feature thingy is part of the LocalDocs feature (which is optional). See the LocalDocs documentation.

In general, the docs are a good place to start understanding the different parts of the project; there are quite a few.

It also accesses some paths in the home directory, partly for settings and partly for logs, as well as a default folder for models. Some of this comes from Qt itself, some from the chat application.

It can send some data back if you opt into that, and it does an online version check. It also supports talking to the OpenAI API: for that you need an OpenAI API key, and it obviously has to send and receive data because that's not local. The downloaded models are all run locally, though. Oh, and it grabs the models.json file online because that has some of the necessary information.

I hope that explains the most important things. If you're still doubtful, you can always join the Discord and ask questions.

If you don't ever want to trust the chat GUI, there are also language bindings, a web API (Docker/Python) and a CLI which are more "lightweight" (it'll still need a lot of system resources).

P.S. I've contributed a few things to the code and docs myself but am not one of the maintainers, nor affiliated with Nomic AI. So take that as you will.

cosmic-snow commented 1 year ago

> Thanks for looking into it. However, I'm on Linux. I have a second HDD that does not have my home directory or any part of the system installed on it. It's mounted on a directory under the root. When I start the chat binary, this HDD starts reading like crazy. There is no excuse for crawling a filesystem like that. There is no reason to believe I want to use gpt4all on some random part of the filesystem.

I'm not aware of any part of the code that would do that. Do you have a swap file on that external disk and not enough RAM? Is the model download path set there for some reason? Maybe strace it to see what's going on?

> If there is a way to scan certain folders, the app should have a button with which I can initiate that, rather than just accessing the filesystem on startup.

The mentioned LocalDocs feature is off by default and has that button. So that's probably not it.

manyoso commented 1 year ago

This app is signed by Nomic AI and checked and notarized by Apple. It is open source. The code is available for everyone to look at. Closing with prejudice.

rhubarb commented 1 year ago

This is happening for me on an M2 MacBook Pro, with GPT4All downloaded and installed today (Aug 24), so I don't see how it's closed. All was good until I changed the download directory. The app's UI no longer appears, but periodic requests for access to my Documents, Pictures, Downloads, etc. do. Whether I accept them or not, the app hangs indefinitely; eventually I force-quit it. Then I deleted the app, the shortcut, the Applications/gpt4all folder and the ~/Library/Application Support/nomic.ai folder, and reinstalled... and... same problem. This time the UI eventually came up; I looked in the settings and the download folder was set to "/", the root directory!

Not saying that GPT4All is malware - it seems a great little app - but this is a severe bug. Please reopen this issue and fix if you can.

manyoso commented 1 year ago

Something went wrong there; you need to delete the settings file too.

aragon5956 commented 1 year ago

It’s possible; it wouldn’t surprise me!

RobinQu commented 8 months ago

I removed the app right after the greedy permission requests. It's just a wrapper around LLM inference...

RobinQu commented 8 months ago

FYI, after successfully running the uninstaller, some of the app icons are still present in Launchpad, and there is no easy way to remove them.

manyoso commented 4 months ago

We're now fully signed on Mac and Windows and this is resolved.