cfal opened this issue 3 weeks ago.
Seems like this is a dupe of https://github.com/supermaven-inc/supermaven-nvim/pull/35/files from 3 months ago, which isn't even merged. What an incredible lack of urgency for a huge privacy issue.
https://github.com/supermaven-inc/supermaven-nvim/pull/35/files does not address the issue being raised here, as the sm-agent binary automatically includes files in the git repo as part of the context, even if they are not opened.
If a file contains sensitive information, it should be included in .gitignore, as Supermaven does not send files matched by .gitignore to the server, even if they are opened. Alternatively, you could include a .supermavenignore, and files matching the globs specified in it will also not be sent to the server.
This isn't clear from the documentation, so I think we could change that.
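To confirm that a sensitive file really is covered by .gitignore, git's own "git check-ignore -q <path>" can be used. It exits with 0 when the path is ignored, 1 when it is not, and 128 on error. The Python wrapper below is a minimal sketch. It reflects only git's rules, not .supermavenignore, and it is an assumption that sm-agent interprets .gitignore exactly the way git does.

```python
import subprocess
from pathlib import Path


def git_ignores(repo: Path, relative_path: str) -> bool:
    """Return True if git's ignore rules exclude relative_path inside repo.

    Wraps `git check-ignore -q`: exit 0 = ignored, 1 = not ignored, 128 = error.
    Only git's rules are consulted; .supermavenignore is not.
    """
    result = subprocess.run(
        ["git", "check-ignore", "-q", relative_path],
        cwd=repo,
        capture_output=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(result.stderr.decode(errors="replace").strip())
```

For example, git_ignores(Path("/path/to/repo"), ".env") should be True before you rely on .gitignore to keep .env off the server.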
> https://github.com/supermaven-inc/supermaven-nvim/pull/35/files does not address the issue being raised here, as the sm-agent binary automatically includes files in the git repo as part of the context, even if they are not opened.
This also needs to be clearly documented.
> If a file contains sensitive information it should be included in .gitignore, as Supermaven does not send .gitignore files to the server even if they are opened. Alternatively, you could include a .supermavenignore and globs specified in that file will also not be sent to the server.
This is untenable for large or internal repos. IMO there should be a way (an allowlist and a blocklist) to configure which repos to enable it for.
I think https://github.com/supermaven-inc/supermaven-nvim/pull/58 could solve it in a programmable way: check the path of the file and disable Supermaven when needed, because ignore_filetypes is very limited. But again, it was open for 2 months and still isn't merged.
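Below is a minimal sketch of the path-based allowlist/blocklist idea. It assumes a hypothetical supermaven_enabled(path, allow, block) check that runs against a buffer's absolute path before the plugin attaches. The real option from PR #58 is Lua and may look quite different. In this sketch the blocklist wins over the allowlist, and an empty allowlist enables everything that is not blocked.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def supermaven_enabled(path: str, allow: Sequence[str] = (), block: Sequence[str] = ()) -> bool:
    """Hypothetical per-path switch: block globs win, then an optional allowlist applies."""
    # fnmatchcase is case-sensitive and its "*" also matches "/", so "/work/*" covers subdirectories.
    if any(fnmatchcase(path, pattern) for pattern in block):
        return False
    if allow:
        return any(fnmatchcase(path, pattern) for pattern in allow)
    return True  # no allowlist configured: everything not blocked is enabled
```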
I've merged both PRs mentioned, as they are useful in their own right and could seemingly help address some of the privacy concerns here, though as I mentioned earlier, these don't address the underlying issue involving sm-agent, which .supermavenignore was intended to solve.
> ...though as I mentioned earlier these don't address the underlying issue involving sm-agent, which .supermavenignore was intended to solve
Thank you for confirming, because I was wondering the same thing. I believe your point is that the full context of the repository is sent at startup via the sm binary, which has nothing to do with the Neovim plugin/config? And the only way to prevent files from being sent is listing them in either .gitignore or .supermavenignore, as the binary respects those by default out of the box (regardless of anything in the Neovim plugin).
Do I have this correct, @sm-victorw?
@sm-victorw this brings up two further questions I have been wondering about:
1. Is there any command we can run to see exactly what Supermaven is using as context and has sent to the servers?
2. What if I am in a GitHub repo (cwd) in Neovim but open a buffer with a file from outside the repo? For example: I have a .gitignore with .env in it for the repo I am currently in, but I open a .env file from outside the repo (somewhere else on my hard disk).

Thanks in advance for clearing these things up!
> Thank you for confirming because I was wondering the same thing. I believe your point is that the full context of the repository is sent at startup via the sm binary which has nothing to do with the neovim plugin / config? And the only way to prevent things is either in the .gitignore or .supermavenignore as the binary respects those by default out of the box (regardless of anything in the neovim plugin).
Yes, this is roughly what is happening, though depending on how large the repository is, the context might not include everything. Also note that the context is kept on the server for up to 7 days, as mentioned in the code policy (https://supermaven.com/code-policy).
> is there any command we can run to see exactly what supermaven is using as context and has sent to the servers?
There currently isn't any way to see exactly what is being included in the context. If you are interested in which files are eligible to be included, the sm-agent binary, typically located at $HOME/.supermaven/binary/[version]/[platform]-[arch]/sm-agent, can be run with the list-files command to see what isn't being ignored, e.g. ./sm-agent list-files /path/to/repo
If you are interested in whether or not a specific file is being ignored, ./sm-agent check-ignore /path/to/file can be used as well.
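The two subcommands can be scripted. This sketch relies only on what is stated above: the binary lives at $HOME/.supermaven/binary/[version]/[platform]-[arch]/sm-agent and accepts "list-files <repo>" and "check-ignore <file>". The output format isn't documented in this thread, so the sketch returns raw stdout instead of parsing it. The found paths are sorted lexicographically, which says nothing about which version is newest.

```python
import subprocess
from pathlib import Path
from typing import List

SUBCOMMANDS = ("list-files", "check-ignore")


def find_sm_agents(home: Path) -> List[Path]:
    """All sm-agent binaries under <home>/.supermaven/binary/<version>/<platform>-<arch>/."""
    return sorted(Path(home).glob(".supermaven/binary/*/*/sm-agent"))


def run_sm_agent(binary: Path, subcommand: str, target: str) -> str:
    """Run `sm-agent list-files <repo>` or `sm-agent check-ignore <file>`; return raw stdout."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    result = subprocess.run(
        [str(binary), subcommand, target],
        capture_output=True,
        text=True,
        check=True,  # raise if the agent exits non-zero
    )
    return result.stdout
```

Pick the binary matching your installed version from find_sm_agents(Path.home()), then call e.g. run_sm_agent(agent, "list-files", "/path/to/repo").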
> what if I am in a github repo (cwd) in neovim but open up a buffer with a file from outside the repo?
Whenever a file is sent to the binary, the only .gitignore/.supermavenignore files considered are the ones inside the repository of the file in question. If you have multiple buffers open, they could potentially be following different .gitignore rules. The .env in your scenario would be uploaded if it isn't part of a git repository. In general, files which are not part of a git repository are uploaded when they are edited, with no additional context included.
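The scoping rule above can be sketched as two steps: find the repository that owns the file, then collect the ignore files between the file and that repository's root. Walking up the directories git-style is an assumption about how sm-agent searches, since the binary is closed source. The function names are illustrative.

```python
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def repo_root(file_path: PathLike) -> Optional[Path]:
    """Nearest ancestor directory containing a .git entry, or None if the file is in no repo."""
    for parent in Path(file_path).resolve().parents:
        if (parent / ".git").exists():  # .git may be a directory or (worktrees) a file
            return parent
    return None


def ignore_files_in_scope(file_path: PathLike) -> List[Path]:
    """Ignore files from the file's directory up to its repo root, nearest first.

    Files outside any repository get an empty list: per the thread, they are
    uploaded when edited and no ignore file applies to them.
    """
    root = repo_root(file_path)
    if root is None:
        return []
    found: List[Path] = []
    directory = Path(file_path).resolve().parent
    while True:
        for name in (".gitignore", ".supermavenignore"):
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
        if directory == root:
            return found
        directory = directory.parent
```

In the .env scenario, a file outside every repository yields no ignore files at all, so the .gitignore of the repo in your cwd does not protect it.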
The lack of control for non-git files is unfortunate and should have a robust solution. ignore_filetypes was not intended for this use, and until now wasn't meant to be a privacy-related feature. Ideally we will have an allow/blocklist of some kind that does not make this sort of determination based on file type.
Thank you for answering all my questions. Exactly what I needed.
I think the biggest "risk" is files outside of the git repo: personal markdown notes, internal docs, etc.
Is this something handled in the nvim plugin? If so, I wonder whether, for the time being, a super conservative approach would work: prompt the user in nvim for any file outside of the git repository, asking if they want it uploaded. Typically these will just be one-off files opened ad hoc.
Another nice option would be a config setting to blanket-disable uploading any files outside the git repo (if that's possible).
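The thread doesn't say whether this check would live in the plugin or the binary. Assuming the plugin can tell whether a buffer's file is inside a git repo, the prompt-once idea amounts to a small cache of per-file decisions. OutsideRepoGate and its ask callback are hypothetical. In Neovim the callback would be a yes/no prompt, and a callback that always returns False gives the blanket "never upload files outside the repo" option.

```python
from pathlib import Path
from typing import Callable, Dict


class OutsideRepoGate:
    """Hypothetical gate: ask once per file outside a git repo and remember the answer."""

    def __init__(self, ask: Callable[[str], bool]) -> None:
        self.ask = ask  # e.g. a yes/no prompt; `lambda path: False` = never upload
        self.decisions: Dict[str, bool] = {}

    def may_upload(self, path: str, in_repo: bool) -> bool:
        if in_repo:
            return True  # repo files stay governed by .gitignore / .supermavenignore
        key = str(Path(path).resolve())  # one decision per real file, however it was opened
        if key not in self.decisions:
            self.decisions[key] = bool(self.ask(key))
        return self.decisions[key]
```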
> I think the biggest "risk" are the files outside of the git repo. Personal markdown notes, internal docs etc.
Wouldn't a single .supermavenignore in $HOME solve this?
- .env inside the repo (ideally this should be ignored because of <repo root>/.gitignore)
- ~/myNotes/note.md (this should be ignored because of $HOME/.supermavenignore)

@ahmedelgabri thanks for the response.
The .env (just a common example), or any other sensitive info, is not always going to be under the same repo root as my current cwd. Oftentimes I am flipping between repos and have to open common files that are not under that particular git repo. Based on the response above, a file is only covered if the files listed in the .gitignore are actually within that repo.
On Windows it is not common to have your files (like notes, etc.) under your "HOME" (I put it in quotes because we don't really have a HOME ;) ... it is usually something like USERPROFILE). Documents and notes are often not under that "HOME" path. But even if they were, I don't know whether Supermaven looks that far up the tree for a .supermavenignore.
Is there any official documentation on using .supermavenignore?
Is there a way for Supermaven to just not do this in the first place, out of the box, or does it have to have this behavior? No one wants their personal information leaked.
@leet0rz Could you specify which behavior you are referring to? The uploading of non-repository files, or the repository-based indexing that the binary performs?
We could probably give the option to have the plugin disabled by default and require a call to the API (.start()) before the binary is ever started, or something similar. I'm not sure if that's what you're proposing.
> @leet0rz Could you specify which behavior you are referring to? The uploading of non-repository files? Or the repository based indexing that the binary performs?
> We could probably give the option to have the plugin disabled by default, and require a call to the api (.start()) before the binary is ever started, or something similar to this. I'm not sure if that's what you're proposing
I'm not entirely sure how this works, but this does seem like a major privacy concern. As stated before, people will obviously run this on all sorts of notes and would never want their personal information uploaded or leaked in any way; Supermaven should never upload this sort of information to anything. What I heard is that it uploads the entire buffer and, I guess, uses it to produce the "AI responses" or suggestions that we can accept? If that is the case, is it possible to do this locally instead of uploading it (which is the privacy concern)?
I hope I am doing an OK job explaining this and have actually understood what's going on.
@leet0rz the power comes from uploading. Most laptops are not powerful enough to do the type of processing it does, and even if they could, they would be burning through CPU/GPU/RAM constantly. Also, to be clear, this is how most of these AI code tools work, including GitHub Copilot. The difference is that Supermaven is more powerful because it sends your entire repository to its models (more context). None of those things are the main problem. The main problem really is files that are not in your git repository but that you open in a buffer, because those are also being sent up to the servers.
> We could probably give the option to have the plugin disabled by default, and require a call to the api (.start()) before the binary is ever started...
@sm-victorw I think this would be great as step 1. But I think the other important thing is changing the default so that any files that are not part of your currently opened git repository are opt-in instead of opt-out. By default, files outside the git repo would not be sent to the servers unless you whitelist them, preferably with a glob or glob array, or even better with a callback function we can configure to return true if we want a file sent to the servers (with the file path as an input parameter to the callback).
Thoughts?
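The opt-in proposal can be sketched as a single predicate. should_send and its parameters are hypothetical and not part of any existing Supermaven API. Files inside the currently opened repository fall through to the normal ignore-file rules. Anything outside is sent only when the user's callback approves it, or failing that, only when it matches an allow glob. With neither configured, nothing outside the repo is sent.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Callable, Optional, Sequence


def should_send(
    path: str,
    current_repo: str,
    allow_globs: Sequence[str] = (),
    allow_fn: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Hypothetical opt-in policy for files outside the currently opened repo."""
    # Part-wise comparison: "/home/me/project2" is NOT treated as inside "/home/me/proj".
    if PurePosixPath(path).is_relative_to(current_repo):
        return True  # inside the repo: .gitignore / .supermavenignore still decide
    if allow_fn is not None:
        return bool(allow_fn(path))  # user callback receives the file path
    return any(fnmatchcase(path, g) for g in allow_globs)  # default: nothing is allowed
```

Checking the callback before the globs mirrors the "even better, a callback" preference above. A real implementation might combine the two instead.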
> @leet0rz the power comes from uploading. Most laptops are not powerful enough to do the type of processing it does and even if it could our laptops would be burning up high cpu/gpu/ram resources constantly. Also to be clear, this is how most of these AI code tools work including GitHub copilot. The difference is Supermaven is more powerful sending your entire repository to its models (more context). None of those things are the main problem. The main problem really is files that are not in your git repository but that you open in a buffer because those also are being sent up to the servers.
What about usage outside of GitHub, when you just use Neovim to open personal files, which a lot of us do? Will that still upload the entire buffer and cause a privacy concern? I use Neovim to open any file I want to edit, outside of GitHub-related things too, so if I open a text document containing sensitive information while Supermaven is enabled by default, won't that cause the same privacy concern?
@leet0rz yes, that is the concern we have been discussing in this thread. It is definitely a concern. I was just explaining why doing everything locally on your machine is not an option.
@leet0rz Yes, both pull requests mentioned earlier in this issue can help mitigate this, but as I mentioned earlier, we are going to want a robust and clear approach for letting users specify which files they would like to exclude.
@GitMurf @sm-victorw Cool thanks guys.
Another side of the problem is that if I create some temporary file, I first have to update .gitignore before I can start doing anything.
Normally it is the opposite: I work in a project-local directory, which is "safe", and only when committing do I think about what should be committed, what should be gitignored, and what should be deleted.
Right now, if I create any temporary or scratch file that might contain a secret inside the repo folder, even when nvim runs in a different window, e.g. as a script output (I usually do some script > 1.txt), it will be uploaded to Supermaven. And Supermaven will "like" that file because it is fresh.
This is an even worse problem because many tools "expect" to be run from the project folder to pick up their configuration.
For now, I think I might do:
# .supermavenignore
*
!*.js
!*.jsx
...
This might at least prevent some surprises.
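An allowlist built from "*" plus "!" negations relies on gitignore's last-match-wins rule. Git also has a documented gotcha here: a file cannot be re-included if one of its parent directories is excluded. Because "*" also matches directories, "!*.js" alone does not rescue src/app.js. The usual fix is to add "!*/" so that directories are re-included. Whether sm-agent follows git's semantics exactly is an assumption. The simplified model below handles only slash-free patterns and a trailing "/".

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List, Tuple

Rule = Tuple[str, bool, bool]  # (pattern, negated, directory_only)


def parse_rules(text: str) -> List[Rule]:
    """Parse slash-free gitignore-style lines; '#' comments and blank lines are skipped."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        directory_only = pattern.endswith("/")
        rules.append((pattern.rstrip("/"), negated, directory_only))
    return rules


def _last_match(name: str, is_dir: bool, rules: List[Rule]) -> bool:
    """Later rules override earlier ones; returns True if the name ends up ignored."""
    ignored = False
    for pattern, negated, directory_only in rules:
        if directory_only and not is_dir:
            continue
        if fnmatchcase(name, pattern):
            ignored = not negated
    return ignored


def is_ignored(relative_path: str, rules: List[Rule]) -> bool:
    parts = PurePosixPath(relative_path).parts
    # As in git, a file cannot be re-included if one of its parent directories is excluded.
    for directory in parts[:-1]:
        if _last_match(directory, True, rules):
            return True
    return _last_match(parts[-1], False, rules)
```

With the rules exactly as written above, top-level .js files are allowed but src/app.js stays ignored. Adding "!*/" after "*" lets it through.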
It might also be useful to have a GLOBAL IGNORE somewhere in ~/.supermaven: a system-wide set of rules that the binary follows regardless of whether a file is in a git repo or not. Maybe local .supermavenignore files should override it, maybe not.
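Whether local rules override global ones comes down to evaluation order when the last match wins. For comparison, git gives its global excludes file (core.excludesFile) the lowest precedence, so a repo's .gitignore can override it. The sketch below assumes a hypothetical global Supermaven ignore file layered the same way, with a flag to flip the precedence. It matches basenames only, for brevity.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List, Tuple


def parse(text: str) -> List[Tuple[str, bool]]:
    """(pattern, negated) pairs from gitignore-style text, skipping blanks and comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            negated = line.startswith("!")
            rules.append((line[1:] if negated else line, negated))
    return rules


def is_ignored(path: str, global_rules: str, local_rules: str, local_overrides: bool = True) -> bool:
    """Evaluate a hypothetical global ignore file together with a repo-local one.

    Rules are applied in order and the last match wins, so whichever layer is
    evaluated last takes precedence.
    """
    name = PurePosixPath(path).name
    layers = [parse(global_rules), parse(local_rules)]
    if not local_overrides:
        layers.reverse()  # global rules evaluated last, so they win
    ignored = False
    for rules in layers:
        for pattern, negated in rules:
            if fnmatchcase(name, pattern):
                ignored = not negated
    return ignored
```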
I have seen this issue again and again. I stopped using it for a while because it's a big issue. I have files in .gitignore; it works well on some projects, but on others it doesn't care and simply sends everything. On VSCode it works much better than on other IDEs; this problem happens frequently on JetBrains IDEs. I'm using GoLand, and sometimes you just have to pray for it to skip. On VSCode it almost always skips.
For me the issue is having to add files to the ignore list; I don't want to do that. I want non-code files to be ignored by default. I don't want to keep track of and ignore every file except my code files; that should be the default behavior if it isn't already.
> I have seen this issue again and again. I stopped using it for a while as it's a big issue. I have files in .gitignore it works well on some projects on some it doesn't care simply sends everything. On VSCode it works much better compared to other IDEs, this problem happens frequently on Jetbrains IDEs. I'm using Goland, you just have to pray for it to skip sometimes. On VSCode it almost always skips
Can you elaborate on what you mean by it "skips"? As in, you get completions on files which are included in .gitignore? The IntelliJ and Neovim plugins are not responsible for deciding what is or isn't sent to the server; this is determined by the sm-agent binary, which makes that determination based on the file path and any .gitignore it finds. Until somewhat recently all of these plugins used the same binary, so the behavior shouldn't have been different.
On macOS there is a way to guarantee that the binary reads only permitted files: sandboxing. This is a native OS feature, so it is highly secure, and only a couple of text files are needed.
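How to do it:
1. Create a wrapper for the agent somewhere (and make it executable with chmod +x), e.g.:
#!/bin/sh
# exec: replace this shell so signals from the plugin reach the sandboxed process directly
exec sandbox-exec -f /.../supermaven.sb /.../.supermaven/binary/v15/macosx-aarch64/sm-agent "$@"
2. Create a policy (the supermaven.sb file referenced by the wrapper):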
(version 1)
(allow default)
;; Deny all file reads, then re-allow what the binary needs to start
(deny file-read)
(allow file-read (literal "/"))
(allow file-read (subpath "/System/Volumes/Preboot/Cryptexes/OS"))
(allow file-read (subpath "/dev"))
(allow file-read (subpath "/Library/Preferences"))
(allow file-read (subpath "/usr/share/icu"))
(allow file-read (subpath "/private/var/db/timezone"))
(allow file-read (subpath "/var"))
;; The agent's own folder
(allow file-read* (subpath "/Users/sergey/.supermaven"))
;; Let the agent stat/list files under the projects folder without reading their contents
(allow file-read-metadata (subpath "/Users/sergey/projects"))
;; Git data and ignore files (dots escaped so they only match a literal ".")
(allow file-read (regex #"/\.git/"))
(allow file-read (regex #"/\.gitignore$"))
(allow file-read* (regex #"/\.supermavenignore$"))
;; Only Ruby and Lua sources (anchored to the file extension)
(allow file-read (regex #"\.rb$"))
(allow file-read (regex #"\.lua$"))
Here the first group of rules is needed to start the binary correctly (including all shared system libs); the rest let it read its own folder, read the git and ignore files, and restrict source reads to Ruby/Lua.
3. Fork the plugin and replace the binary with the wrapper (or, if you don't want to fork, use other means, e.g. symlinks).
---
This ^^^ is a fully working template, which I wanted to improve but don't have time for at the moment, so I decided to post it as-is in case somebody wants to pick it up. When/if I have more time to work on this, I will post an updated version.
The beauty of this approach is that compliance is guaranteed by OS sandboxing (at least for the binary); the plugin is another story, since it may send whatever it wants directly.
The system library restrictions should definitely be fine-tuned further, but overall I don't care that much about that part, as it is just what the binary normally needs to run and doesn't relate much to personal sensitive info.
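In sandbox rules, the exact regex matters. An unescaped, unanchored rule such as (regex #".rb") matches any path containing some character followed by "rb", not just Ruby files. An escaped, anchored form such as #"\.rb$" avoids this. The check below uses Python's re as a stand-in for the sandbox regex engine. It assumes, as the policy's unanchored "/.git/" rule already implies, that a rule matches anywhere in the path. For simple patterns like these the two engines should agree, but that is an assumption.

```python
import re

# Allow-rule bodies: as originally written vs. escaped and anchored to the extension.
LOOSE = [".rb", ".lua"]
STRICT = [r"\.rb$", r"\.lua$"]


def allowed(path: str, patterns) -> bool:
    """True if any allow-rule regex matches somewhere in the path (an unanchored search)."""
    return any(re.search(p, path) for p in patterns)
```

For example, /Users/sergey/projects/herbs/secrets.txt passes the loose rules through the "erb" in "herbs", and a notes file named evaluate.md passes through "alua".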
supermaven-nvim adds a TextChanged autocmd here, which calls binary:on_update: https://github.com/supermaven-inc/supermaven-nvim/blob/d71257f431e190d9236d7f30da4c2d659389e91f/lua/supermaven-nvim/document_listener.lua#L20
BinaryLifecycle:on_update sends everything to stdin, which I assume ends up being sent to the server (it's a closed-source binary that is fetched, so I can't easily check): https://github.com/supermaven-inc/supermaven-nvim/blob/d71257f431e190d9236d7f30da4c2d659389e91f/lua/supermaven-nvim/binary/binary_handler.lua#L77
This code path never seems to hit poll_once, which is the only place where ignore_filetypes seems to be checked: https://github.com/supermaven-inc/supermaven-nvim/blob/d71257f431e190d9236d7f30da4c2d659389e91f/lua/supermaven-nvim/binary/binary_handler.lua#L293
It seems misleading that ignore_filetypes doesn't actually ignore files of that filetype, and the plugin instead sends everything in every buffer backed by a file.
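The reported gap can be modeled in a few lines. This toy Python model is only a sketch. The class and method names are hypothetical, and the real code is Lua in document_listener.lua and binary_handler.lua. It contrasts an update path that forwards every file-backed buffer with one that applies the ignore_filetypes check before anything reaches the binary, which is where a privacy-relevant filter would need to sit.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AgentChannel:
    """Stands in for the sm-agent stdin pipe; records which files were forwarded."""

    ignore_filetypes: Set[str]
    sent: List[str] = field(default_factory=list)

    def on_update_unfiltered(self, path: str, filetype: str, text: str) -> None:
        # Mirrors the reported behaviour: TextChanged -> on_update -> stdin, no filetype check.
        self.sent.append(path)

    def on_update_filtered(self, path: str, filetype: str, text: str) -> None:
        # Same path, but the ignore_filetypes check runs before anything is written.
        if filetype in self.ignore_filetypes:
            return
        self.sent.append(path)
```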