sourcegraph / openctx

See contextual info about code from your dev tools, in your editor, code review, and anywhere else you read code.
https://openctx.org
Apache License 2.0

Data exfiltration attack vector (theoretical concern) #167

Open josharian opened 4 months ago

josharian commented 4 months ago

The concern here is that OpenCtx increases the surface area for attacks like Markdown image attacks.

The attack would go something like this:

This remains pretty visible: even a remotely attentive developer would see the sketchy link. But by that point we're right of boom; the data has already left.

Note also that when content from remote fetches is used to generate context, you don't need your prompt injection to be in code comments. It can be far away, hidden on the web. It can even be injected remotely by users: they write the malicious instructions in a form on your company's website, whose contents then get logged, and those logs get added into the context during debugging.

The critical new link in this chain is OpenCtx. It makes it possible to turn malicious text immediately into network fetches, which can either themselves be a vector for injection attacks, or be the exfiltration mechanism.

One should probably mitigate on both the "injection vector" side and the "exfiltration mechanism" side. I have no idea how to mitigate the "injection vector" problem.

The "exfiltration mechanism" side is a bit more amenable. It depends heavily on the provider.
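To make the "exfiltration mechanism" side a bit more concrete, here's a minimal sketch of one possible provider-side guard: a host allowlist checked before any network fetch. This is an assumption about how a provider might be hardened, not something OpenCtx does today as far as I know. The names `ALLOWED_HOSTS` and `is_fetch_allowed` and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; these hosts are placeholders, not OpenCtx defaults.
ALLOWED_HOSTS = {"docs.example.com", "wiki.example.com"}


def is_fetch_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for https URLs whose host is explicitly allowlisted."""
    try:
        parts = urlsplit(url)
        # .hostname is lowercased and excludes any port or userinfo, so
        # "https://docs.example.com@evil.test/" resolves to "evil.test".
        host = parts.hostname
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are rejected outright.
        return False
    if parts.scheme != "https":
        return False
    return host is not None and host in allowed_hosts
```

A provider would call `is_fetch_allowed(url)` before fetching and skip the request when it returns `False`. This only narrows the channel: if an attacker can read logs or content on an allowlisted host, a URL with the secret in its path or query string still leaks.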

Another possible mitigation is to attempt to filter generated code for potentially sensitive data. That's definitely not a 100% solution, but it might help. It does, however, push the security issue opened by OpenCtx onto the models and/or autocompletion providers.
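As a rough illustration of the filtering idea, here's a minimal sketch that scans generated text for a few well-known secret shapes and redacts them. The pattern set and the names `SECRET_PATTERNS`, `find_sensitive`, and `redact` are hypothetical; real secrets come in many more forms than these, which is part of why this is not a 100% solution.

```python
import re

# Hypothetical, deliberately small pattern set. A real filter would need
# patterns tuned to the secrets that actually exist in the codebase.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key headers, e.g. "-----BEGIN RSA PRIVATE KEY-----".
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # HTTP bearer tokens of at least 20 token characters.
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
}


def find_sensitive(text):
    """Return a list of (pattern_name, matched_text) pairs found in text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


def redact(text):
    """Replace every match of a known pattern with a labeled placeholder."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

For example, `redact('key = "AKIAIOSFODNN7EXAMPLE"')` returns `'key = "[REDACTED:aws_access_key_id]"'`. Anything encoded, split across tokens, or simply not matching a known shape passes straight through, so this is best treated as one layer rather than a fix.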

I acknowledge this is all theoretical. But probably still worth considering...