Data exfiltration attack vector (theoretical concern)

The concern here is that OpenCtx increases the surface area for attacks like Markdown image attacks.

The attack would go something like this:

Attackers writes a comment in (untrusted, dependency) code instructing the code completion engine to gather credentials and form a 1x1 markdown image url.
User opens repo, and clicks into the code.
The context window picks up the comment.
The LLM generates the image url as part of other useful code.
The developer accepts the suggestion (with intent to keep only the useful bits).
OpenCtx fetches the url as a preview.
Credentials are leaked.

This remains pretty visible: A remotely attentive developer would see the sketchy link. But by that point we're right of boom.

Note also that when content from remote fetches is used to generate context, you don't need your prompt injection in code comments. It can be far away, hidden on the web. Or even be injected remotely by users: They write the malicious instructions in a form on your company's website, whose contents then get logged, and those logs get added into the context during debugging.

The critical new link in this chain is OpenCtx. It makes it possible to turn malicious text immediately into network fetches, which can either themselves be a vector for injection attacks, or be the exfiltration mechanism.

One should probably mitigate both on the "injection vector" and on the "exfiltration mechanism" side. I have no idea how to mitigate the "injection vector" problem.

The "exfiltration mechanism" side is a bit more amenable. It depends heavily on the provider.

For locked, siloed platforms (e.g. Slack), the attacker would have to get an API key for a Slack instance that the user has installed, and have access to that Slack instance, which narrows the attack to public Slack instances and inside jobs. (The fact that by default VSCode settings are shared across repos does make the shared instance attack marginally more possible. Still hard, though.) These may not need any mitigation.
For locked, open platforms (e.g. GitHub), it may make sense to add some kind of allowlist filter to specify which parts of that service can be interacted with. Then you can limit it to only trusted areas (specific repos or users or organizations). Or just hope that the service is sufficiently restrictive that that is unnecessary.
For totally open platforms (the web), some allowlist is probably essential. The alternative is to ask each time (which VSCode does when users click on links). But the security properties of that are absymal. Users always click allow, without thinking.

Another possible mitigation is to attempt to filter generated code for potentially sensitive data. That's definitely not a 100% solution, but might help. But that's pushing the security issue opened by OpenCtx onto the models and/or autocompletion providers.

I acknowledge this is all theoretical. But probably still worth considering...

sourcegraph / openctx

Data exfiltration attack vector (theoretical concern) #167