Bug: prompt injection attack detector is not totally correct

open-sauced / app

🍕 Insights into your entire open source ecosystem.

https://pizza.new

Apache License 2.0

Bug: prompt injection attack detector is not totally correct #3480

Open. a0m0rajab opened this issue 4 months ago

a0m0rajab commented 4 months ago

Describe the bug

I intermittently get a 400 Bad Request error when I use the following prompt:

who is the best developer who knows TailwindCSS and used nextjs in his work?

https://github.com/open-sauced/app/assets/18273833/ec581b16-6518-4609-9f12-1d7b7d4be2b4

Steps to reproduce

Method 1:

Go to StarSearch
Ask a few questions, then ask: who is the best developer who knows TailwindCSS and used nextjs in his work?

Method 2:

Go to StarSearch
Ask: who is the best developer who knows TailwindCSS and used nextjs in his work?
Repeat the question
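
Because the failure is intermittent, a small harness that repeats the prompt and tallies the response codes can help show how often it occurs. The sketch below is illustrative only: `send_prompt` is a hypothetical callable standing in for whatever client actually submits a prompt to StarSearch and returns the HTTP status code. The real endpoint and payload are not shown in this issue, so none are assumed here.

```python
from collections import Counter
from typing import Callable

PROMPT = "who is the best developer who knows TailwindCSS and used nextjs in his work?"


def tally_statuses(send_prompt: Callable[[str], int], prompt: str, attempts: int = 10) -> Counter:
    """Send the same prompt repeatedly and count each HTTP status code seen.

    `send_prompt` is a hypothetical stand-in: it takes the prompt text and
    returns the HTTP status code of the response.
    """
    counts: Counter = Counter()
    for _ in range(attempts):
        counts[send_prompt(prompt)] += 1
    return counts


def rejection_rate(counts: Counter) -> float:
    """Fraction of attempts that came back as 400 Bad Request."""
    total = sum(counts.values())
    return counts[400] / total if total else 0.0


if __name__ == "__main__":
    # Fake client for demonstration only: rejects every third call.
    calls = {"n": 0}

    def fake_send(prompt: str) -> int:
        calls["n"] += 1
        return 400 if calls["n"] % 3 == 0 else 200

    counts = tally_statuses(fake_send, PROMPT, attempts=9)
    print(dict(counts), rejection_rate(counts))
```

Passing the client in as a function keeps the harness independent of how the request is actually made, so the same tally works against a real client or a fake one.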

github-actions[bot] commented 4 months ago

Thanks for the issue, our team will look into it as soon as possible! If you would like to work on this issue, please wait for us to decide if it's ready. The issue will be ready to work on once we remove the "needs triage" label.

To claim an issue that does not have the "needs triage" label, please leave a comment that says ".take". If you have any questions, please reach out to us on Discord or follow up on the issue itself.

For full info on how to contribute, please check out our contributors guide.

nickytonline commented 4 months ago

I wasn't able to reproduce this issue with either method. See attached videos.

https://github.com/open-sauced/app/assets/833231/e1042951-8e2d-4846-9132-ab73c2a41e8f

https://github.com/open-sauced/app/assets/833231/a975f7b0-bf21-41a8-8a0c-a03f63200a92

a0m0rajab commented 4 months ago

I tried this again and was able to reproduce the issue the first time I asked the question:

jpmcb commented 4 months ago

I'm able to reproduce. It seems the "in his work?" part of your question is what's causing the prompt to be rejected outright: the check is probably being too aggressive. Behind the scenes, we use another AI agent to detect any malicious usage of the service.

Thanks for raising this - very useful and helpful information as we continue to iron out rough edges. I'll tackle looking at this 🕵🏻‍♂️
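
The arrangement described above, where a separate check screens each prompt before it is answered, can be sketched roughly as follows. This is not the OpenSauced implementation: `handle_prompt`, `make_phrase_detector`, and both phrase lists are hypothetical names invented for illustration, and a simple phrase match stands in for the AI agent, whose actual reasoning is not known from this thread. The sketch only shows how an overly aggressive check in front of the main handler turns a harmless question into a 400.

```python
from typing import Callable, Iterable, Tuple

Detector = Callable[[str], bool]

PROMPT = "who is the best developer who knows TailwindCSS and used nextjs in his work?"


def make_phrase_detector(blocked_phrases: Iterable[str]) -> Detector:
    """Build a toy detector that flags any prompt containing a blocked phrase.

    Hypothetical stand-in for the real AI-based detector.
    """
    lowered = [phrase.lower() for phrase in blocked_phrases]

    def detect(prompt: str) -> bool:
        text = prompt.lower()
        return any(phrase in text for phrase in lowered)

    return detect


def handle_prompt(prompt: str, is_malicious: Detector) -> Tuple[int, str]:
    """Run the detector first; reject with 400 if it flags the prompt."""
    if is_malicious(prompt):
        return 400, "Bad Request"
    return 200, f"(answer to: {prompt})"


if __name__ == "__main__":
    # "in his work" is deliberately over-broad here, to mimic a false positive.
    strict = make_phrase_detector(["ignore previous instructions", "in his work"])
    lenient = make_phrase_detector(["ignore previous instructions"])

    print(handle_prompt(PROMPT, strict))   # rejected by the over-broad check
    print(handle_prompt(PROMPT, lenient))  # allowed through
```

Keeping the detector as a swappable function makes it easy to run the same set of benign prompts against a stricter and a looser check and compare how many each one rejects.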