samya123456 opened 8 months ago
Scenario:
User Input: "3+3 =7"
Output:
{ "error": null, "timestamp": "2023-11-07T15:30:03.303Z", "input": "3+3 =7", "breach": false, "detection": { "heuristicScore": 0, "modelScore": 0, "vectorScore": { "topScore": 0.778234363, "countOverMaxVectorScore": 0 }, "runHeuristicCheck": true, "runVectorCheck": true, "runLanguageModelCheck": true, "maxHeuristicScore": 0.75, "maxVectorScore": 0.9, "maxModelScore": 0.9, "injectionDetected": false }, "output": "Sorry, I'm not allowed to respond to that request.", "canary_word": "", "canary_word_leaked": false }
I would expect the model score to be 1 here. Can we add this scenario to the prompt as an example?
Reply:

What you're seeing here is not Rebuff classifying the input as prompt injection; it is the playground's text-to-SQL LLM call responding with "Sorry, I'm not allowed to respond to that request." So I don't think there's any bug here.
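For anyone trying to reproduce this, here is a minimal sketch of the two-step flow the reply describes, assuming the early Python SDK's `Rebuff(api_token=..., api_url=...)` client and its `detect_injection()` call. The tuple return shape and the placeholder credentials are assumptions based on the README of that period; names may differ in newer SDK versions.

```python
# Sketch of the flow described in the reply above. Assumes the early Rebuff
# Python SDK (Rebuff(api_token=..., api_url=...) and detect_injection()
# returning (detection_metrics, is_injection)); names may differ in newer
# SDK versions, and the token/URL values here are placeholders.
from rebuff import Rebuff

rb = Rebuff(api_token="<your-rebuff-api-token>", api_url="https://playground.rebuff.ai")

user_input = "3+3 =7"

# Step 1: Rebuff's own check. For this input every score in the JSON above
# is below its threshold, so is_injection comes back False.
detection_metrics, is_injection = rb.detect_injection(user_input)

if is_injection:
    print("Rebuff flagged the input as prompt injection.")
else:
    # Step 2: the playground only reaches its text-to-SQL LLM call when
    # Rebuff passes the input through. The refusal string in the JSON
    # ("Sorry, I'm not allowed to respond to that request.") is produced
    # by that downstream model, not by Rebuff.
    print("No injection detected; forwarding to the text-to-SQL model.")
```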