protectai / rebuff

LLM Prompt Injection Detector
https://playground.rebuff.ai
Apache License 2.0

model score is showing zero instead of 1 #78

Open samya123456 opened 8 months ago

samya123456 commented 8 months ago

Scenario:

User input: "3+3 =7"

Output:

{ "error": null, "timestamp": "2023-11-07T15:30:03.303Z", "input": "3+3 =7", "breach": false, "detection": { "heuristicScore": 0, "modelScore": 0, "vectorScore": { "topScore": 0.778234363, "countOverMaxVectorScore": 0 }, "runHeuristicCheck": true, "runVectorCheck": true, "runLanguageModelCheck": true, "maxHeuristicScore": 0.75, "maxVectorScore": 0.9, "maxModelScore": 0.9, "injectionDetected": false }, "output": "Sorry, I'm not allowed to respond to that request.", "canary_word": "", "canary_word_leaked": false }

The model score should be 1 here, I think. Can we add this scenario to the prompt as an example?
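For reference, the thresholds in the response suggest the breach decision is a per-check comparison. A minimal sketch in Python, inferred from the field names in the JSON above (the exact combination logic is an assumption, not Rebuff's actual source):

```python
# Sketch inferred from the JSON field names above; how the three checks
# combine is an assumption, not Rebuff's actual source.
def injection_detected(detection: dict) -> bool:
    return (
        detection["heuristicScore"] > detection["maxHeuristicScore"]
        or detection["modelScore"] > detection["maxModelScore"]
        or detection["vectorScore"]["topScore"] > detection["maxVectorScore"]
    )
```

With the values above, 0 <= 0.75, 0 <= 0.9, and 0.778234363 <= 0.9, so no check fires and "injectionDetected" comes back false even though the input states a wrong equation.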

ristomcgehee commented 6 months ago

What you're seeing here is not Rebuff classifying the input as prompt injection; it's the playground's text-to-SQL LLM call responding with "Sorry, I'm not allowed to respond to that request." So I don't think there's a bug here.
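To make the distinction concrete, the playground roughly runs two separate steps: Rebuff screens the input first, and only when no breach is found does the text-to-SQL LLM run, so the refusal string in "output" comes from that second call. A minimal sketch, with the Rebuff call following the SDK usage shown in the README and `text_to_sql_llm` as a hypothetical stand-in for the playground's LLM call:

```python
# Assumed two-step flow of the playground, not its actual source.
from rebuff import Rebuff

rb = Rebuff(api_token="YOUR_API_TOKEN", api_url="https://playground.rebuff.ai")

def text_to_sql_llm(user_input: str) -> str:
    # Hypothetical placeholder for the playground's text-to-SQL model. Its
    # own guardrails produce refusals like "Sorry, I'm not allowed to
    # respond to that request."
    return "SELECT ...;"

def handle_request(user_input: str) -> str:
    result = rb.detect_injection(user_input)
    if result.injectionDetected:
        # Rebuff's verdict: one of the scores crossed its threshold.
        return "Prompt injection detected."
    # "3+3 =7" passes all three checks (see the scores above), so the
    # request reaches the LLM, and any refusal text originates here.
    return text_to_sql_llm(user_input)
```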