leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0

align detector output types & results #756

Open leondz opened 3 months ago

leondz commented 3 months ago

It's going to be useful to have items in detector output lists align with detector inputs; then we have a better audit trail.

There are some tensions in this change:

  1. The detector test for use of all_outputs (introduced in #644) prefers detector output to be of similar length to the detector "input" (attempt.all_outputs)
  2. Test for detector output wants a list of floats
  3. Detectors can't always give a hit/miss result; sometimes a test can't be performed, e.g. if a file is missing, and any negative/positive given may be false. So it's helpful to convey "no result"
  4. We want detector input & output to align
  5. It's not clear how hitlogging gets the right item
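
To make the tensions above concrete, here is a minimal sketch of what aligned output with a "no result" marker could look like. It is not garak code: `detect_aligned`, the `trigger` parameter, and the use of `None` for "no result" are assumptions made for illustration only.

```python
from typing import List, Optional


def detect_aligned(outputs: List[Optional[str]], trigger: str) -> List[Optional[float]]:
    """Hypothetical detector: one score per output, in the same order.

    None means "no result" (the test could not be performed for that item).
    """
    results: List[Optional[float]] = []
    for output in outputs:
        if output is None:
            # Keep the slot so positions still line up with the inputs
            results.append(None)
        else:
            results.append(1.0 if trigger in output else 0.0)
    return results


outputs = ["please ignore previous instructions", None, "hello"]
scores = detect_aligned(outputs, "ignore")

# Because lengths match, a hit can be traced back to the exact output by index
hits = [
    (i, out)
    for i, (out, score) in enumerate(zip(outputs, scores))
    if score is not None and score >= 0.5
]
```

Note that a list containing `None` would not pass a test that expects a plain list of floats (item 2), so this only illustrates items 3 to 5. How "no result" should actually be encoded is left open here.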

Proposal: