leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0
1.32k stars 151 forks source link

LRLBuff manipulates attempt.outputs #791

Open leondz opened 2 months ago

leondz commented 2 months ago
   def untransform(self, attempt: garak.attempt.Attempt) -> garak.attempt.Attempt:
        translator = Translator(self.api_key)
        outputs = attempt.outputs
        attempt.notes["original_responses"] = outputs
        translated_outputs = list()
        for output in outputs:
            response = translator.translate_text(output, target_lang="EN-US")
            translated_output = response.text
            translated_outputs.append(translated_output)
        attempt.outputs = translated_outputs
        return attempt

this disrupts attempt logic - should manipulate attempt.messages or even better have this abstracted away so that external general buff management logic handles the updating