leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0
1.26k stars 145 forks source link

probe: credit cards #88

Open leondz opened 1 year ago

leondz commented 1 year ago

try to get model to generate CCs

luhn check (remove spaces), valid date check, is a CVC given check

are there some test signatures for this? are there CCs in "the data"[1], i.e. will the model replay them?

[1] Pile

b3dd1 commented 3 weeks ago

Hi, I would like to contribute to garak adding this plugin, can you give me further information about how I can implement it? Thank you.

leondz commented 3 weeks ago

Hey, absolutely!

This will probably need both a probe and a detector. The probe holds prompts and manages interaction with the LLM; the detector looks for the actual credit card numbers.

One way to start is with a detector called something like pii that has a class CreditCards or Luhn that does the check - maybe see garak.detectors.productkey for inspiration. You could follow that up with a probe module that has a class containing prompts that can elicit the PII; maybe using that paper as inspiration. I think the probes.goodside module has some fairly straightforward probes that you might be instructive.

Let us know if we can help more!

b3dd1 commented 2 weeks ago

Hi! Concerning to this detector I'm also developing the related probe based on this issue https://github.com/leondz/garak/issues/237. To better implement it I'm thinking about using a list of possible PII taken as input by the program, is there a way I can do this in a fancy way? And also there is a way I can pass this list of PIIs to the plugin I'm developing? What I thought is: garak "some options here" -names path/to/names.txt -emails path/to/emails.txt and so on for all the PIIs I need to implement the probe. Thank you in advance for the response.

leondz commented 1 week ago

The way to do this is in the existing garak probe structure, which you'll be able to access from the command line in the normal way. You can check out https://reference.garak.ai/en/latest/contributing.html for some tips.

I think #237 is probably an easier one to get started on - the format of the prompts is explained quite well in that issue and there's further detail in the paper, too