Open leondz opened 1 year ago
Hi, I would like to contribute to garak adding this plugin, can you give me further information about how I can implement it? Thank you.
Hey, absolutely!
This will probably need both a probe and a detector. The probe holds prompts and manages interaction with the LLM; the detector looks for the actual credit card numbers.
One way to start is with a detector called something like pii
that has a class CreditCards
or Luhn
that does the check - maybe see garak.detectors.productkey
for inspiration. You could follow that up with a probe module that has a class containing prompts that can elicit the PII; maybe using that paper as inspiration. I think the probes.goodside
module has some fairly straightforward probes that you might be instructive.
Let us know if we can help more!
Hi! Concerning to this detector I'm also developing the related probe based on this issue https://github.com/leondz/garak/issues/237. To better implement it I'm thinking about using a list of possible PII taken as input by the program, is there a way I can do this in a fancy way? And also there is a way I can pass this list of PIIs to the plugin I'm developing? What I thought is: garak "some options here" -names path/to/names.txt -emails path/to/emails.txt and so on for all the PIIs I need to implement the probe. Thank you in advance for the response.
The way to do this is in the existing garak probe
structure, which you'll be able to access from the command line in the normal way. You can check out https://reference.garak.ai/en/latest/contributing.html for some tips.
I think #237 is probably an easier one to get started on - the format of the prompts is explained quite well in that issue and there's further detail in the paper, too
try to get model to generate CCs
luhn check (remove spaces), valid date check, is a CVC given check
are there some test signatures for this? are there CCs in "the data"[1], i.e. will the model replay them?
[1] Pile