leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0

add support for loading renamed plugins #889

Open leondz opened 1 week ago

leondz commented 1 week ago

e.g. knownbadsignatures became av_spam_scanning

This should be caught in load_plugin and in reporting

jmartin-tech commented 1 week ago

A proposal for this, needs thought and feedback.

Scenarios needing support:

Proposed implementation with future developer standards expectation:

With these changes supported, the plugin cache entries might look like:

    "probes.av_spam_scanning.EICAR": {
      "description": "Does the model check its output for viruses? Try to send the EICAR code through",
      "ALIASES": [ "knownbadsignatures.EICAR" ],
      "DEFAULT_PARAMS": {
        "generations": 1
      },
      "active": true,
      "bcp47": "*",
      "doc_uri": "https://en.wikipedia.org/wiki/EICAR_test_file",
      "extended_detectors": [],
      "goal": "output antivirus test codes",
      "modality": {
        "in": [
          "text"
        ]
      },
...
    "probes.continuation.ContinueSlursReclaimedSlursMini": {
      "description": "Continue partial word with slur or reclaimed slur.",
      "ALIASES": [ "continuation.ContinueSlursReclaimedSlurs80" ],
      "DEFAULT_PARAMS": {
        "generations": 1
      },
      "active": true,
      "bcp47": "en",
      "doc_uri": "",
      "extended_detectors": [],
      "goal": "output an ethnic slur",
      "modality": {
        "in": [
          "text"
        ]
      },
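
As a rough sketch of how the `ALIASES` entries above could be consumed: build a reverse index from old names to current names when the cache is loaded, then consult it whenever a requested name is not found. The helper names (`build_alias_index`, `resolve_plugin_name`) and the trimmed `plugin_cache` dict are hypothetical, and it is an assumption here that aliases omit the category prefix (`probes.`) as in the cache excerpt. This is not garak's actual loader.

```python
import logging

# Trimmed, hypothetical stand-in for the plugin cache shown above.
plugin_cache = {
    "probes.av_spam_scanning.EICAR": {
        "ALIASES": ["knownbadsignatures.EICAR"],
    },
    "probes.continuation.ContinueSlursReclaimedSlursMini": {
        "ALIASES": ["continuation.ContinueSlursReclaimedSlurs80"],
    },
}


def build_alias_index(cache: dict) -> dict:
    """Map each old fully qualified name to its current name."""
    index = {}
    for current_name, info in cache.items():
        # Assumption: aliases omit the category, so reuse the current one.
        category = current_name.split(".", 1)[0]  # e.g. "probes"
        for alias in info.get("ALIASES", []):
            old_name = f"{category}.{alias}"
            # An alias must never shadow a live plugin or another alias.
            if old_name in cache or old_name in index:
                raise ValueError(f"alias {old_name!r} collides with another name")
            index[old_name] = current_name
    return index


def resolve_plugin_name(name: str, cache: dict, aliases: dict) -> str:
    """Return the current name for `name`, following one alias hop if needed."""
    if name in cache:
        return name
    if name in aliases:
        logging.warning("plugin %s was renamed to %s", name, aliases[name])
        return aliases[name]
    raise KeyError(f"unknown plugin: {name}")


alias_index = build_alias_index(plugin_cache)
```

Calling `resolve_plugin_name("probes.knownbadsignatures.EICAR", plugin_cache, alias_index)` would return the current name and log a rename warning, while current names pass through unchanged and unknown names still fail loudly.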

Pros:

Cons:

leondz commented 1 week ago

brief notes, will revisit this:

z-score calibration files contain only the latest plugin name

a test should validate the names in the calibration against the currently available range of plugins
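
something along these lines, maybe. the inputs are hypothetical placeholders: in practice the names would come from the calibration file and from plugin enumeration, and neither the data layout nor the helper name reflects existing garak code.

```python
def stale_calibration_names(calibration_names, available_names):
    """Return calibration names with no matching available plugin, sorted."""
    return sorted(set(calibration_names) - set(available_names))


# Hypothetical inputs standing in for the calibration file contents
# and the currently enumerated plugins.
available = {"probes.av_spam_scanning.EICAR"}
calibration = [
    "probes.av_spam_scanning.EICAR",
    "probes.knownbadsignatures.EICAR",  # old name left over after the rename
]

stale = stale_calibration_names(calibration, available)


def test_calibration_names_are_current():
    # A calibration listing only current names should produce no stale entries.
    assert not stale_calibration_names(["probes.av_spam_scanning.EICAR"], available)
```

here `stale` contains only the old `knownbadsignatures` name, which is what such a test would flag.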

resolve class name conversion during spec evaluation

limited magic involved, since the impact is confined to configuration parsing

i like this. if you walk in through the front door, you get first class service. if you need grit, that's fine too (i.e. not too much magic)

module rename would inject multiple constants if more than one class exists in the module

indeed. this seems like it could do with a rework

report comparison is more complex when report uses a different name from original run

not yet implemented, but yes

programmatic access that selects plugins without spec parsing may still report that an invalid config was requested if the constants in the consuming tool are not maintained.

we already fail gracefully for at least some plugins. not sure how best to surface this