leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0

`_config` object contains `state` not suitable for multiprocessing #646

Open jmartin-tech opened 4 months ago

jmartin-tech commented 4 months ago

Overview

Many garak._config namespace variables, such as transient, run, and reporting, are currently populated as state during execution. Managing these values between test runs in a long-running service does not lend itself to a singleton, since only one of each can then exist at a time in the Python interpreter. Further, because this data is populated into a namespace variable, reloading the namespace will not preserve these values.

The multiprocessing package behaves more like launching a new Python process that uses only the files directly connected to the source objects passed to it.

#645 addresses an instance that exposed this concern.

Current state example

When the probe class and its associated class hierarchy are called, the only argument is defined in the same class, and the state of _config is not passed.

Desired state

Any stateful _config likely needs to be either passed to new processes or consistently reloaded in them. Investigation of the use cases needs to occur so we can consolidate on patterns, either in code standards or in a library framework that supports task execution with all required context.
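As a rough illustration of the "pass it explicitly" option, here is a minimal sketch and not garak's actual code. The `transient` and `run` objects are stand-ins for the `_config` namespaces, and `snapshot_config`, `restore_config`, `describe_attempt`, and `run_in_pool` are hypothetical names. It assumes the stateful values are picklable.

```python
import multiprocessing
from types import SimpleNamespace

# Stand-ins for the stateful _config namespaces; the real module is not imported here.
transient = SimpleNamespace()
run = SimpleNamespace()

_STATEFUL = ("transient", "run")


def snapshot_config() -> dict:
    """Collect the stateful namespaces into a plain, picklable dict."""
    return {name: dict(vars(globals()[name])) for name in _STATEFUL}


def restore_config(snapshot: dict) -> None:
    """Re-populate the namespaces in place; intended as a Pool initializer."""
    for name, values in snapshot.items():
        namespace = globals()[name]
        vars(namespace).clear()
        vars(namespace).update(values)


def describe_attempt(prompt: str) -> str:
    """Example task that depends on config state rather than on its arguments."""
    return f"{transient.run_id}:{run.seed}:{prompt}"


def run_in_pool(prompts: list) -> list:
    """Ship a snapshot to every worker so tasks see the parent's state."""
    with multiprocessing.Pool(
        processes=2, initializer=restore_config, initargs=(snapshot_config(),)
    ) as pool:
        return pool.map(describe_attempt, prompts)
```

To try it, set `transient.run_id` and `run.seed` in the parent and call `run_in_pool([...])` under an `if __name__ == "__main__":` guard. The initializer runs once per worker, so a freshly started process begins from the parent's snapshot and not from empty namespaces.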

leondz commented 4 months ago

When addressing this:

leondz commented 1 day ago

How should we triage/roadmap this fella?

jmartin-tech commented 1 day ago

There is some ideation about possibly injecting a shared memory location to sync config across process and thread boundaries. Maybe set a goal to introduce this within this quarter.

The proposed idea is for module-level code in _config to search for an existing shared memory object, which would be created in the parent runtime when _config is locked to start a run.
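A minimal sketch of how that lookup could work, using the standard library's `multiprocessing.shared_memory`. Everything named here is an assumption for illustration: the environment variable `EXAMPLE_CONFIG_SHM`, the functions `publish_config` and `load_shared_config`, and the choice of JSON with a length prefix as the encoding. None of it exists in garak today.

```python
import json
import os
import struct
from multiprocessing import shared_memory

SHM_ENV_VAR = "EXAMPLE_CONFIG_SHM"  # hypothetical variable name, not a garak setting
_HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def publish_config(config: dict) -> shared_memory.SharedMemory:
    """Parent side: copy a locked config into a new shared memory block."""
    payload = json.dumps(config).encode("utf-8")
    shm = shared_memory.SharedMemory(create=True, size=_HEADER.size + len(payload))
    # The block may be rounded up to a page size, so store the real length first.
    _HEADER.pack_into(shm.buf, 0, len(payload))
    shm.buf[_HEADER.size:_HEADER.size + len(payload)] = payload
    os.environ[SHM_ENV_VAR] = shm.name  # child processes inherit the environment
    return shm


def load_shared_config():
    """Child side: return the published config, or None if nothing is published."""
    name = os.environ.get(SHM_ENV_VAR)
    if name is None:
        return None
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return None
    try:
        (length,) = _HEADER.unpack_from(shm.buf, 0)
        raw = bytes(shm.buf[_HEADER.size:_HEADER.size + length])
        return json.loads(raw.decode("utf-8"))
    finally:
        shm.close()
```

In this sketch the parent would call `publish_config(...)` at the point where _config is locked, keep the returned handle for the length of the run, and call `close()` and `unlink()` on it afterwards. Module-level code would call `load_shared_config()` on import and fall back to normal loading when it returns `None`. It only covers read-only sharing of a frozen config; syncing later writes would need locking on top.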