stanleybak / nnenum

Neural Network Enumeration Tool
GNU General Public License v3.0
33 stars 17 forks source link

Potential multiprocessing bug #5

Closed sergedurand closed 2 years ago

sergedurand commented 2 years ago

Hi, I would like to signal a potential bug: Running nnenum on ACAS property 7, network 1_9, using the provided property and network files, the analysis finished pretty quickly when using 3 or more processes (~0.3s), but doesn't seem to finish at all when using 1 or 2 processes.

python3 -m nnenum.nnenum examples/acasxu/data/ACASXU_run2a_1_9_batch_2000.onnx examples/acasxu/data/prop_7.vnnlib 30 temp.txt 16

yields

Running in parallel with 16 processes
Found unsafe from second concrete execution of abstract counterexample

violation star found a confirmed counterexample.

Unsafe Base Branch: ++-+++++++-+--++ (Mode: 0)
(0.0 sec) Q: 0, Sets: 0/1  (0.0%) ETA: - (expected 1 stars)   
(0.1 sec) Q: 4, Sets: 3/27  (0.0%) ETA: 2.2098 weeks (expected 33554432 stars)   
(0.2 sec) Q: 4, Sets: 82/267  (0.104%) ETA: 3.72 min (expected 78533 stars)   

Total Stars: 170 (2 exact, 168 approx)
Runtime: 0.3 sec
Completed work frac: 0.0015370170585811138
Num Stars Copied Between Processes: 19
Num Lps During Enumeration: 21
Total Num Lps: 21

Result: network is UNSAFE with confirmed counterexample in result.cinput and result.coutput
Input: [-0.328422874212265, -0.49999991059303284, -0.48877179622650146, 0.30708643794059753, -0.45069652795791626]
Output: [-0.020203180611133575, -0.01962512545287609, -0.019781894981861115, -0.02032020315527916, -0.020042387768626213]

Running with one process:

python3 -m nnenum.nnenum examples/acasxu/data/ACASXU_run2a_1_9_batch_2000.onnx examples/acasxu/data/prop_7.vnnlib 300 temp.txt 1

First 1000 lines of outputs: first.txt Last 1000 lines: last.txt

It searches for ~38s, get stuck in numerical instability (thousands of Warning: numerical instability (primal simplex, phase II) printed), then picks up the search and reaches timeout with a reset:

GLPK timed out / failed (9) after 60.0 sec with primary settings with 44 rows and 12 cols
Retrying with reset
result with reset  (0) 0.0 sec
Retrying with reset + alternate GLPK settings
result with reset & alternate settings (0) 0.0 sec

and a few Note: minimize failed with fail_on_unsat was true, trying to reset basis... Using result after reset basis (soltion was now feasible).

Running with two processes:

python3 -m nnenum.nnenum examples/acasxu/data/ACASXU_run2a_1_9_batch_2000.onnx examples/acasxu/data/prop_7.vnnlib 300 temp.txt 2

It also reaches timeout with a few Note: minimize failed with fail_on_unsat was true, trying to reset basis... Using result after reset basis (soltion was now feasible) but no numerical instability. Full output: 2_proc_prop_7.txt

I get similar results with property 8 network 2_9, except that it seems to search forever with 3 processes too, but manages at 4 and above.

My machine: Ubuntu 20.04 cpu: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz python version : 3.6.13 numpy==1.19.5 scipy==1.4.1 threadpoolctl==2.1.0 onnx==1.9.0 onnxruntime==1.8.0 skl2onnx==1.7.0 swiglpk==5.0.3 glpk: 4.65

Is this normal?

stanleybak commented 2 years ago

I've noticed this as well.

The property is SAT, which means that analysis terminates as soon as a counterexample to the property is found. multiprocessing divides up the work among the different processors based on factors that are hard to predict like timing. My interpretation is this seems to get lucky pretty often on this property, whereas a deterministic search (single-process) does not. The numerical instability is a factor of LP solving using GLPK. I'm slowly working on a new implementation with Gurobi, but that won't be ready for some time.