imaginationtech / constrainedrandom

A Python package for creating and solving constrained randomization problems.
https://pypi.org/project/constrainedrandom/

Potential performance issue with 2D-array randomization #2

Open KasperHesse opened 1 year ago

KasperHesse commented 1 year ago

I've been trying to compare PyVSC and constrainedrandom's performance when it comes to randomization of 2D-arrays. The test case is a "checkerboard randomization", where each field can be 0 or 1, but cannot have the same value as its neighbours.

Consider the following implementations:

PyVSC:

```python
import vsc
import timeit

@vsc.randobj
class Checkers():
    def __init__(self, N):
        self.N = N
        self.board = [vsc.rand_list_t(vsc.rand_uint8_t(), self.N) for _ in range(self.N)]

    @vsc.constraint
    def con_values(self):
        for row in self.board:
            with vsc.foreach(row) as c:
                0 <= c and c <= 1

    @vsc.constraint
    def constraints(self):
        for i in range(1, self.N):
            with vsc.foreach(self.board[i], idx=True) as j:
                if j == 0:
                    pass
                (self.board[i][j] != self.board[i-1][j]) and (self.board[i][j] != self.board[i][j-1])

def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
```
constrainedrandom:

```python
from constrainedrandom import RandObj
import timeit

class Checkers(RandObj):
    def __init__(self, N):
        super().__init__(max_iterations=1000)
        self.N = N
        for i in range(self.N):
            self.add_rand_var(f"list{i}", domain=range(2), length=self.N, disable_naive_list_solver=True)
        for i in range(1, self.N):
            for j in range(1, self.N):
                self.add_constraint((lambda row1, row2: row1[j] != row2[j]), (f"list{i-1}", f"list{i}"))
                self.add_constraint((lambda row: row[j-1] != row[j]), (f"list{i}", ))

def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
```

The average runtimes reported for the two on my machine are as follows. The fields labeled "DNF" took so long that I didn't bother waiting for them to complete.

| N | PyVSC runtime (s) | constrainedrandom runtime (s) |
| ---: | ---: | ---: |
| 4 | 0.008 | 0.001 |
| 8 | 0.029 | 1.265 |
| 16 | 0.112 | 45.786 |
| 32 | 0.305 | DNF |
| 64 | 1.208 | DNF |
| 128 | 5.371 | DNF |
| 256 | 22.421 | DNF |

As you can see, PyVSC's performance appears vastly superior to constrainedrandom's. However, I am unsure whether my constrainedrandom implementation is optimal; I haven't found a better way of implementing 2D-arrays and randomizing them.

Is there a better way of constraining this problem that would lead to better performance with constrainedrandom?

KasperHesse commented 1 year ago

I noticed that I had set max_iterations=1000 due to some issues with a previous constraint attempt. Removing that parameter seems to speed up the constraint solver, probably because it stops using the naive solver more quickly, but it is still markedly slower than PyVSC:

```
N=   4: Runtime: 0.004006109999863838
N=   8: Runtime: 0.8024601899998742
N=  16: Runtime: 1.7624544599999354
N=  32: Runtime: 7.645777930000077
N=  64: Runtime: 73.60141107
```

will-keen commented 1 year ago

Hi Kasper,

Thanks very much for raising this. 2D arrays (or lists of lists in Python terms) aren't a use case we've yet needed when using constrainedrandom internally at Imagination, so it's helpful to get your feedback.

You've raised a valid concern about performance here. In general, constrainedrandom isn't very good at cases where a small number of choices determine the only correct result for a large number of other variables. vsc does do better in these sorts of cases, because it considers the problem in a more interconnected way than constrainedrandom does (which is exactly the thing that slows it down in other cases).

I've tried to investigate this particular case and have written up what I found. Hopefully it's helpful, please let me know your thoughts.

Your checkerboard example

The example you've given is a bit of a corner-case in that the first choice you make for any cell in the checkerboard determines the result of all the other cells. If I choose a 1 or 0 to go anywhere, it effectively determines the results for every other cell immediately. vsc therefore performs better - constrainedrandom always really struggles with this kind of case due to its reliance on randomizing and checking.

I was able to reproduce the behaviour you showed in your original example with the code you provided.

My results from directly running your code:

vsc results:

```
N=   4: Runtime: 0.003800742000021273
N=   8: Runtime: 0.01277666030000546
N=  16: Runtime: 0.04529272580039105
N=  32: Runtime: 0.18716902040032438
N=  64: Runtime: 1.0156315014006396
N= 128: Runtime: 4.4076643840002365
N= 256: Runtime: 24.720914476300095
```
constrainedrandom results:

```
N=   4: Runtime: 0.00049386870014132
N=   8: Runtime: 1.050551228599943
N=  16: Runtime: 7.706638661499892
N=  32: Runtime: 47.63703318400003
(DNF for N=64, 128, 256)
```

Making constrainedrandom faster

By removing the max_iterations=1000 and the disable_naive_list_solver=True, I got better results:

constrainedrandom but faster code:

```python
from constrainedrandom import RandObj
import timeit

class Checkers(RandObj):
    def __init__(self, N):
        super().__init__()
        self.N = N
        for i in range(self.N):
            self.add_rand_var(f"list{i}", domain=range(2), length=self.N)
        for i in range(1, self.N):
            for j in range(1, self.N):
                self.add_constraint((lambda row1, row2: row1[j] != row2[j]), (f"list{i-1}", f"list{i}"))
                self.add_constraint((lambda row: row[j-1] != row[j]), (f"list{i}", ))

def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
```
constrainedrandom but faster results:

```
N=   4: Runtime: 0.0007822489002137445
N=   8: Runtime: 0.2898493679000239
N=  16: Runtime: 0.1423995040000591
N=  32: Runtime: 0.4734258158998273
N=  64: Runtime: 2.1201752419001423
N= 128: Runtime: 9.272784179399604
N= 256: Runtime: 44.2909998391995
```

This is still about 2x slower than vsc for N >= 64, but I think you'd agree it's quite a bit better than the "did not finish" behaviour in the original example.

Please can you let me know whether you get comparable results running this on your machine? I'm just running it locally on my laptop.

Personally I'd just put this down as a case where vsc wins in terms of performance. Based on current user experience I would trade off slowness in this kind of case for speed in other cases.

There are ways to make this go much faster by manually optimizing the problem, e.g. by constraining the input space for each row, but at that point you might as well just write it procedurally...
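For completeness, here is a rough sketch of the "constrain the input space per row" idea. The class name is made up, and it assumes domain= accepts an arbitrary list of candidate values (here, the only two legal alternating row patterns):

```python
from constrainedrandom import RandObj

class CheckersRows(RandObj):
    """Sketch only: each row is drawn from the two legal alternating patterns."""
    def __init__(self, N):
        super().__init__()
        self.N = N
        # The only rows that can appear in a valid checkerboard are 0101... and 1010...
        patterns = [
            tuple(k % 2 for k in range(N)),
            tuple((k + 1) % 2 for k in range(N)),
        ]
        for i in range(N):
            # Assumes domain= accepts an explicit list of values rather than a range.
            self.add_rand_var(f"list{i}", domain=patterns)
        for i in range(1, N):
            # Adjacent rows must use different patterns.
            self.add_constraint(lambda a, b: a != b, (f"list{i-1}", f"list{i}"))

c = CheckersRows(8)
c.randomize()
```

But as I say, once you've reasoned about the problem to that extent, the procedural version below is simpler and faster still.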

Procedural solution

To be 100% honest with you, the original problem you stated doesn't really sound well-suited to either vsc or constrainedrandom - there's really only one free choice (the value of the first cell determines the whole board), so it's quite sub-optimal to treat the whole thing as a randomization problem.

If I had that problem in a real software project, I would write something procedural, like this:

Procedural solution:

```python
import random
import timeit

class Checkers():
    def __init__(self, N):
        self.N = N
        self.board = []

    def randomize(self):
        self.board = []
        top_left = random.getrandbits(1)
        for i in range(self.N):
            self.board.append([None for _ in range(self.N)])
            for j in range(self.N):
                if i == 0 and j == 0:
                    self.board[i][j] = top_left
                else:
                    if j == 0:
                        self.board[i][j] = 0 if self.board[i-1][j] else 1
                    else:
                        self.board[i][j] = 0 if self.board[i][j-1] else 1

def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
```
Procedural results:

```
N=   4: Runtime: 3.370100603206083e-06
N=   8: Runtime: 8.603600144851953e-06
N=  16: Runtime: 2.861399989342317e-05
N=  32: Runtime: 9.815220037125982e-05
N=  64: Runtime: 0.00036479769987636247
N= 128: Runtime: 0.0014662143003079109
N= 256: Runtime: 0.005748119200143264
```

This is orders of magnitude faster than using a constrained random approach for this problem.

A different two-dimensional array problem

I want to engage with the original point about two-dimensional array performance, so let me propose a slightly different case, where I think writing it as a constrained randomization problem is a much more user-friendly experience.

Let's take the checkerboard example, but say each element can have a value from 0 to 4, and the sum of each element with each of its neighbours must be less than or equal to 4.

I coded this in vsc and in constrainedrandom and got similar results:

vsc code:

```python
import vsc
import timeit

@vsc.randobj
class Checkers():
    def __init__(self, N):
        self.N = N
        self.board = [vsc.rand_list_t(vsc.rand_uint8_t(), self.N) for _ in range(self.N)]

    @vsc.constraint
    def con_values(self):
        for row in self.board:
            with vsc.foreach(row) as c:
                0 <= c and c <= 4

    @vsc.constraint
    def constraints(self):
        for i in range(1, self.N):
            with vsc.foreach(self.board[i], idx=True) as j:
                if j == 0:
                    pass
                (self.board[i][j] + self.board[i-1][j] < 5) and (self.board[i][j] + self.board[i][j-1] < 5)

def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
```
constrainedrandom code:

```python
from constrainedrandom import RandObj
import timeit

class Checkers(RandObj):
    def __init__(self, N):
        super().__init__()
        self.N = N
        for i in range(self.N):
            self.add_rand_var(f"list{i}", domain=range(5), length=self.N)
        for i in range(1, self.N):
            for j in range(1, self.N):
                self.add_constraint((lambda row1, row2: row1[j] + row2[j] < 5), (f"list{i-1}", f"list{i}"))
                self.add_constraint((lambda row: row[j-1] + row[j] < 5), (f"list{i}", ))

def benchmark():
    numRounds = 10
    for N in (4, 8, 16, 32, 64, 128, 256):
        t = timeit.timeit(stmt='c.randomize()', setup=f'c = Checkers({N})', number=numRounds, globals=globals())
        print(f"N={N:4d}: Runtime: {t/numRounds}")

benchmark()
```
vsc results:

```
N=   4: Runtime: 0.005554529099754291
N=   8: Runtime: 0.022379124100552872
N=  16: Runtime: 0.06850886350002838
N=  32: Runtime: 0.30377212050007074
N=  64: Runtime: 1.354998340599559
N= 128: Runtime: 5.976018746299815
N= 256: Runtime: 31.340037856700654
```
constrainedrandom results:

```
N=   4: Runtime: 0.005820018600206822
N=   8: Runtime: 0.000641144199471455
N=  16: Runtime: 0.006594730899814749
N=  32: Runtime: 0.16832221820004634
N=  64: Runtime: 1.5169856319000246
N= 128: Runtime: 6.13921610949983
N= 256: Runtime: 27.28949799520051
```

As you can see, the results are pretty similar between the two. It's a narrow victory for constrainedrandom, but not by a lot, so it could be in the noise.

The reason to include this is to show that the class of problem matters a lot, not just the list dimensionality.

Other cases where constrainedrandom performs badly

constrainedrandom also performs badly when a constraint leaves only a single legal value for a variable.

Consider a 32-bit integer, which we want to have the value 5. If we write this as a constrained randomization problem, vsc will beat constrainedrandom every time. In fact, constrainedrandom will fail to produce a result.

vsc code:

```python
import vsc

@vsc.randobj
class RandInt:
    def __init__(self):
        self.x = vsc.rand_int32_t()

    @vsc.constraint
    def val_c(self):
        self.x == 5

r = RandInt()
r.randomize()
```
constrainedrandom code:

```python
from constrainedrandom import RandObj

class RandInt(RandObj):
    def __init__(self):
        super().__init__()
        self.add_rand_var('x', bits=32)
        self.add_constraint(lambda a: a == 5, ('x',))

r = RandInt()
r.randomize()
```

If we provide the concrete value to constrainedrandom's `randomize()` method using `randomize(with_values={'x': 5})` instead of adding a constraint, then it will just assign the variable and be done.
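For reference, a minimal sketch of that alternative (the same class as above with the constraint dropped):

```python
from constrainedrandom import RandObj

class RandInt(RandObj):
    def __init__(self):
        super().__init__()
        # No equality constraint; the concrete value is supplied at randomize time.
        self.add_rand_var('x', bits=32)

r = RandInt()
# Directly assigns x = 5 rather than asking the solver to find it.
r.randomize(with_values={'x': 5})
```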

But you might also think: why not just create a variable and assign it the value 5? Why bring up a case you wouldn't code as a randomization problem anyway? The point is just that constrainedrandom often loses to vsc in these kinds of cases, where a constraint effectively assigns a variable a single value.

I haven't looked to fix this behaviour, mainly because I'd just advise someone not to use randomization in this case.

You might say this is vsc being better at "constraint-based programming", which it is - but that isn't really what constrainedrandom intends to be good at.

Summary

There are definitely cases where vsc will always beat constrainedrandom, and if you want a tool that performs "constraint-based programming", then vsc is definitely more complete than constrainedrandom. It's worth noting that vsc is much more user-friendly in handling lists of lists, too. Imagine if you had to do another dimension of lists in constrainedrandom - it would be very painful to define all the variable names!
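(To illustrate what I mean by painful naming: a hypothetical third dimension would have to be flattened into the variable names, something like the sketch below. The class and naming scheme are made up, and only the declarations are shown.)

```python
from constrainedrandom import RandObj

class Cube(RandObj):
    """Hypothetical sketch only: a 3D array flattened into per-(i, j) list variables."""
    def __init__(self, N):
        super().__init__()
        self.N = N
        for i in range(N):
            for j in range(N):
                # Every innermost list needs its own name, e.g. "list_2_3",
                # and every cross-dimension constraint has to spell these names out.
                self.add_rand_var(f"list_{i}_{j}", domain=range(2), length=N)
```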

However, I have optimized constrainedrandom for the cases we generally use it for internally, which are problems where we want a randomized result. In benchmarking, it has always performed very favourably compared to vsc in those sorts of cases (e.g. see the instruction generation case in the benchmarks/ directory). (This does remind me that I need to overhaul how those benchmarks are defined, as it's very clunky at the moment.)

Maybe it does need to support lists of lists better, though we haven't required this yet. Let me know if you have suggestions for this. For what it's worth, this is only deployed in one area internally and certainly doesn't replace doing constrained random stuff in SystemVerilog.

I hope that helps - please let me know any further thoughts you have, or any other performance cases you find where constrainedrandom is not adequate.