perf: use block based decision tree for 4-connected problem

william-silversmith commented 4 years ago

I'm not sure if there's something in the literature for this since everyone I've read seems to be focused on 8-connected, but here we try applying techniques such as the Grana et al 2009 paper from the 8 connected problem to the 4 connected problem.

The 4-connected problem using SAUF looks like this:

  C
A B

B checks A and steals it if a match, then does unify(B,C). If A is missing, B steals C if a match, otherwise new label. You can flip this to get a slightly different decision tree with unify(A,B) instead.

This means that there are a tremendous number of unify(B,C) occurring as a solid block of foreground will require a unify on nearly every pixel. Can we use the BBDT technique to reduce unifies? Yes! Even though we can't exploit the intrinsic 2x2 connectedness of the 8-connected block, the 4-connected block offers something.

Let's start with the simplest version of the expanded problem.

  D E
A B C

The key here is that using the original tree starting with B, you have to do unify(B,D) if B matches D. However, because D is intrinsically connected to E, that means that if C matches B, we can skip unify(C,E). This means we can skip about half the unifies across the image. However, we can also apply this logic in the vertical direction as well.

  G H
F A B
E C D

Here we start at A instead of B. If A matches F, we can skip unify(C,E) if A matches C. We can also skip unify(B,H) if A matches G. We can also skip unify(D,B) if A and C match D.

I haven't perfected the 2x2 version yet, but I have seen an improvement on the 2x1 version from about 210 MVx/sec - 215 MVx/sec to about 213 - 224 MVx/sec (~1-4% better) on a medical image dataset from YACCLAB. Hopefully the perfected 2x2 version will be even better.

Notably, this implementation of BBDT is multi-label competent!

william-silversmith commented 4 years ago

On more careful evaluation with some YACCLAB datasets (medical, fingerprint, tobbaco800), the block based approach is not substantially better and in some cases worse. I'm not sure what I'm doing wrong. This isn't as sophisticated as the optimized approach, but I would have expected to see some improvement.

william-silversmith commented 4 years ago

Things got more interesting again. If you try the worst case for SAUF vs a BBDT (a field of ones), we see substantial improvement.

5000x5000 field of ones: bbdt_4_4: 282.75 MVx/sec bbdt_2_2: 224.36 MVx/sec sauf: 222.78 MVx/sec

william-silversmith commented 4 years ago

Slightly more formal testing. Fingerprint and Medical are datasets from YACCLAB. Unity is a 5k x 5k field of ones.

Algorithm	Dataset	N	MVx/sec	Factor
SAUF	Fingerprint	100	228	1
SAUF	Medical	15	233	1
SAUF	Unity (5k x 5k)	200	217	1
SAUF	Random	5	97	1
BBDT 2x1	Fingerprint	100	242	1.06
BBDT 2x1	Medical	15	236	1.01
BBDT 2x1	Unity (5k x 5k)	200	215	0.99
BBDT 2x1	Random	5	100	1.03
BBDT 2x2	Fingerprint	100	256	1.12
BBDT 2x2	Medical	15	250	1.07
BBDT 2x2	Unity (5k x 5k)	200	278	1.28
BBDT 2x2	Random	5	105	1.08

william-silversmith commented 4 years ago

Well I guess that seals it. The bbdt 2x2 is pretty good!

william-silversmith commented 4 years ago

This compares to the previous (awful) strategy of using the 6-connected logic.

Algorithm	Dataset	Sec.	MVx/sec	Factor
6-SAUF	Fingerprints	21.5	232.6	1
6-SAUF	Medical	30.0	198.9	1
6-SAUF	Unity	26.3	181.2	1
6-SAUF	Random Array	12.3	97.2	1
4-BBDT 2x2	Fingerprints	20.6	243.0	1.04
4-BBDT 2x2	Medical	24.0	248.1	1.25
4-BBDT 2x2	Unity	17.3	276.0	1.52
4-BBDT 2x2	Random Array	12.2	98.1	1.01

Raw data:

<function fingerprints at 0x7ff82b4312f0> connectivity=4 N=100
20.610318183899928 243.0 (1.04x)
<function medical at 0x7ff8289aef28> connectivity=4 N=15
24.01355004310708 248.1 (1.25x)
<function unity at 0x7ff82b431268> connectivity=4 N=200
17.2783682346354 276.0 (1.52x)
<function random_array at 0x7ff82b4311e0> connectivity=4 N=5
12.153060674668358 98.1 (1.01x)
<function fingerprints at 0x7ff82b4312f0> connectivity=6 N=100
21.536387205124903 232.6
<function medical at 0x7ff8289aef28> connectivity=6 N=15
29.95720171928506 198.9
<function unity at 0x7ff82b431268> connectivity=6 N=200
26.312411785126734 181.2 (
<function random_array at 0x7ff82b4311e0> connectivity=6 N=5
12.258985757828759 97.2

seung-lab / connected-components-3d

perf: use block based decision tree for 4-connected problem #44