seung-lab / connected-components-3d

Connected components on discrete and continuous multilabel 3D & 2D images. Handles 26, 18, and 6 connected variants; periodic boundaries (4, 8, & 6)
GNU Lesser General Public License v3.0
367 stars 43 forks source link

perf: use block based decision tree for 4-connected problem #44

Closed william-silversmith closed 4 years ago

william-silversmith commented 4 years ago

I'm not sure if there's something in the literature for this since everyone I've read seems to be focused on 8-connected, but here we try applying techniques such as the Grana et al 2009 paper from the 8 connected problem to the 4 connected problem.

The 4-connected problem using SAUF looks like this:

  C
A B 

B checks A and steals it if a match, then does unify(B,C). If A is missing, B steals C if a match, otherwise new label. You can flip this to get a slightly different decision tree with unify(A,B) instead.

This means that there are a tremendous number of unify(B,C) occurring as a solid block of foreground will require a unify on nearly every pixel. Can we use the BBDT technique to reduce unifies? Yes! Even though we can't exploit the intrinsic 2x2 connectedness of the 8-connected block, the 4-connected block offers something.

Let's start with the simplest version of the expanded problem.

  D E
A B C

The key here is that using the original tree starting with B, you have to do unify(B,D) if B matches D. However, because D is intrinsically connected to E, that means that if C matches B, we can skip unify(C,E). This means we can skip about half the unifies across the image. However, we can also apply this logic in the vertical direction as well.

  G H
F A B
E C D

Here we start at A instead of B. If A matches F, we can skip unify(C,E) if A matches C. We can also skip unify(B,H) if A matches G. We can also skip unify(D,B) if A and C match D.

I haven't perfected the 2x2 version yet, but I have seen an improvement on the 2x1 version from about 210 MVx/sec - 215 MVx/sec to about 213 - 224 MVx/sec (~1-4% better) on a medical image dataset from YACCLAB. Hopefully the perfected 2x2 version will be even better.

Notably, this implementation of BBDT is multi-label competent!

william-silversmith commented 4 years ago

On more careful evaluation with some YACCLAB datasets (medical, fingerprint, tobbaco800), the block based approach is not substantially better and in some cases worse. I'm not sure what I'm doing wrong. This isn't as sophisticated as the optimized approach, but I would have expected to see some improvement.

william-silversmith commented 4 years ago

Things got more interesting again. If you try the worst case for SAUF vs a BBDT (a field of ones), we see substantial improvement.

5000x5000 field of ones: bbdt_4_4: 282.75 MVx/sec bbdt_2_2: 224.36 MVx/sec sauf: 222.78 MVx/sec

william-silversmith commented 4 years ago

Slightly more formal testing. Fingerprint and Medical are datasets from YACCLAB. Unity is a 5k x 5k field of ones.

Algorithm Dataset N MVx/sec Factor
SAUF Fingerprint 100 228 1
SAUF Medical 15 233 1
SAUF Unity (5k x 5k) 200 217 1
SAUF Random 5 97 1
BBDT 2x1 Fingerprint 100 242 1.06
BBDT 2x1 Medical 15 236 1.01
BBDT 2x1 Unity (5k x 5k) 200 215 0.99
BBDT 2x1 Random 5 100 1.03
BBDT 2x2 Fingerprint 100 256 1.12
BBDT 2x2 Medical 15 250 1.07
BBDT 2x2 Unity (5k x 5k) 200 278 1.28
BBDT 2x2 Random 5 105 1.08
william-silversmith commented 4 years ago

Well I guess that seals it. The bbdt 2x2 is pretty good!

william-silversmith commented 4 years ago

This compares to the previous (awful) strategy of using the 6-connected logic.

Algorithm Dataset Sec. MVx/sec Factor
6-SAUF Fingerprints 21.5 232.6 1
6-SAUF Medical 30.0 198.9 1
6-SAUF Unity 26.3 181.2 1
6-SAUF Random Array 12.3 97.2 1
4-BBDT 2x2 Fingerprints 20.6 243.0 1.04
4-BBDT 2x2 Medical 24.0 248.1 1.25
4-BBDT 2x2 Unity 17.3 276.0 1.52
4-BBDT 2x2 Random Array 12.2 98.1 1.01

Raw data:

<function fingerprints at 0x7ff82b4312f0> connectivity=4 N=100
20.610318183899928 243.0 (1.04x)
<function medical at 0x7ff8289aef28> connectivity=4 N=15
24.01355004310708 248.1 (1.25x)
<function unity at 0x7ff82b431268> connectivity=4 N=200
17.2783682346354 276.0 (1.52x)
<function random_array at 0x7ff82b4311e0> connectivity=4 N=5
12.153060674668358 98.1 (1.01x)
<function fingerprints at 0x7ff82b4312f0> connectivity=6 N=100
21.536387205124903 232.6
<function medical at 0x7ff8289aef28> connectivity=6 N=15
29.95720171928506 198.9
<function unity at 0x7ff82b431268> connectivity=6 N=200
26.312411785126734 181.2 (
<function random_array at 0x7ff82b4311e0> connectivity=6 N=5
12.258985757828759 97.2