tzaeschke / phtree

PH-Tree
Apache License 2.0
121 stars 22 forks source link

Am I using the best method to get distance between two 3d point clouds? #17

Closed salamanders closed 2 years ago

salamanders commented 4 years ago

Great library, thank you!

I've got 2 3d point clouds of a few thousand points each. There are no correlated "pairs of points", so to find the minimum offset between the two clouds I have to run through 1 cloud, get the single nearest neighbor from the other cloud, and sum up all the distances.

This is slower than expected, I think partially because the phtree search is optimized for returning a series of nodes sorted by distance, rather than "the one closest output node for each input node"

My question: Is there a better way to call this library that reduces overhead, and get the closest node for a whole bunch of starting location nodes?

My code:

fun averageNeighborDistance(other: OrientedCloud) = getNearestNeighbors(other).entries.sumByDouble {
    it.key.distance(it.value)
} / size

fun getNearestNeighbors(other: OrientedCloud): Map<Vector3D, Vector3D> {
    val result = mutableMapOf<Vector3D, Vector3D>()
    oriented.forEach { myPoint ->
        result[myPoint] = other.tree.nearestNeighbour(1, *myPoint.toArray()).next()!!
    }
    return result
}

Those repeated calls to nearestNeighbour(1, is where my profiler says most of the time is spent. My uneducated guess is that some bulk function of "here is a a list of points from cloud0, please return the single nearest neighbor in cloud1 for each one" would possibly be faster.

However, maybe I am doing it the best way, and what looks like overhead with the NodeIteratorFullToList.init call is where the sort-by-distance happens, and isn't something that can be optimized around.

tzaeschke commented 4 years ago

So to understand what you want to achieve, you have to sets of points, A and B, and from these two sets you want to find for each point in A the closest point in B?

One minor optimization could be to reuse the query iterator (may save some allocation and GC cost). I am not sure what programming language you are using, but it may look like this:

fun getNearestNeighbors(other: OrientedCloud): Map<Vector3D, Vector3D> {
    val result = mutableMapOf<Vector3D, Vector3D>()
    val iter = other.tree.nearestNeighbour(1, [any point here])
    oriented.forEach { myPoint ->
        result[myPoint] = other.reset(1, *myPoint.toArray()).next()!!
    }
    return result
}

You could also try using a different structure, such as the CoverTree. They are very slow to build, but very fast to query (especially for high dimensional data, I have no experience with 2D/3D in CoverTrees).

I also discussed a solution here, but it solves the problem of finding the closest point for each point within the same data set. Maybe you can adapt this algorithm for your needs?

salamanders commented 4 years ago

you have to sets of points, A and B, and from these two sets you want to find for each point in A the closest point in B

Spot-on. I'm using Kotlin, so Java is fine! I think you meant iter.reset but I get what you meant. Trying that now.

CoverTree

Cool! Only one cloud (cloudA) is moving, and the other is fixed, so I could get by with investing a lot of up-front time to build something for lookups in cloudB.

tzaeschke commented 4 years ago

My own comment:

I also discussed a solution here, but it solves the problem of finding the closest point for each point within the same data set. Maybe you can adapt this algorithm for your needs?

I realized this solves a different problem: finding a single closest pair. Please ignore this.

salamanders commented 4 years ago

The common interface is great for testing this!

Nit: CoverTree.create vs PHTreeP.createPHTree uses different nomenclature.

Yup, CoverTree builds are 3x slower, and lookups are 3x faster.

fun testSpeed() {
    val phBuilds = mutableListOf<Duration>()
    val ctBuilds = mutableListOf<Duration>()

    val phLookups = mutableListOf<Duration>()
    val ctLookups = mutableListOf<Duration>()

    listOf(100, 500, 1_000, 5_000, 10_000, 50_000).forEach { numPoints ->
        val testPoints = AlignCloudsTest.cloudFile.loadCloudFromAsc()
            .also { println("Original cloud size: ${it.size}") }
            .targetedDecimation(numPoints)
            .also { println("Decimated cloud size: ${it.size}") }

        val ph = PHTreeP.createPHTree<Vector3D>(3)!!
        val ct = CoverTree.create<Vector3D>(3)!!

        phBuilds += measureTime {
            testPoints.oriented.forEach { vec ->
                ph.insert(vec.toArray()!!, vec)
            }
        }

        ctBuilds += measureTime {
            testPoints.oriented.forEach { vec ->
                ct.insert(vec.toArray()!!, vec)
            }
        }

        repeat(10_000) {
            val target = testPoints.randomInRange().toArray()!!
            phLookups += measureTime {
                ph.query1NN(target)
            }
            ctLookups += measureTime {
                ct.query1NN(target)
            }
        }

        println("CoverTree")
        println(" Build: ${ctBuilds.map { it.inMilliseconds }.average().r}ms")
        println(" Lookup: ${ctLookups.map { it.inMilliseconds }.average().r}ms")

        println("PHTreeP")
        println(" Build: ${phBuilds.map { it.inMilliseconds }.average().r}ms")
        println(" Lookup: ${phLookups.map { it.inMilliseconds }.average().r}ms")
    }
}   
tzaeschke commented 4 years ago

Thanks for posting this, very interesting! Another option may be a BallTree, but I do not have a working implementation.