storpipfugl / pykdtree

Fast kd-tree implementation in Python
GNU Lesser General Public License v3.0
215 stars 47 forks source link

Fix mask parameter when input data points are not ordered #30

Closed djhoese closed 6 years ago

djhoese commented 6 years ago

Up until now I hadn't actually used my new-ish mask keyword argument feature. All the tests passed so I assumed it would work. I've now started to use it in my own work and discovered that it doesn't actually work for most real world cases because it has a major bug.

What was actually happening is that my code was getting lucky. My code was checking the query's mask array at index start_idx + i but it should have been using pidx[start_idx + i] which is the actual index of the source data (and the mask array). The previous index I was using was actually the index in the KDTree structure (I think). In all of my tests I was getting lucky because the KDTree happened to place the data points in the same positions in its structure as the original array so both start_idx + i and pidx[start_idx + i] were the same.

I've fixed the code and updated my "fake data" tests so that the input data are in a random order instead of sorted order. The test1d_mask test fails before my changes and passes after. It looks like the real world data in data_pts_real are sorted enough that test3d_mask passes with/without my fix.