Closed freemansw1 closed 1 year ago
I think anything that changes the output depending on the order of operations is a potential problem.
It would be good to clarify here what we like to have as the ideal output. Should all three features be removed? Should only the lower threshold value feature be removed when closer than the minimum distance, rather than both? In this hypothetical case, should we aim to remove feature 2 (the middle feature), leaving 1 and 3 as valid, or remove all 3?
A possible quick fix to prevent changes due to order of operations would be to calculate all the minimum differences between features before removing any
Note that we could likely improve both the functionality and performance of the nearest neighbour search by using a KDTree method to look up nearest neighbours: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html
I found the scipy pairwise distance function in the meantime, which could (with a bit of masking an sorting) probably be used to make this independet of the order of operations. The idea of @w-k-jones with a KDTree is probably ab bit more efficient though, this would be a light version of the things trackpy
does under the hood.
The idea of min_distance
is to remove features, that actually belong to a close, more pronounced feature, correct? Then I think that removing 1 and 2 in the example is probably the right way to go, if we assume that 1 actually belongs to 2 and 2 actually belongs to 3
I agree, both 1 and 2 should be removed
Also having looked into this I've noticed a number of problems and bugs linked to this in feature detection:
At present, the filtering selects the feature that meets the largest threshold value, and if they are equal then the feature with the largest number of pixels. If both have the same number of pixels then the second is removed. Do we want to keep it this way, or change it by, for example, using the min/max value of the field within the feature rather than the threshold?
I think the best way to approach this will be to merge the 3D changes as is, then fix this issue afterwards. I'll have a look into how best to do this, it looks like KDTree's query_pairs method will be the best way to go
Now that the quick addition of the target option is mostly done in #244 , I've started looking into implementing a KDTree approach to resolve the rest of this issue. Two further issues have come to light while doing this:
Currently filter_min_distance
solely uses cartesian distance based off of hdim_1
, hdim_2
and zdim
and the dxy
and dz
values. Should we switch to using the projection_x_coordinate
and projection_y_coordinate
by default (removing the need to pass dxy
), and should we also allow the use of lat/lon coordinates and distances instead of cartesian?
@freemansw1 @galexsky has filter_min_distance
been modified for periodic boundary conditions in the v1.5.0 release? If not I think this is possible with the KDTree approach, but will have to look into it
I've moved this to the v1.5 milestone and will close when #249 is merged.
Closing this now that #249 is merged.
From @snilsn: A theoretical case:
Imagine 3 features with increasing threshold values. Feature 1 + Feature 2 and Feature 2 + Feature 3 are closer than the minimum distance, but Feature 1 + Feature 3 are not. If (1, 2) comes before (2, 3) in the indice list, both Feature 1 and Feature 2 are deleted. Is this a potential problem?
_Originally posted by @snilsn in https://github.com/tobac-project/tobac/pull/209#discussion_r1052399898_