Closed snilsn closed 1 year ago
Thanks for this minimal example! This is an edge case that might have to do with the performance improvement in #597 that was discussed further in #601.
Is the output of your script consistent from run to run? If not, we identified there that setting the PYTHONHASHSEED environment variable before Python starts could make the behavior stable. Is that something you can try?
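For anyone wanting to see the effect of the variable in isolation, here is a minimal sketch using only the standard library. It starts fresh interpreters with and without PYTHONHASHSEED and compares a string hash. The helper name `string_hash_in_fresh_interpreter` is made up for illustration, and the sketch says nothing about where trackpy itself depends on hashing; it only shows that the seed has to be in the environment before the interpreter starts.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(seed=None):
    """Return hash('trackpy') as computed by a newly started Python process.

    Hypothetical helper for illustration; not part of trackpy.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from an unseeded environment
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('trackpy'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # With a fixed seed, two separate interpreter runs agree.
    print("seeded:  ", string_hash_in_fresh_interpreter(0),
          string_hash_in_fresh_interpreter(0))
    # Without a seed, the two values will almost certainly differ.
    print("unseeded:", string_hash_in_fresh_interpreter(),
          string_hash_in_fresh_interpreter())
```

The same idea applies to a linking script: launch it with the variable already set in the environment rather than assigning it from inside the running process.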
We never quite settled the question of whether to make this configurable, as proposed in #657, and so I ultimately decided to follow "YAGNI". But your own thoughts would be appreciated!
(The other possibility is that this is an instability in how cKDTree handles degeneracy. That might be much harder to overcome. An easy thing to try would be to run the linker with the argument neighbor_strategy='BTree'.)
The output of the script differs from run to run, but setting PYTHONHASHSEED beforehand indeed provides stability (both options are still present, but in the same order every run). So thanks for this suggestion!
I encountered this while designing tutorials for another Python module that uses some trackpy functionality, and it caused some headaches, but it is obviously a pretty rare case in natural datasets. If it occurs, it could lead to small but hard-to-detect reproducibility problems, especially if the linking is part of a larger analysis.
Additionally, there seems to be no comfortable way to set PYTHONHASHSEED in Jupyter.
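One possible workaround, sketched here under assumptions rather than tested against any particular setup: Jupyter kernel specs accept an `env` entry, so a dedicated kernel can start its interpreter with the variable already set. The function name `write_seeded_kernelspec`, the directory name and the display name are all invented for this example, and the `argv` shown assumes a standard ipykernel installation.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_seeded_kernelspec(target_dir, seed=0, display_name=None):
    """Write a kernel.json whose kernel starts with PYTHONHASHSEED set.

    Hypothetical helper; assumes ipykernel is installed for sys.executable.
    """
    spec = {
        "argv": [sys.executable, "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": display_name or f"Python (PYTHONHASHSEED={seed})",
        "language": "python",
        # Environment variables applied when the kernel process is launched.
        "env": {"PYTHONHASHSEED": str(seed)},
    }
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    spec_dir = Path(tempfile.mkdtemp()) / "python-hashseed0"
    print(write_seeded_kernelspec(spec_dir))
```

The resulting directory could then be registered with something like `jupyter kernelspec install <dir> --user` and selected as the notebook's kernel. Assigning `os.environ["PYTHONHASHSEED"]` inside an already running kernel has no effect on that kernel, because the seed is read at interpreter startup.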
Good! I suspect that trackpy v0.4.x would have had the same inconsistent behavior for degenerate candidates… it previously sorted candidates by distance only, which for degenerate candidates would have changed nothing.
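To illustrate why sorting by distance alone cannot settle a tie, here is a small standalone sketch. It is not trackpy's code. The `(particle_id, distance)` tuples and both function names are made up for the example. Python's sort is stable, so tied candidates keep whatever order they arrived in, and any run-to-run variation in that arrival order passes straight through.

```python
def order_by_distance(candidates):
    """Sort (particle_id, distance) pairs by distance only.

    Ties keep their input order because Python's sort is stable.
    """
    return sorted(candidates, key=lambda c: c[1])


def order_by_distance_then_id(candidates):
    """Sort by distance, breaking ties by particle id (deterministic)."""
    return sorted(candidates, key=lambda c: (c[1], c[0]))


if __name__ == "__main__":
    d = 2 ** 0.5  # both candidates are sqrt(2) away from the target
    arrival_a = [(0, d), (1, d)]
    arrival_b = [(1, d), (0, d)]  # same candidates, different arrival order

    # Distance-only sorting: the "best" candidate depends on arrival order.
    print(order_by_distance(arrival_a)[0], order_by_distance(arrival_b)[0])
    # With an explicit tie-breaker the result no longer depends on it.
    print(order_by_distance_then_id(arrival_a)[0],
          order_by_distance_then_id(arrival_b)[0])
```

An explicit tie-breaker makes the choice repeatable, but it is still an arbitrary choice between two equally valid links.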
In any case, I don't think we even considered degeneracy in our earlier discussion. I can see now how it might arise even in real datasets, if positions can only be determined to the nearest pixel. It seems like the most correct behavior would be to issue a warning—from a scientific perspective, it's bad for trackpy to silently inject an arbitrary choice into your results, whether that choice is consistent or not. However, properly checking for degeneracy has to be done during linking, and it would certainly hinder performance in every other case, so it would have to be optional.
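As a rough idea of what an optional check could look like outside the linker, here is a sketch that compares two consecutive frames before linking and reports target particles whose nearest candidates are tied. Everything in it is an assumption for illustration: the function name `find_degenerate_candidates`, the tolerance values, and the use of `<=` for the search range. It only detects ties for the nearest candidate, not every degenerate assignment a full linker could encounter, and its brute-force loop would be slow on large frames.

```python
import math


def find_degenerate_candidates(prev_positions, next_positions, search_range,
                               rel_tol=1e-9, abs_tol=1e-12):
    """Return [(target_index, [tied_source_indices]), ...].

    A target is reported when two or more source particles within
    search_range share the smallest distance to it (within tolerance).
    Hypothetical pre-check; not part of trackpy.
    """
    degenerate = []
    for j, target in enumerate(next_positions):
        in_range = sorted(
            (math.dist(source, target), i)
            for i, source in enumerate(prev_positions)
            if math.dist(source, target) <= search_range
        )
        if len(in_range) < 2:
            continue  # zero or one candidate: nothing to tie with
        best = in_range[0][0]
        tied = [i for dist, i in in_range
                if math.isclose(dist, best, rel_tol=rel_tol, abs_tol=abs_tol)]
        if len(tied) > 1:
            degenerate.append((j, tied))
    return degenerate


if __name__ == "__main__":
    # The configuration from this issue: (1, 1) is equidistant from both.
    ties = find_degenerate_candidates([(0, 0), (2, 0)], [(1, 1)], 10)
    if ties:
        print("Warning: ambiguous links for targets", ties)
```

Running such a check once on suspicious data keeps the cost out of the normal linking path, at the price of not catching every case.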
I'm going to close this and leave it as a reference (or warning!) for future users, unless there's a simpler solution I'm not seeing. Thanks again for so perfectly documenting this behavior, @snilsn!
After a bit of consideration I think I have to bother you again, @nkeim
There are two more questions I have:
import trackpy as tp
import pandas as pd
import matplotlib.pyplot as plt

# Two particles in frame 2, both equidistant from the single particle in frame 1.
d = {'frame': [2, 2, 1], 'x': [0, 2, 1], 'y': [0, 0, 1]}
df = pd.DataFrame(data=d)

# Link the same data nine times and plot each result in its own panel.
fig, ax = plt.subplots(ncols=3, nrows=3, figsize=(10, 10), sharex=True, sharey=True)
for axes in ax.flatten():
    track = tp.link(df, 10)
    tp.plot_traj(track, ax=axes, plot_style={'marker': 'x'})
plt.show()
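
import trackpy as tp
import pandas as pd
import matplotlib.pyplot as plt

# Two particles in frame 1, both equidistant from the single particle in frame 2.
d = {'frame': [1, 1, 2], 'x': [0, 2, 1], 'y': [0, 0, 1]}
df = pd.DataFrame(data=d)

# Link the same data 5000 times and record how many frames particle 0 spans.
lengths = []
for _ in range(5000):
    track = tp.link(df, 10)
    lengths.append(len(track.where(track['particle'] == 0).dropna()))

# Histogram of particle 0's lifetime (1 or 2 frames) across all runs.
plt.hist(lengths, bins=[0.8, 1.2, 1.8, 2.2])
plt.xticks([1, 2])
plt.xlabel('lifetime in frames')
plt.ylabel('count')
plt.title('particle 0')
plt.show()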
Hello everybody! During experiments with artificial datasets, I have encountered a situation where I am not able to achieve reproducible behavior of trackpy. This happens when there are two particles in a frame, but only one particle in the next frame, with equal distance to both particles.
There are two options for linking in such a case and both are equally correct (or incorrect) without using any predictors, but it seems that it is random which option is chosen by trackpy. Is there any way to achieve reproducibility here, e.g. by setting a random seed?
I'm using trackpy v0.5.0. A minimal example: