Inzlinger opened 4 years ago
Hi Jakob,
nice catch! I don't think this was intended. What happens is that, in the non-distributed case, `ht.unique(a)` behaves like `torch.unique(a, sorted=False)`, while in the distributed case we use a different algorithm. It gets even more interesting:
```python
>>> a = torch.tensor([1, 0, 4, 8, 2, 2, 3, 0, 7, 9, 9])
>>> b = ht.array(a, split=0)
>>> c = ht.array(a)
>>> ht.unique(b)
tensor([1, 0, 4, 8, 3, 2, 7, 9])
>>> ht.unique(c)  # same as torch.unique(a, sorted=False)
tensor([9, 7, 3, 2, 8, 4, 0, 1])
```
At the moment, the way to get consistent results is to always specify `sorted=True`.
In fact, both numpy and torch return a sorted unique array/tensor by default, so the quick fix for consistency would be to set the default value of `sorted` to `True` for heat as well. I suppose it was set to `False` because, in the distributed case, sorting the unique tensor comes with significant overhead. Maybe @TheSlimvReal wants to chime in.
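For intuition, here is a minimal pure-Python sketch of why a chunk-wise unique can yield a split-dependent order, and why sorting restores consistency. Note that `local_unique`, `distributed_unique`, and the chunking scheme are hypothetical stand-ins, not heat's actual implementation:

```python
def local_unique(chunk):
    # Stand-in for a per-process unique that returns its local
    # result sorted (similar to torch.unique on each chunk).
    return sorted(set(chunk))

def distributed_unique(data, n_procs, do_sort=False):
    # Split the data into n_procs contiguous chunks, deduplicate each
    # chunk locally, then merge the partial results and deduplicate
    # again, keeping first-occurrence order in the merge step.
    size = -(-len(data) // n_procs)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    merged = [x for chunk in chunks for x in local_unique(chunk)]
    result = list(dict.fromkeys(merged))
    return sorted(result) if do_sort else result

a = [1, 0, 4, 8, 2, 2, 3, 0, 7, 9, 9]
print(distributed_unique(a, 1))                # [0, 1, 2, 3, 4, 7, 8, 9]
print(distributed_unique(a, 3))                # [0, 1, 4, 8, 2, 3, 7, 9]
print(distributed_unique(a, 3, do_sort=True))  # [0, 1, 2, 3, 4, 7, 8, 9]
```

The unsorted result depends on how the data is chunked, while `do_sort=True` pays an extra global sort but is split-independent, mirroring the trade-off described above.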
`unique` has been reworked and this issue will be fixed with #749. Sorry it's taking so long.
Reviewed within #1109
**Feature functionality** Clarify that the order in `res` is unstable, or document how it relates to `split`/`axis`. Not sure whether this is a bug or works as intended; the order of `res` differs depending on the split.
https://github.com/helmholtz-analytics/heat/blob/a1c3792e6e32b3bf7ca4953b6afbc1d53759a031/heat/core/manipulations.py#L1319
**Additional context** Example: