pytorch / botorch

Bayesian optimization in PyTorch
https://botorch.org/
MIT License

[Bug] Deduplicate not working for large n! #2201

Closed lluisp1999 closed 9 months ago

lluisp1999 commented 9 months ago

🐛 Bug

There is a bug in botorch.utils.multi_objective.pareto.is_non_dominated, where deduplicate is ignored whenever the size of the data is large enough.

To reproduce

Code snippet to reproduce

import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

Y = torch.tensor([[1, 2], [0, 0], [2, 1]])
NY = torch.tensor([[1, 2], [0, 0], [2, 1]])
# Tile the same three points until n is large enough to trigger the loop path
for i in range(3000):
    NY = torch.concat([NY, Y])
print(is_non_dominated(NY, deduplicate=False))

Stack trace/error message

Expected: tensor([ True, False,  True,  ...,  True, False,  True])
Got: tensor([ True, False,  True,  ..., False, False, False])
(i.e., deduplicate=False is ignored)

Expected Behavior

When the data is large, is_non_dominated dispatches to is_non_dominated_loop. The problem is that this function ignores deduplicate=False.
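In the meantime, a workaround is possible because duplicate rows never strictly dominate each other: compute the mask on the unique rows only (where a pairwise check is affordable) and broadcast it back through the inverse index, so every duplicate of a Pareto point comes back True, matching the deduplicate=False semantics. Below is a minimal pure-PyTorch sketch; non_dominated_mask and non_dominated_keep_duplicates are hypothetical helpers, not BoTorch API.

```python
import torch


def non_dominated_mask(Y: torch.Tensor) -> torch.Tensor:
    """Pairwise non-domination check (maximization), O(n^2) memory.

    Hypothetical helper, not BoTorch code: row i is dominated if some row j
    is >= in every objective and > in at least one.
    """
    # ge[i, j]: Y[j] >= Y[i] in all objectives; gt[i, j]: Y[j] > Y[i] in some
    ge = (Y.unsqueeze(0) >= Y.unsqueeze(1)).all(dim=-1)
    gt = (Y.unsqueeze(0) > Y.unsqueeze(1)).any(dim=-1)
    dominated = (ge & gt).any(dim=1)
    return ~dominated


def non_dominated_keep_duplicates(Y: torch.Tensor) -> torch.Tensor:
    """deduplicate=False semantics for large n via the unique rows."""
    # `inverse` maps each original row to its position in `unique_Y`
    unique_Y, inverse = torch.unique(Y, dim=0, return_inverse=True)
    # Duplicates don't strictly dominate each other, so the mask computed on
    # the unique set is exactly the per-unique-point answer; broadcasting it
    # marks every copy of a non-dominated point as non-dominated.
    return non_dominated_mask(unique_Y)[inverse]
```

Since the pairwise check only runs on the unique rows, this stays cheap even when the input contains thousands of repeats of a few distinct points, as in the repro above.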

sdaulton commented 9 months ago

Thanks for flagging this. We should alert the user that data will be deduplicated automatically for large n.

Do you have a use case where you'd like to keep duplicate points for large n? We could potentially add that functionality if needed.

lluisp1999 commented 9 months ago

The utility lies in cases where we don't want the Pareto front itself but rather want to see which points are non-dominated. It has great use cases in many fields, particularly GFlowNets in my case. I think it would be nice to have, or at least not to mislead users. Thanks!