Closed: raulsoutelo closed this issue 6 years ago
Sorry, I have just realized that you have no labels in an active learning setting. Would you expect this approach to work better in a data summarization scenario?
@raulsoutelo Yes, there are no labels, and our theory directly addresses that. There is a technical lemma in the appendix which enables this by considering the distance between distributions.
About data summarization: the problem is called core-set, and there are a bunch of papers showing that these ideas do work for data summarization. We have some references in the related work section of our paper.
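For readers unfamiliar with core-set selection, here is a minimal sketch of greedy k-center selection (farthest-first traversal), a standard construction used in the core-set literature; the function name and the toy data are illustrative, not from the paper:

```python
import numpy as np

def greedy_k_center(points, k, seed=0):
    """Farthest-first traversal: a classic 2-approximation for k-center,
    commonly used to build coresets. `points` is an (n, d) array;
    returns the indices of the k chosen centers and the covering radius."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    centers = [int(rng.integers(n))]              # start from a random point
    # distance from every point to its nearest chosen center so far
    dist = np.linalg.norm(points - points[centers[0]], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dist))                # farthest point joins the coreset
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())             # dist.max() is the radius delta

# usage: cover 200 random 2-D points with a 10-point coreset
pts = np.random.default_rng(1).standard_normal((200, 2))
idx, delta = greedy_k_center(pts, 10)
print(len(idx), delta)
```

After selection, every point lies within `delta` of some chosen center, which is exactly the covering property the bound below relies on.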
Hello,
If I understood correctly, a point B is assigned to a coreset point A if their distance is at most a fixed δ. Since the neural network is Lipschitz continuous, its output cannot change much within a ball of radius δ. Therefore, if the error at point A (contained in the coreset) is assumed to be zero, the error at B will be small (bounded by a term proportional to δ).
However, if points A and B are closer than δ but have different targets, the error could be arbitrarily large, right? Would it be sensible to cover only points that share the same target?
Thanks in advance!
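The concern above can be made concrete with a toy one-dimensional example (a sketch with illustrative numbers, not from the paper). By the triangle inequality, if the model g is L-Lipschitz and has zero error at the coreset point a, then the error at a covered point b satisfies |g(b) − y_b| ≤ L·δ + |y_a − y_b|; the label gap |y_a − y_b| is exactly the uncontrolled term when nearby points have different targets:

```python
import numpy as np

L = 2.0
g = lambda x: L * np.sin(x)            # an L-Lipschitz model (|g'| <= L)

a, b = 0.5, 0.6                        # coreset point a covers b: |b - a| <= delta
delta = abs(b - a)
y_a = g(a)                             # assume zero error at the coreset point

# Triangle inequality: |g(b) - y_b| <= |g(b) - g(a)| + |y_a - y_b|
#                                   <=  L * delta    + |y_a - y_b|
for y_b in (y_a + 0.05, y_a + 5.0):    # similar target vs. very different target
    err = abs(g(b) - y_b)
    bound = L * delta + abs(y_a - y_b)
    print(f"err={err:.3f}  bound={bound:.3f}  holds={err <= bound + 1e-9}")
```

In the first case the error is controlled by L·δ plus a small label gap; in the second case the bound still holds but is dominated by the label gap of 5.0, so covering distance alone no longer guarantees a small error.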