LarsOL opened this issue 7 years ago
Hi!
I have never had to test this, but my guess would be to provide default values for the missing entries.
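One common reading of "default values" is per-dimension mean imputation. The sketch below is stand-alone Java, not part of java-LSH. It assumes missing entries are encoded as `Double.NaN`, and it picks the column mean as the default; both choices are assumptions for illustration. The filled vectors can then be hashed like any complete vector.

```java
import java.util.Arrays;

/**
 * Sketch: replace missing entries (assumed to be encoded as Double.NaN) with a
 * per-dimension default, here the mean of the observed values in that column.
 */
final class DefaultValueImputer {

    /** Returns a copy of data in which every NaN is replaced by its column mean. */
    static double[][] imputeWithColumnMeans(double[][] data) {
        int dims = data.length == 0 ? 0 : data[0].length;
        double[] sums = new double[dims];
        int[] counts = new int[dims];

        // First pass: accumulate the observed (non-NaN) values per dimension.
        for (double[] row : data) {
            for (int d = 0; d < dims; d++) {
                if (!Double.isNaN(row[d])) {
                    sums[d] += row[d];
                    counts[d]++;
                }
            }
        }

        // Column default: observed mean, or 0.0 if the column has no observed value.
        double[] defaults = new double[dims];
        for (int d = 0; d < dims; d++) {
            defaults[d] = counts[d] > 0 ? sums[d] / counts[d] : 0.0;
        }

        // Second pass: copy each row and fill its gaps with the column default.
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = Arrays.copyOf(data[i], dims);
            for (int d = 0; d < dims; d++) {
                if (Double.isNaN(out[i][d])) {
                    out[i][d] = defaults[d];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 2.0, Double.NaN},
            {3.0, Double.NaN, 6.0},
            {Double.NaN, 4.0, 8.0}
        };
        for (double[] row : imputeWithColumnMeans(data)) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [1.0, 2.0, 7.0]
        // [3.0, 3.0, 6.0]
        // [2.0, 4.0, 8.0]
    }
}
```

Every vector missing dimension `d` receives the same value, which is exactly the clustering effect LarsOL raises below.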
On Fri, Dec 16, 2016 at 02:34, Lars Lawoko notifications@github.com wrote:
I am contemplating using LSH in my application, but I am unsure how to handle absent/missing data in a vector. Nearest-neighbor imputation suggests that this type of algorithm can handle this scenario, but how would I go about implementing it?
The main issue I see with providing a default value is this: wouldn't the vectors be artificially clustered around those "default" values, since the algorithm treats them as valid data? Random data may work, but then it is not deterministic.
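One way to keep random filling deterministic is to derive each fill value from a hash of a fixed seed, a stable vector identifier and the dimension index, instead of from a shared random generator. The sketch below assumes missing entries are `Double.NaN`, that each vector has a stable `vectorId` (a hypothetical key, e.g. a database id), and that the caller supplies a per-dimension range `[lo, hi)`. The mixing function is the SplitMix64 finalizer.

```java
import java.util.Arrays;

/**
 * Sketch: fill missing entries (assumed to be Double.NaN) with pseudo-random
 * values that are reproducible. Each value is a pure function of
 * (globalSeed, vectorId, dimension), so re-hashing the same vector always
 * yields the same numbers.
 */
final class DeterministicRandomFill {

    /** SplitMix64 finalizer: scrambles a 64-bit value into a well-mixed 64-bit hash. */
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Maps (globalSeed, vectorId, dim) to a double in [0, 1). */
    static double unitValue(long globalSeed, long vectorId, int dim) {
        long h = mix64(mix64(mix64(globalSeed) ^ vectorId) ^ dim);
        return (h >>> 11) * 0x1.0p-53; // keep the top 53 bits, scale to [0, 1)
    }

    /**
     * Returns a copy of vector whose NaN entries are replaced by values in
     * [lo[d], hi[d]). vectorId must identify this vector stably across runs
     * (hypothetical parameter for illustration).
     */
    static double[] fill(double[] vector, long vectorId, double[] lo, double[] hi, long globalSeed) {
        double[] out = Arrays.copyOf(vector, vector.length);
        for (int d = 0; d < out.length; d++) {
            if (Double.isNaN(out[d])) {
                out[d] = lo[d] + unitValue(globalSeed, vectorId, d) * (hi[d] - lo[d]);
            }
        }
        return out;
    }
}
```

This addresses only the determinism concern: from the hash function's point of view the filled values are still noise, so two vectors that differ only in which dimensions are missing may still land in different buckets.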
Ideally, you could simply ignore a dimension when it has no value.
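"Ignoring" a missing dimension can happen in two places: when hashing, and when computing the true distance of the candidates that LSH returns. The sketch below is stand-alone Java (it does not use java-LSH's classes). It assumes missing values are `Double.NaN` and uses a sign-random-projection (cosine) hash purely for illustration. The distance rescales by total dimensions over shared dimensions, the same scheme scikit-learn documents for `nan_euclidean_distances`.

```java
import java.util.Random;

/** Sketch: LSH signature and candidate distance that skip NaN (missing) dimensions. */
final class MissingAwareLsh {

    /**
     * Euclidean distance over the dimensions observed (non-NaN) in BOTH vectors,
     * rescaled by totalDims / sharedDims so that pairs sharing fewer dimensions
     * do not look artificially close. Returns NaN if no dimension is shared.
     */
    static double nanEuclidean(double[] a, double[] b) {
        double sumSq = 0.0;
        int shared = 0;
        for (int d = 0; d < a.length; d++) {
            if (Double.isNaN(a[d]) || Double.isNaN(b[d])) {
                continue; // ignore this dimension entirely
            }
            double diff = a[d] - b[d];
            sumSq += diff * diff;
            shared++;
        }
        if (shared == 0) {
            return Double.NaN;
        }
        return Math.sqrt(sumSq * a.length / shared);
    }

    /** Random hyperplanes (one Gaussian vector per signature bit) from a fixed seed. */
    static double[][] hyperplanes(int bits, int dims, long seed) {
        Random rng = new Random(seed);
        double[][] planes = new double[bits][dims];
        for (double[] p : planes) {
            for (int d = 0; d < dims; d++) {
                p[d] = rng.nextGaussian();
            }
        }
        return planes;
    }

    /**
     * Sign-random-projection signature that skips missing dimensions in each
     * dot product. Skipping a term is numerically the same as treating it as 0.
     */
    static boolean[] signature(double[] v, double[][] planes) {
        boolean[] sig = new boolean[planes.length];
        for (int i = 0; i < planes.length; i++) {
            double dot = 0.0;
            for (int d = 0; d < v.length; d++) {
                if (!Double.isNaN(v[d])) {
                    dot += planes[i][d] * v[d];
                }
            }
            sig[i] = dot >= 0;
        }
        return sig;
    }
}
```

Because skipping a term equals zero-filling it, the hash only behaves like "ignoring" when 0 is a neutral value (for example, on mean-centred data); the rescaled distance is where missing dimensions are genuinely left out of the comparison.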