tdebatty / java-LSH

A Java implementation of Locality Sensitive Hashing (LSH)

Dealing with missing data #12

Open LarsOL opened 7 years ago

LarsOL commented 7 years ago

I am considering using LSH in my application, but I am unsure how to deal with absent or missing values in a vector. The existence of nearest-neighbor imputation suggests that this type of algorithm can handle this scenario, but how would I go about implementing it?

tdebatty commented 7 years ago

Hi!

I have never had to test this, but my guess would be to provide default values for the missing entries.

(Sent by email in reply to Lars Lawoko's message of Fri, 16 Dec 2016, 02:34.)
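A minimal sketch of the default-value suggestion, in plain Java and independent of java-LSH's own classes. It assumes missing entries are encoded as `Double.NaN` and uses the per-dimension mean as the default (our choice; a constant such as 0 would slot in the same way). `MeanImputation` and `impute` are hypothetical names, and the input is assumed to be a non-empty, rectangular array. The imputed vectors can then be hashed by any LSH implementation.

```java
import java.util.Arrays;

/** Hypothetical helper: replaces missing entries (Double.NaN) with the per-dimension mean. */
public class MeanImputation {

    public static double[][] impute(double[][] data) {
        int dims = data[0].length;
        double[] means = new double[dims];
        for (int d = 0; d < dims; d++) {
            double sum = 0.0;
            int count = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[d])) {
                    sum += row[d];
                    count++;
                }
            }
            // Fall back to 0 when a dimension is missing in every row.
            means[d] = count > 0 ? sum / count : 0.0;
        }
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i].clone(); // leave the caller's array untouched
            for (int d = 0; d < dims; d++) {
                if (Double.isNaN(out[i][d])) {
                    out[i][d] = means[d];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, Double.NaN, 3.0},
            {3.0, 4.0, Double.NaN},
        };
        // Prints [[1.0, 4.0, 3.0], [3.0, 4.0, 3.0]]
        System.out.println(Arrays.deepToString(impute(data)));
    }
}
```

Note that every vector missing the same dimension receives the identical value there, which is exactly the clustering effect raised in the next comment.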

LarsOL commented 7 years ago

The main issue I see with providing a default value is this: wouldn't vectors be artificially clustered around those "default" values, since the algorithm treats them as valid data? Random values may work, but then the hashing is no longer deterministic.
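On the determinism point: one way to keep a random fill reproducible is to seed the generator from stable keys, so a given record always receives the same fill. The sketch below assumes each record has a stable `recordId` and that the caller supplies each dimension's observed `min` and `max`; `SeededRandomFill`, `fill` and the seed-mixing constants are hypothetical, not part of java-LSH. It relies on `java.util.Random` producing the same sequence for the same seed.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical helper: fills missing entries (Double.NaN) with pseudo-random values
 * drawn uniformly from each dimension's [min, max) range. The generator is seeded from
 * (baseSeed, recordId, dimension), so a record always gets the same fill, regardless
 * of the order in which records are processed.
 */
public class SeededRandomFill {

    public static double[] fill(double[] row, long recordId, double[] min, double[] max, long baseSeed) {
        double[] out = row.clone();
        for (int d = 0; d < out.length; d++) {
            if (Double.isNaN(out[d])) {
                // Mix the keys into one seed; the constants are arbitrary odd multipliers.
                long seed = baseSeed ^ (recordId * 0x9E3779B97F4A7C15L) ^ (d * 0xC2B2AE3D27D4EB4FL);
                Random rng = new Random(seed);
                out[d] = min[d] + rng.nextDouble() * (max[d] - min[d]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] min = {0.0, 0.0};
        double[] max = {10.0, 5.0};
        double[] row = {2.0, Double.NaN};
        double[] a = fill(row, 42L, min, max, 7L);
        double[] b = fill(row, 42L, min, max, 7L);
        System.out.println(Arrays.equals(a, b)); // true: same record, same fill
    }
}
```

This only restores reproducibility; the filled values are still noise as far as the hash is concerned, so two records with many missing dimensions will collide somewhat arbitrarily.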

Ideally, a dimension would simply be ignored when it has no value.
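For random-hyperplane (cosine) LSH, "ignoring" a dimension can be sketched by skipping it when projecting the vector onto each hyperplane. This is a standalone illustration, not java-LSH's API: `MaskedHyperplaneLsh`, `signature` and `similarity` are hypothetical names, missing values are assumed to be `Double.NaN`, and input vectors are assumed to have length `dims`. The fraction of matching bits estimates 1 − θ/π, where θ is the angle between the two vectors.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of random-hyperplane LSH that skips missing (NaN) dimensions. */
public class MaskedHyperplaneLsh {

    private final double[][] planes; // one random Gaussian normal vector per signature bit

    public MaskedHyperplaneLsh(int bits, int dims, long seed) {
        Random rng = new Random(seed);
        planes = new double[bits][dims];
        for (double[] plane : planes) {
            for (int d = 0; d < dims; d++) {
                plane[d] = rng.nextGaussian();
            }
        }
    }

    /** Bit i is true when the vector lies on the non-negative side of hyperplane i. */
    public boolean[] signature(double[] x) {
        boolean[] sig = new boolean[planes.length];
        for (int i = 0; i < planes.length; i++) {
            double dot = 0.0;
            for (int d = 0; d < x.length; d++) {
                if (Double.isNaN(x[d])) {
                    continue; // missing value: this dimension contributes nothing
                }
                dot += planes[i][d] * x[d];
            }
            sig[i] = dot >= 0.0;
        }
        return sig;
    }

    /** Fraction of matching bits; estimates 1 - angle / pi between the two vectors. */
    public static double similarity(boolean[] a, boolean[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    public static void main(String[] args) {
        MaskedHyperplaneLsh lsh = new MaskedHyperplaneLsh(64, 3, 1L);
        boolean[] withGap = lsh.signature(new double[] {1.0, Double.NaN, -2.0});
        boolean[] zeroed = lsh.signature(new double[] {1.0, 0.0, -2.0});
        System.out.println(Arrays.equals(withGap, zeroed)); // true
    }
}
```

As the `main` method shows, skipping a dimension in this hash family is equivalent to filling it with 0. That works reasonably when the data is centered (so 0 means "average"), but it does not compare two vectors only on the dimensions both of them have observed.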