yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0
1.24k stars 114 forks source link

Custom Metric #99

Open bhavaygg opened 3 years ago

bhavaygg commented 3 years ago

Hello,

I wanted to perform NN on a dataset of genomes. For this task, the distance between 2 datapoints is calculated by a custom script? Is there I can incorporate this without having to create the entire NN search algorithm myself and only modify some parts of your code?

masajiro commented 3 years ago

Hello, There is no function to use the metric that users defined on NGT. However, you can incorporate your metric into NGT like jaccard and poincare and lorentz.

bhavaygg commented 3 years ago

@masajiro Does the library accept strings as datapoints? Because my datapoints are essentially file names that are input to a script to calculate distance.

masajiro commented 3 years ago

NGT can handle uint8 objects, but cannot handle variable length objects. However, if the length of your strings is less than around 1000 characters, you might be able to store the strings into NGT as fixed length objects with zero padding. This is similar to your case.